[GHC] #7815: STM fails to validate read.
GHC
cvs-ghc at haskell.org
Fri Apr 5 19:14:10 CEST 2013
#7815: STM fails to validate read.
----------------------------------------+-----------------------------------
Reporter: fryguybob | Owner:
Type: bug | Status: new
Priority: normal | Component: Runtime System
Version: 7.7 | Keywords: STM
Os: Unknown/Multiple | Architecture: x86
Failure: Incorrect result at runtime | Blockedby:
Blocking: | Related:
----------------------------------------+-----------------------------------
This issue was brought up by napping in {{{#haskell}}} with this paste:
[http://hpaste.org/85134] (the first paste in particular)
The code is:
{{{
import Control.Concurrent.STM
import Control.Concurrent
import Control.Monad
main = do
dog <- newTVarIO False
cat <- newTVarIO False
let unset = do
d <- readTVar dog
c <- readTVar cat
if (d || c) then retry else return ()
setDog = unset >> writeTVar dog True
setCat = unset >> writeTVar cat True
oops = do
d <- readTVar dog
c <- readTVar cat
guard (d && c)
reset = do
writeTVar dog False
writeTVar cat False
reset' = do
d <- readTVar dog
c <- readTVar cat
guard (d || c)
reset
forkIO (atomically oops >> putStrLn "Oh Noes!")
forever (do
forkIO (atomically setDog)
forkIO (atomically setCat)
atomically reset'
atomically reset')
}}}
When run it produces:
{{{
$ ghc --make test.hs -threaded -rtsopts
$ ./test +RTS -N2
Oh Noes!
test: thread blocked indefinitely in an STM transaction
}}}
The second message is just a consequence of entering an unexpected state.
The first message indicates that both the transactions {{{cat}}} and
{{{dog}}} committed at the same time.
It does this for HEAD and 7.6.
I've sketched out an interleaving that leads to this. TRec entries
are in the first and third column and TVar's are in the second column.
Each entry has a TVar name and the expected value followed
by the new value and then a number of updates if it has been read. TVars
list their value and their number of updates.
{{{
A TRec TVar B TRec
-- Transactions start
cat F F cat F 0 cat F F -- Initial reads.
dog F F dog F 0 dog F F
cat F T dog F T -- Local writes in TRec's
-- Validation:
cat F F 0 -- B reads num_updates from cat (with
^ -- consistency check with value)
cat F T cat A 0 | -- A acquires lock for cat (atomic
cas)
dog F F 0 ^ | -- A reads num_updates from dog (with
^ | | -- consistency check with value)
| dog B 0 dog F T | -- B acquires lock for dog (atomic
cas)
| | ^ |
| | | |
Success 0 | 0 | -- read check for A
Success 0 0 -- read check for B
cat A 1 -- Increment version
cat T 1 -- Unlock with new value
dog B 1 -- Increment version
dog B T -- Unlock with new value
}}}
What is clear here is that the version number is not enough to check
in {{{check_read_only}}} because there is a gap between locking and
incrementing the version. We need to know atomically that the TVar is not
locked and it's version number is the same.
I need to read through the right parts of Keir Fraser's thesis carefully,
but it seems like the read phase here is only helpful in preventing a
commit that writes back the exact value we have already seen while we are
in the middle of committing.
The code for {{{check_read_only}}} is here:
{{{
#!c
static StgBool check_read_only(StgTRecHeader *trec STG_UNUSED) {
StgBool result = TRUE;
ASSERT (config_use_read_phase);
IF_STM_FG_LOCKS({
FOR_EACH_ENTRY(trec, e, {
StgTVar *s;
s = e -> tvar;
if (entry_is_read_only(e)) {
TRACE("%p : check_read_only for TVar %p, saw %ld", trec, s, e ->
num_updates);
if (s -> num_updates != e -> num_updates) {
// ||s -> current_value != e -> expected_value) {
TRACE("%p : mismatch", trec);
result = FALSE;
BREAK_FOR_EACH;
}
}
});
});
return result;
}
}}}
If I restore the commented out line (which appears commented out in the
first commit of this code that I can find) I can't reproduce the issue,
but I think there is still a problem due to the ordering of
those checks: we could observe the version as the same, while it
is locked, have the TVar unlock, then observe the value the same.
Switching the order we can only observe the TVar unlocked if the
update has been incremented (as long as we are on an architecture
that ensures this such as x86).
Does this seem right? One issue is that given the interleaving that above
with this added check ''both'' transactions could fail to commit. I think
the algorithms from Fraser avoid this, but I think it always involves
invalidating another transaction (i.e. killing off any other transactions
observed to be in the read check phase with an overlapping TVar in a way
that results in only one winner (see page 21 section 4.4 in ''Concurrent
Programming Without Locks'')).
As a side note, the issue can be avoided by ensuring that the reads become
writes and avoiding the read only check. But you can only
become a write if the values do not have matching pointers, switching to
Ints instead of Bools gets you there.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/7815>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list