RFC: termination detection for STM

Tue Feb 13 15:21:31 EST 2007

hello!

last week i have implemented termination detection for the ghc runtime.
my motivation was a parallel interpreter for a concurrent constraint
language which i have implemented using STM.  this interpreter has reached
its successful final state iff none the stm transactions it has spawned
make progress anymore, i.e. all threads are in state BlockedOnSTM.  well,
it turned out that this state is rather difficult to detect.  it could be
possible to use unsafePerformIO from within an atomically block with an
idempotent io action such that there is some thread which counts the
number of threads which are blocked on stm, but i could not come up with
something that is free of race conditions.  mainly this is because with
stm, it is impossible to know (in haskell code) which threads are woken up
by a committed stm transaction.

well, there is something that would work: a wave-based distributed
termination algorithm.  but that would be very inefficient: every thread
needs to be woken up, and there is still the issue of finding the right
time to send out a detection wave: too late, and the user gets bored; too
early, and there is even more overhead.

of course, the ghc runtime knows which threads are woken up, so i thought
it should be possible to implement the termination detection there.
and also, this should be more reliable than the ugly hack i had before (a
master thread with a timeout that is reset on every stm commit, by
throwing an exception).

so how can the ghc runtime detect termination?
it is quite simple: we need to add a counter somewhere that is incremented
every time a thread that is BlockedOnSTM is woken up, and decremented
every time a thread goes into the BlockedOnSTM state (via retry).
but just having a single counter has a drawback: it might become a
scalability bottleneck.
so i have extended the StgTSO struct with two fields: a counter, and a
parent pointer.
the parent pointer points to the TSO that spawned the thread.
the counter field of a TSO counts the threads which are children of 
this thread (non-transitively!) and have not terminated yet.
invariant: the counter is always >= 0, and == 0 iff the subtree rooted at
this thread has terminated.

conceptually, we have to modify the counter at the following events:
- fork: the forking thread's counter is incremented
        the forked thread's counter is initialized to 1
- exit: the exiting thread's counter is decremented
- retry: the retrying thread's counter is decremented
- wakeup (stm): the counter of the woken thread is incremented
increment means: add 1; if new value is 1: recursively increment parent
decrement means: sub 1; if new value is 0: recursively decrement parent
                        if there is no parent, signal termination

of course, it has to be guaranteed that increments always arrive at the
root before the corresponding decrements, otherwise termination may be
detected prematurely.
note that termination can only be signalled for a thread which has already
exited, or which has called GHC.Conc.awaitTermination (described below).

there are two added primitive operations:
- counterDec# decrements the calling thread's counter
  and also sets the parent pointer to NULL (so the calling thread becomes
  the root of a thread tree for which termination will be detected)
- counterInc# just increments the calling thread's counter
  (cancels the effect of counterDec)

these primitives are meant to be called from a single place:
awaitTermination, which is in GHC.Conc.
it calls counterDec#, then waits for the exception with a delay.
afterwards, it just calls counterInc#.

the termination is signalled by throwing a Deadlock exception to the root
of the thread tree.  actually, i believe the termination condition is a
livelock, but there is no constructor for that.

note that there may be several independent subcomputations within a single
haskell process for which termination can be detected with this approach.

the main drawbacks of this code:
- the locking overhead (counter updates)
- because of the parent pointers, non-leaf threads will never be garbage
  collected if any child thread is still alive.
this could be alleviated by only enabling the termination detection if the
program specifically requests it, e.g. some global flag variable which is
set by another primitive operation.

i would welcome a review of the code; this is the first time i have hacked
on ghc (and also the first non-trivial C code i have written in years), so
there may be issues and interactions with other parts of the runtime that
i have not anticipated. so far, i have tested it only with -threaded
-debug, but it seems to work with -N32 (on a single processor, that is all
i have).
do you think that others might be interested in this functionality?
could it be included in ghc?

also, something that i really do not understand: in the counterDec#
primitive, the stack_size field of the StgTSO struct is corrupted for
no apparent reason, so i save it onto the stack and then restore it.
none of the other primitive ops seem to do something similar.
what am i doing wrong?

patch is against ghc 6.6 (actually the debian package 6.6-3, but i hope
those weird arm floating point endianness patches don't matter much :) ).

        michael stahl

-- 
"I don't feel we did wrong in taking this great country away from them.
 There were great numbers of people who needed new land, and the Indians
 were selfishly trying to keep it for themselves." -- John Wayne
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ghc6.6-termination-detection.patch
Type: text/x-diff
Size: 14594 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20070213/5c8bda90/ghc6.6-termination-detection-0001.bin