STM and fairness

Fri Feb 29 08:52:46 EST 2008

Hi,

I'd like to know a bit about the STM implementation in GHC,
specifically about how it tries to achieve fairness. I've been reading
"Composable Memory Transactions" but it does not contain that much
details on this specific matter. What I want to know boils down to
this: what order are processes run which have been woken up from a
call to retry? When programming with condition variables the standard
behaviour is that the process which has waited the longest is the
first one to get to run. But that doesn't seem to be the behaviour
here. Consider the following program:
\begin{code}
module STMFair where

import Control.Concurrent
import Control.Concurrent.STM

test n = do v <- newTVarIO 0
            mapM_ (\n -> forkIO (process n v) >>
                         threadDelay delay) [1..n]
            atomically (writeTVar v 1)
            threadDelay delay

delay = 500000

process id var = do putStrLn ("Process " ++ show id ++ " started")
                    atomically $ do
                      v <- readTVar var
                      if v == 0
                        then retry
                        else return ()
                    putStrLn ("Process " ++ show id ++ " finished")
\end{code}

When I run 'test 2' I expect it to print:
Process 1 started
Process 2 started
Process 1 finished
Process 2 finished

This would correspond to the oldest process being executed first. But
that is not what happens instead I get this (ghci 6.8.2, Ubuntu
Linux):
Process 1 started
Process 2 started
Process 2 finished
Process 1 finished

This is certainly not the behaviour I would want. I discovered this
behaviour when implementing the dining philosophers using STM and
there one of the philosophers gets starved. Except, that he's not
quite starved. When I run the simulation long enough he will
eventually be able to eat but then for a long time there will be some
other philosopher that is starved. I find this behaviour very
mysterious and it would be nice to have some light shed on it.

Apart from this mysterious behaviour it seems quite easy to improve
the fairness of the implementation. From my examples above it seems
that the wait queues for a transactional variable do contain the
processes in the order they call retry (try running 'test n' for some
large n). It just seems that they are given to the scheduler in the
wrong order, so all that needs to be done is to reverse the list. Am I
right?

Thanks for reading,

Josef