About Haskell Thread Model

Mon Oct 13 13:02:52 EDT 2003

William Lee Irwin III <wli at holomorphy.com> asks about true SMP
haskell.
> On Mon, Oct 13, 2003 at 11:13:56AM +0200, Wolfgang Thaller wrote:
> > The reason why we currently do not take advantage of SMP is that the 
> > Haskell Heap is a shared data structure which is modified whenever a 
> > thunk (an unevaluated expression) is evaluated. Using synchronisation 
> > primitives like pthread_mutex_lock for every evaluation of a thunk 
> > would be deadly for performance.
> > There is some old (1999) code in the GHC RTS that attempts this (using 
> > intel cmpxchg instructions for synchronisation), but it's currently 
> > "bitrotted" and I don't know how successful it was.
> 
> cmpxchg and then taking a blocking lock sounds like the 2-tier locking
> supported with Linux' new futex system calls. I wonder how they chose
> to block in the older GHC RTS.

This invariably proves tricky, even more so now that GHC effectively
uses a many-to-one threading model (and where it may therefore be
wrong to simply block the OS-level thread).  In pH, which isn't quite
haskell but has the same syntax and runs on multiple processors with a
shared heap, we kludged: we simply called "yield" or "sleep" and kept
spinning.  We were actually using work stealing (with lock-free work
queues as I recall) along with wait queues on empty locations, so we
didn't sleep very often.  Nonetheless, this did occasionally turn out
to cause real performance problems.

In reality you probably want to use some sort of thin lock / fat lock
approach a la Java.  Is the location thinly locked?  If so, CAS in a
fat lock instead, then wait on that.  Now update requires a CAS as
well, to see whether the thin lock was fattened.  If so, the fat lock
must be unlocked and deallocated.

Naturally, we'd like to keep track of unshared objects so we can avoid
any kind of complicated update protocol in the common case.  This
requires the implementor to establish a careful and clear contract
between the compiler, the GC, and the thread scheduler.

Finally, the complexity of thunk update pales in comparison to the
complexity of multiprocessor GC.  In pH we punted on this
issue---debugging a GC with per-thread allocation pools was hard
enough.  It killed us; thread performance didn't scale because GC
didn't scale.  Eager Haskell was designed with multithreaded
shared-memory execution in mind, but I simply couldn't find the year
or more it would take to build and debug a true multiprocessor GC---so
it remains a uniprocessor implementation.

-Jan-Willem Maessen
jmaessen at alum.mit.edu