[Haskell-cafe] Dynamic thread management?

Brian Hurt bhurt at spnz.org
Sat Aug 11 21:12:19 EDT 2007

On Sat, 11 Aug 2007, Sebastian Sylvan wrote:

> How is this any better than using "par" in Haskell?

Mainly how the threads are actually scheduled.  Mind you, I'm an 
*incredible* Haskell newbie, so take all of my comments with a 5-pound 
salt block, but as I understand the current implementation of parallel 
Haskell, all spawned threads go into a communal work queue.  When a thread 
blocks, it's put back into the communal queue, and the worker thread 
selects a new thread to run.

In Cilk, the task structure is more tree-like.  When thread A spawns 
thread B, the worker thread stops executing thread A and starts executing 
thread B.  When thread B exits, the worker thread resumes thread A.  So at 
any given point in time, the worker thread has a stack of suspended 
threads waiting for their subthreads to complete so they can be resumed- 
not unlike function calls in other languages, or nested lazy blocks in 
Haskell.
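For comparison, here's what the fork-join shape looks like with GHC's 
current primitives- a sketch using only `par` and `pseq` from base's 
GHC.Conc.  Note the contrast with Cilk: `par` merely records a spark that 
some worker *may* pick up; it does not force the depth-first 
switch-to-child behavior described above.

```haskell
import GHC.Conc (par, pseq)

-- Fork-join style in Haskell: spark the evaluation of one branch
-- while this thread evaluates the other, then combine.  (Unlike a
-- Cilk spawn, `par` only hints at parallelism; the runtime is free
-- to never steal the spark.)
nfib :: Int -> Int
nfib n
  | n < 2     = 1
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = nfib (n - 1)  -- sparked: may run on another worker
    y = nfib (n - 2)  -- evaluated by this thread

main :: IO ()
main = print (nfib 20)
```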

When a worker thread runs out of work to do- when it has no more threads 
to execute- it picks another worker thread at random and asks it for work. 
The other worker thread (assuming it has work to do) then picks the 
microthread at the bottom of its resume stack- the one farthest from 
being resumed- and hands that microthread's state over to the asking 
worker thread.
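A toy model of that stealing discipline (all names here are mine, and a 
real implementation uses a lock-free deque rather than a list): the 
owning worker pushes and pops at the top of its resume stack, while a 
thief steals from the bottom, taking the frame farthest from being 
resumed.

```haskell
-- Toy model of Cilk-style work stealing.  The head of the list is
-- the owner's end (top of the resume stack).
newtype Deque a = Deque [a]

push :: a -> Deque a -> Deque a
push x (Deque xs) = Deque (x : xs)

-- The owner resumes the most recently suspended thread.
popOwner :: Deque a -> Maybe (a, Deque a)
popOwner (Deque [])       = Nothing
popOwner (Deque (x : xs)) = Just (x, Deque xs)

-- A thief takes the oldest suspended thread- the one farthest
-- from being resumed by its owner.
stealBack :: Deque a -> Maybe (a, Deque a)
stealBack (Deque []) = Nothing
stealBack (Deque xs) = Just (last xs, Deque (init xs))

main :: IO ()
main = do
  let d = push "C" (push "B" (push "A" (Deque [])))
  print (fst <$> stealBack d)  -- a thief gets "A", the oldest frame
  print (fst <$> popOwner d)   -- the owner resumes "C", the newest
```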

From the user program's perspective, this is no different from the current 
implementation.  Threads get spawned, get executed in some unspecified 
order, and then complete.

What's interesting (to me, at least) are the optimization opportunities 
this model provides that the shared-queue model doesn't.  Consider the 
following structural model: we have a two-tiered heap.  Each worker thread 
has its own local heap, which only the microthreads it manages can see. 
In addition, there is a global heap with objects that all microthreads in 
all worker threads can see.  Objects are initially allocated in the local 
heap.  When a microthread is about to start executing on another worker 
thread, all objects it can see (reach, in the conservative GC sense) are 
promoted to the global heap.
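The promotion step is just a reachability pass.  A sketch, with an 
invented object graph for illustration:

```haskell
-- Sketch of the promotion rule: when a microthread migrates to
-- another worker, every object reachable from its roots must move
-- to the global heap.  `edges` maps an object to the objects it
-- references; `reachable` computes the transitive closure.
reachable :: (Int -> [Int]) -> Int -> [Int]
reachable edges root = go [root] []
  where
    go []       seen = reverse seen
    go (x : xs) seen
      | x `elem` seen = go xs seen
      | otherwise     = go (edges x ++ xs) (x : seen)

main :: IO ()
main = do
  -- Object 0 is the migrating microthread's root; object 4 is
  -- unreachable from it and may stay in the worker-local heap.
  let edges 0 = [1, 2]
      edges 1 = [3]
      edges _ = []
  print (reachable edges 0)  -- the objects that get promoted
```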

The advantage of all of this is that now we can easily distinguish between 
objects that can be seen from other real threads, and those that can't. 
All of the mutable data structures- tvars, lazy thunks, everything- can be 
much more cheaply implemented if you know that no other thread can access 
them.

Take the example of a parallel map, where I'm using a tvar in my map 
function.  The likelihood is that all of the (short-lived) microthreads 
I'm spawning will execute on the same worker thread- and thus that the 
tvar will never leave the local heap, where it can be optimized down to 
something close to a single-threaded mvar.  I have, via optimization, 
turned a parallel map and a synchronized tvar back into something 
approaching a serial map and an mvar.
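Here's roughly what I mean, using only what ships in GHC's base (forkIO 
microthreads plus a shared tvar counter; the function names are mine, not 
a proposed API).  Today, every bump of the tvar pays the full STM price; 
under the scheme above, if all these microthreads stay on one worker the 
tvar stays local and those bumps could be specialized to cheap 
single-threaded updates.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import GHC.Conc (atomically, newTVarIO, readTVar, readTVarIO, writeTVar)

-- A parallel map where each element is handled by its own lightweight
-- thread, and every thread also bumps a shared tvar counter.
parMapTVar :: (a -> b) -> [a] -> IO ([b], Int)
parMapTVar f xs = do
  counter <- newTVarIO (0 :: Int)
  vars <- mapM (\x -> do
            v <- newEmptyMVar
            _ <- forkIO $ do
              -- The STM transaction every microthread pays for today:
              atomically $ readTVar counter >>= writeTVar counter . (+ 1)
              putMVar v (f x)
            return v) xs
  ys <- mapM takeMVar vars  -- join: wait for all microthreads
  n  <- readTVarIO counter
  return (ys, n)

main :: IO ()
main = do
  (ys, n) <- parMapTVar (* 2) [1 .. 10 :: Int]
  print (sum ys, n)
```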

