[Haskell-cafe] Dynamic thread management?
bhurt at spnz.org
Sat Aug 11 21:12:19 EDT 2007
On Sat, 11 Aug 2007, Sebastian Sylvan wrote:
> How is this any better than using "par" in Haskell?
Mainly how the threads are actually scheduled. Mind you, I'm an
*incredible* Haskell newbie, so take all of my comments with a 5-pound
salt block, but as I understand how the current implementation of parallel
Haskell works, all spawned threads go into a communal queue. When a thread
blocks, it's put back into the communal queue, and the worker thread
selects a new thread to run.
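For comparison, here's a minimal sketch of that spawning model using
par/pseq - this assumes the `parallel` package and GHC's `-threaded`
runtime; sparks created by `par` go into a shared pool that any idle
worker may drain:

```haskell
import Control.Parallel (par, pseq)

-- Naive parallel Fibonacci: 'par' sparks the left branch, making it
-- available to any idle worker, while 'pseq' forces the right branch
-- on the current worker first.
pfib :: Int -> Int
pfib n
  | n < 2     = n
  | otherwise = l `par` (r `pseq` (l + r))
  where
    l = pfib (n - 1)
    r = pfib (n - 2)

main :: IO ()
main = print (pfib 20)  -- prints 6765
```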
In Cilk, the task structure is more tree-like. When thread A spawns
thread B, the worker thread stops executing thread A and starts executing
thread B. When thread B exits, the worker thread then resumes thread A.
So at any given point in time, the worker thread has a stack of
microthreads waiting for their subthreads to complete so they can be
resumed - not unlike function calls in other languages, or nested lazy
blocks in Haskell.
When a worker thread runs out of work to do - when it has no more threads
to execute - it picks another worker thread at random and asks it for
work. The other worker thread (assuming it has work to do) then picks the
microthread at the bottom of its resume stack - the one farthest from
being resumed - and passes that microthread's state to the original
worker thread.
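That steal policy can be sketched as follows. This is purely an
illustration of the idea, not GHC's or Cilk's actual implementation - the
names are made up, and a real runtime would use a lock-free work-stealing
deque rather than an unsynchronized IORef:

```haskell
import Data.IORef

-- Each worker's resume stack modeled as a plain list, newest task at
-- the head. (Illustrative only: not thread-safe as written.)
type Deque a = IORef [a]

-- The owner pushes and pops at the top of its own stack, mirroring
-- ordinary function-call discipline.
pushTask :: Deque a -> a -> IO ()
pushTask dq t = modifyIORef dq (t :)

popTask :: Deque a -> IO (Maybe a)
popTask dq = do
  ts <- readIORef dq
  case ts of
    []         -> return Nothing
    (t : rest) -> writeIORef dq rest >> return (Just t)

-- A thief steals from the bottom of the stack: the task farthest from
-- being resumed, exactly the policy described above.
stealTask :: Deque a -> IO (Maybe a)
stealTask dq = do
  ts <- readIORef dq
  case ts of
    [] -> return Nothing
    _  -> do
      writeIORef dq (init ts)
      return (Just (last ts))
```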
From the user program's perspective, this is no different from the current
implementation. Threads get spawned, get executed in some unspecified
order, and then complete.
What's interesting (to me, at least) are the optimization opportunities
this model provides that the shared-queue model doesn't. Consider the
following structural model: we have a two-tiered heap. Each worker thread
has its own local heap, which only the microthreads it manages can see.
In addition, there is a global heap holding objects that all microthreads
in all worker threads can see. Objects are initially allocated in the
local heap. When a microthread starts being executed on another worker
thread, all objects it can see (reach, in the conservative-GC sense) are
promoted to the global heap.
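As a toy model of that promotion step (names and representation entirely
hypothetical, and ignoring real GC mechanics - object identities stand in
for heap objects):

```haskell
import qualified Data.Set as Set

-- Object identities stand in for actual heap objects.
type Id = Int

data Heaps = Heaps
  { localHeap  :: Set.Set Id
  , globalHeap :: Set.Set Id
  } deriving (Eq, Show)

-- When a microthread is stolen, every local object it can reach moves
-- to the global heap; everything else stays worker-local.
promote :: Set.Set Id -> Heaps -> Heaps
promote reachable (Heaps loc glob) =
  Heaps (loc `Set.difference` reachable)
        (glob `Set.union` (reachable `Set.intersection` loc))
```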
The advantage of all of this is that now we can easily distinguish between
objects that can be seen from other real threads, and those that can't.
All of the mutable data structures - tvars, lazy thunks, everything - can
be implemented much more cheaply if you know that no other thread can
access them.
Take the example of a parallel map, where I'm using a tvar in my map
function. The likelihood is that all of the (short-lived) microthreads
I'm spawning will execute on the same worker thread, and thus that the
tvar will never leave the local heap and can be optimized down to
something close to a single-threaded mvar. I have, via optimization,
turned a parallel map and a synchronized tvar back into something
approaching a serial map and an mvar.
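A sketch of the kind of code being described, assuming the `stm` package
(and using forkIO rather than par, for concreteness): a parallel map
whose per-element microthreads also touch a shared tvar. If none of
those microthreads migrate to another worker, the tvar never needs
global visibility:

```haskell
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad (forM)

-- A parallel map whose per-element workers also bump a shared TVar
-- counter. In the two-tiered scheme above, if all of these
-- microthreads stay on one worker thread, the TVar stays local.
parMapCounted :: TVar Int -> (a -> b) -> [a] -> IO [b]
parMapCounted counter f xs = do
  vars <- forM xs $ \x -> do
    v <- newEmptyMVar
    _ <- forkIO $ do
      atomically $ modifyTVar' counter (+ 1)
      putMVar v $! f x
    return v
  mapM takeMVar vars

main :: IO ()
main = do
  c  <- newTVarIO 0
  ys <- parMapCounted c (* 2) [1 .. 10 :: Int]
  n  <- readTVarIO c
  print (ys, n)  -- all ten doubled results, and the counter at 10
```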