[Haskell-cafe] Dynamic thread management?
andrewcoppin at btinternet.com
Fri Aug 10 13:49:58 EDT 2007
Hugh Perkins wrote:
> - parallelism must be quite coarse to offset overheads
> (which I think is the problem with expecting things like map
> and fold
> to parallelised automagically; they're just too small grained for
> it to
> be worthwhile)
> Someone else said that. I don't understand what you mean.
If you do 1+2+3+4, it's not very efficient to spawn a new thread, have
that compute 1+2, compute 3+4 in the current thread, synchronise, and do
the final addition. You waste more time starting and stopping threads
than doing useful work. If you're going to spawn a new thread, it needs
to do "lots" of work for it to be worth it. And this is kinda what makes
this stuff so tricky - between laziness and the Halting Problem, it's
hard to work out what is "worth" sparking a thread for...
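To make the overhead point concrete, here's a minimal sketch in plain Haskell (base only, no assumptions beyond forkIO and MVar). In GHC you'd more likely spark a lightweight computation with `par`, but even a green thread plus synchronisation dwarfs the cost of two additions:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Spawn a thread just to compute 1 + 2 while the current thread
-- computes 3 + 4: the fork/synchronise machinery costs far more
-- than the additions themselves.
main :: IO ()
main = do
  result <- newEmptyMVar
  _ <- forkIO (putMVar result (1 + 2 :: Int))  -- "worker" thread
  let right = 3 + 4 :: Int                     -- current thread
  left <- takeMVar result                      -- synchronise
  print (left + right)                         -- 10
```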
> Imagine I have a map like this:
> map f [1..100000]
> ... and I have 64 cores..
> So I divide the 100000 elements in my list by 64, and give each chunk
> of 1000-ish elements to each thread. Easy I think?
...you realise that that's a linked-list you have there, right? ;-)
Now, if you had an *array*, and you knew that "f" does a constant amount
of work for all elements, and that all array elements will eventually be
needed, then yes, you could chunk the work up that easily.
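The chunking step itself is easy enough to write down; the catch is that on a linked list even *finding* the chunk boundaries means walking the spine sequentially. A hand-rolled sketch (`chunksOf` here is my own helper, not from base; Data.List.Split provides a real one):

```haskell
-- Splitting a list into fixed-size chunks. Reaching the k-th chunk
-- of a linked list requires walking past all earlier elements, so
-- computing the boundaries is itself sequential work.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

main :: IO ()
main = do
  let chunks = chunksOf 1563 [1 .. 100000 :: Int]  -- 64 chunks, last one ragged
  print (length chunks)          -- 64
  print (length (last chunks))   -- 1531
```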
> Note that NDP is doing this too, *but NDP is trying to do this for
> asymmetric elements at compile-time, which is generally impossible*.
> I propose doing it at runtime, inside the VM, which is trivial and
Parallel and trivial in the same sentence? Hmm, I wonder... how did this
problem not get solved 20 years ago? Oh, wait, maybe because it's harder
than it looks. ;-)
It certainly looks *easier* to partition the work at run-time than
compile-time. But trivial? I doubt it. (For a start, any autosensing
magic you add mustn't slow the program down more than the parallelism
it's supposed to be generating!)
> It's trivial to improve on NDP at runtime. For example:
> - we have a map of 64 elements, so we give one element to each thread
> - all of the pieces but one finishes really quickly (say, within 1
> - so, we start to look at what is happening within the remaining piece
> -> turns out it contains another map, so we split that map amongst our
> idle threads
> -> and so on
> Easy, flexible.
Seems to require running some sort of "monitor" process which can
somehow "see" what operation(s) a thread is trying to do so it can split
up the work and share it around. That's going to add overhead. Now
that's OK so long as the gains outweigh the overhead...
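A crude version of "idle threads pick up the remaining work" can be sketched with a shared queue of sub-tasks that workers drain as they finish. This is a base-only toy of my own devising, and note what it dodges: the genuinely hard part of noticing, at run-time, that a slow task *contains* further splittable structure.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (replicateM)

-- Four workers pull sub-tasks from a shared queue until it is empty,
-- so a worker that finishes early automatically picks up more work.
main :: IO ()
main = do
  queue   <- newMVar (map sum (chunks 4 [1 .. 100 :: Int]))  -- 25 sub-tasks
  results <- newMVar []
  done    <- replicateM 4 newEmptyMVar
  mapM_ (forkIO . worker queue results) done
  mapM_ takeMVar done                    -- wait for all 4 workers
  total <- sum <$> readMVar results
  print total                            -- 5050
  where
    chunks _ [] = []
    chunks n xs = let (h, t) = splitAt n xs in h : chunks n t
    worker queue results done = do
      next <- modifyMVar queue $ \q -> case q of
        []       -> return ([], Nothing)
        (x : xs) -> return (xs, Just x)
      case next of
        Nothing -> putMVar done ()       -- queue drained: this worker stops
        Just x  -> do
          modifyMVar_ results (return . (x :))
          worker queue results done      -- go back for more work
```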
> - lazy evaluation makes it harder
> Not at runtime ;-)
The problem with things being lazy is that you don't know whether some
data is "needed" until the moment you actually come to need it. OTOH,
ideally you'd want to know way in advance so that you can compute it in
parallel and it will already be "ready" when you come to use it. But
with laziness, you can't [easily] do that. Doing the splitting at
run-time isn't helping you here...
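Put another way: the only fix is to *speculate*, i.e. guess in advance that a thunk will be demanded and force it in a helper thread. A base-only sketch (the names here are illustrative, not from any library):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- The thunk `expensive` is only demanded at the very end. To overlap
-- it with other work we must speculatively force it in another
-- thread, betting that its value will in fact be needed.
main :: IO ()
main = do
  let expensive = sum [1 .. 1000000 :: Int]   -- a lazy thunk
  ready <- newEmptyMVar
  _ <- forkIO (evaluate expensive >>= putMVar ready)
  -- ... other useful work would happen here, in parallel ...
  forced <- takeMVar ready   -- ideally already computed by now
  print forced               -- 500000500000
```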
> Basically, it's harder than it sounds.
> No ;-) The more I talk about this the more confident I feel that
> this is an easy, optimal solution.
How does the saying go? "Anything is easy for the man who does not have
to implement it..."