[Haskell-cafe] Dynamic thread management?

Fri Aug 10 13:49:58 EDT 2007

Hugh Perkins wrote:
>
>     - parallelism must be quite coarse to offset overheads
>        (which I think is the problem with expecting things like map
>     and fold
>     to parallelised automagically; they're just too small grained for
>     it to
>     be worthwhile)
>
>
> Someone else said that.  I dont understand  what you mean.

If you do 1+2+3+4, it's not very efficient to spawn a new thread, have 
that compute 1+2, compute 3+4 in the current thread, synchronise, and do 
the final addition. You waste more time starting and stopping threads 
than doing useful work. If you're going to spawn a new thread, it needs 
to do "lots" of work for it to be worth it. And this is kinda what makes 
this stuff so tricky - between lazyness and the Halting Problem, it's 
hard to work out what is "worth" sparking a thread for...

> Imagine I have a map like this:
>
> map f [1..100000]
>
> ... and I have 64 cores..
>
> So I divide the 100000 elements in my list by 64, and give each chunk 
> of 1000-ish elements to each thread.  Easy I think?

...you realise that that's a linked-list you have there, right? ;-)

Now, if you had an *array*, and you know that "f" does a constant amount 
of work for all elements, and that all array elements will eventually be 
"needed"...

> Note that NDP is doing this too, *but NDP is trying to do this for 
> assymetric elements at compile-time, which is generally impossible*.  
> I propose doing it at runtime, inside the VM, which is trivial and 
> optimal.

Parallel and trivial in the same sentence? Hmm, I wonder... how did this 
problem not get solved 20 years ago? Oh, wait, maybe because it's harder 
than it looks. ;-)

It certainly looks *easier* to partition the work at run-time than 
compile-time. But trivial? I doubt it. (For a start, and autosensing 
magic you add mustn't slow the program down more than the parallelism 
it's supposed to be generating!)

> It's trivial to improve on NDP at runtime.  For example:
>
> - we have a map of 64 elements, so we give one element to each thread
> - all of the pieces but one finishes really quickly (say, within 1 
> second)
> - so, we start to look at what is happening within the remaining piece
> -> turns out it contains another map, so we split that map amongst our 
> idle threads
> -> and so on
>
> Easy, flexible.

Seems to require running some sort of "monitor" process which can 
somehow "see" what operation(s) a thread is trying to do so it can split 
up the work and share it around. That's going to add overhead. Now 
that's OK so long as the gains outweight the overhead...

>     - lazy evaluation makes it harder
>
>
> Not at runtime ;-)

O RLY?

The problem with things being lazy is that you don't know whether some 
data is "needed" until the moment you actually come to need it. OTOH, 
ideally you'd want to know way in advance so that you can compute it in 
parallel and it will already be "ready" when you come to use it. But 
with lazyness, you can't [easily] do that. Doing the resting at run-time 
isn't helping you here...

>     Basically, it's harder than it sounds.
>
>
> No ;-)   The more I talk about this the more confident I feel that 
> this is an easy, optimal solution.

How does the saying go? "Anything is easy for the man who does not have 
to implement it..."