[Haskell-cafe] Dynamic thread management?
bhurt at spnz.org
Sat Aug 11 12:35:49 EDT 2007
You guys might also want to take a look at the Cilk programming language,
and how it managed threads. If you know C, learning Cilk is about 2 hours
of work, as it's C with half a dozen extra keywords and a few new
concepts. I'd love to see Cilk - C + Haskell as a programming language.
The key idea of Cilk is that it's easier to deparallelize than it is to
parallelize, especially automatically. So the idea is that the program is
written to be incredibly parallel, with huge numbers of microthreads, which are
(on average) very cheap to spawn. The runtime then deparallelizes the
microthreads, running multiple microthreads sequentially within a single
real thread (a worker thread). Microthreads that live their entire life
within a single real thread are cheap to spawn (as in "not much more
expensive than a normal function call" cheap).
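For readers who haven't used Cilk: GHC already has a close analogue of its cheap spawn in the form of sparks. A minimal sketch, using only `par`/`pseq` from base (`pfib` is an illustrative function, not anything from Cilk itself):

```haskell
import GHC.Conc (par, pseq)

-- Cilk-style divide-and-conquer "spawn": `par` registers a spark
-- (a microthread) that the runtime MAY run on another worker thread.
-- A spark that is never picked up just gets evaluated inline, costing
-- little more than a normal function call -- the same property that
-- makes Cilk's microthreads cheap on average.
pfib :: Int -> Int
pfib n
  | n < 2     = n
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = pfib (n - 1)
    y = pfib (n - 2)

main :: IO ()
main = print (pfib 20)  -- 6765
```

The runtime "deparallelizes" exactly as described above: sparks are drained by a fixed pool of worker threads, many sparks per thread.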
The problem that Cilk runs into is that it's, well, C. It doesn't deal
with contention well at all: a single blocking microthread blocks the
whole worker thread, meaning, among other things, that you can have "false
deadlocks", where one microthread blocks on another microthread in the
same real thread, and thus acts like it's deadlocked even though it really
isn't. You have a greatly increased likelihood of race conditions as
well (mutable data and multithreading just don't mix). Plus you have all
the normal fun you have with C bugs: wild pointers, buffer overruns, etc.
All of which you avoid if you replace the C aspects of Cilk with Haskell.
With STM you avoid the deadlock problem entirely, and with immutable data
(except for carefully coded monads) you avoid the whole race condition
problem. Plus all the normal advantages Haskell has over C.
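To make the STM point concrete, here is a minimal sketch using the STM primitives GHC ships in base (the `transfer` example is mine, not from Cilk or the thread):

```haskell
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar,
                 readTVarIO, writeTVar)

-- Two transfers running in opposite directions would deadlock with
-- ordinary locks taken in opposite orders; as STM transactions they
-- are retried instead, so the "false deadlock" scenario above cannot
-- arise, and the immutable-by-default data outside the TVars cannot race.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  fromBal <- readTVar from
  writeTVar from (fromBal - amount)
  toBal <- readTVar to
  writeTVar to (toBal + amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  readTVarIO b >>= print  -- 30
```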
Just my $0.02.
On Sat, 11 Aug 2007, Neil Bartlett wrote:
> I certainly think it would be wrong to declare that NDP is doomed to
> failure... not because you would be making an enemy of SPJ (I'm pretty
> sure you wouldn't!) but because it actually aims to solve a less
> ambitious problem: the problem of parallelising the SAME task applied
> to different data, rather than a collection of arbitrary tasks.
> Because the task does not change, we know that e.g. taking half the
> data cuts the amount of work in half. Therefore an up-front scheduling
> approach can work.
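Neil's point about up-front scheduling can be sketched in plain Haskell with base's `par`/`pseq` (chunk size and data here are made up for illustration): because the SAME function is mapped over every element, the cost model "half the data = half the work" holds, so the split can be decided before any work runs.

```haskell
import GHC.Conc (par, pseq)

chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk n xs = let (h, t) = splitAt n xs in h : chunk n t

-- Up-front scheduling: split into fixed-size chunks ahead of time and
-- spark one summand per chunk; no runtime re-balancing is attempted.
parSum :: Int -> [Int] -> Int
parSum n xs = go (map sum (chunk n xs))
  where
    go []       = 0
    go (s : ss) = s `par` (rest `pseq` (s + rest))
      where rest = go ss

main :: IO ()
main = print (parSum 2500 [1 .. 10000])  -- 50005000
```

This is exactly the regime where static chunking works; it is the `(f x) + (g y)` case, where the two sides have unknown and unequal cost, that defeats it.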
> If you fast forward to about 42 minutes into the London HUG video, you
> see that Simon talks about the problem of parallelizing (f x) + (g y),
> and says that he spent "quite a lot of time in the eighties trying to
> make this game go fast [..] and the result was pretty much abject failure".
> You're absolutely right that a dynamic/adaptive approach is the only
> one that will work when the tasks are of unknown size. Whether this
> approach is as easy as you think is open for you to prove. I look
> forward to testing your VM implementation, or at the very least
> reading your paper on the subject ;-)
> On 8/11/07, Hugh Perkins <hughperkins at gmail.com> wrote:
>> On 8/11/07, Thomas Conway <drtomc at gmail.com> wrote:
>>> There are many papers about this in the Parallel Logic Programming
>>> area. It is commonly called "Embarrassing Parallelism".
>> Ah, I wasn't very precise ;-) I didn't mean I don't understand the
>> problem; I meant I don't understand why people think it is difficult to
>> solve ;-) (And then I tried to explain by examples why it is easy,
>> but it is true that my communication sucks ;-) )
>>> you'll only get a benefit if you can break xs into chunks
>>> of e.g. 10^3 elements or more, and more like 10^5 or more for more
>>> usual 'f's.
>> Actually, the number of elements is irrelevant. The only measure that
>> is important is how long the function is taking to execute, and
>> whether it is parallelizable.
>> Example, the following only has 3 elements but will take a while to run:
>> print $ sum $ map getNumberPrimes [10240000, 20480000, 40960000 ]
>> The following has 10 million elements but runs quickly:
>> print $ sum $ map (+1) [1..10000000 ]
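The two one-liners above can be made compilable as follows. `getNumberPrimes` is not defined in the thread, so this uses a naive trial-division stand-in, and the bounds are scaled down so the example finishes quickly; the shape of the work (few expensive elements vs. many trivial ones) is what matters:

```haskell
-- Naive stand-in for the thread's undefined getNumberPrimes:
-- count the primes up to n by trial division.
getNumberPrimes :: Int -> Int
getNumberPrimes n = length [ k | k <- [2 .. n], isPrime k ]
  where
    isPrime k = all (\d -> k `mod` d /= 0) [2 .. isqrt k]
    isqrt     = floor . sqrt . (fromIntegral :: Int -> Double)

main :: IO ()
main = do
  -- few elements, lots of work per element (bounds scaled down):
  print (sum (map getNumberPrimes [10240, 20480, 40960]))
  -- many elements, trivial work per element:
  print (sum (map (+ 1) [1 .. 10000000 :: Int]))
</imports>
```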
>> In the second, we start the function running, in a single thread, and
>> after a second, the function has already finished, so great! We're done.
>> In the first, we start the function running, in a single thread.
>> After a second the function is still running, so we look at what is
>> taking the time and whether it is parallelizable.
>> Turns out the vm has been chugging away on the map for the last
>> second, and that maps are parallelizable, so we split the map
>> into Math.Min( <number elements>, <number cores> ) pieces, which on a
>> 64-core machine, given we have 3 elements, is Math.Min( 64, 3 ),
>> which is 3.
>> So, we assign each of the 3 elements of the map to a thread. So, now
>> we're using 3 of our 64 cores.
>> A second later, each thread is still chugging away at its element, so
>> we think, ok, maybe we can parallelize more, because we still have 61
>> threads/cores available, so now we look at the getNumberPrimes
>> function itself, and continue the above type of analysis.
>> This continues until all 64 cores have been assigned some work to do.
>> Whenever a thread finishes, we look around for some work for it to do.
>> If there is some, we assign it. If there isn't, we look around for an
>> existing thread that is doing something parallelizable, and split it
>> into two pieces, giving one to the old thread, and one to the
>> available thread.
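The split-on-demand scheme described above can be approximated today with base's `par`/`pseq`: split the work in half at every level and spark one half. Idle workers steal sparks, so deeper splits are only ever evaluated in parallel when there is a core with nothing to do, which is roughly the adaptive behaviour Hugh wants (this is a sketch of the idea, not of his proposed VM; `parMapDC` is an illustrative name):

```haskell
import GHC.Conc (par, pseq)

-- Lazy-splitting parallel map: no chunk size is chosen up front.
-- Each level sparks its left half; an idle worker stealing that spark
-- triggers the next level of splitting, so the effective granularity
-- adapts to how many cores are free. (For brevity this sparks the
-- left list only to weak head normal form; a production version
-- would force the half fully before sparking it.)
parMapDC :: (a -> b) -> [a] -> [b]
parMapDC _ []  = []
parMapDC f [x] = [f x]
parMapDC f xs  = left `par` (right `pseq` (left ++ right))
  where
    (ls, rs) = splitAt (length xs `div` 2) xs
    left     = parMapDC f ls
    right    = parMapDC f rs

main :: IO ()
main = print (sum (parMapDC (* 2) [1 .. 100]))  -- 10100
```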
>> Not sure why this is perceived as difficult (added the word
>> "perceived" this time ;-) ). I think the main barrier to
>> understanding why it is easy is understanding that this needs to be
>> done from a VM, at runtime. It is not possible to do it statically at
>> compilation, but I really need to finish watching SPJ's video before
>> declaring that SPJ's proposal is doomed to fail ;-) Not least,
>> probably not good to make an enemy of SPJ ;-)
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org