[Haskell-cafe] Dynamic thread management?

Thu Aug 23 01:27:43 EDT 2007

On 8/22/07, Brandon Michael Moore <brandon at heave.ugcs.caltech.edu> wrote:
> Automatic threading is inherently limited by data dependencies.
> You can't run a function that branches on an argument in parallel
> with the computation producing that argument. Even with arbitrarily
> many processors and no communication overhead there is a limit to
> how much parallelism you can squeeze from any given program.

Yes.  Its worse than that in fact, because many real-world problems
will involve functions that have side-effects, but we know the
side-effects dont matter.  The parallelisation algo of course doesnt
know they dont matter (unless we tell it).

Example: imagine we want to copy files from one machine to five
others.  Copying a file has a clear side-effect, but since we're
copying to 5 independent machines, we can copy to each machine in
parallel.  The algo doesnt know that this is ok.

>
> You should read
> "Feedback Directed Implicit Parallelism"
> http://research.microsoft.com/~tharris/papers/2007-fdip.pdf
> and perhaps
> "Limits to Implicit Parallelism in Functional Applications"
> http://www.detreville.org/papers/Limits.pdf

Ok

> In short, with zero overhead and an oracle for scheduling you can
> get a factor of at most 2x to 32x by implicitly parallelizing
> existing Haskell code. In practice, with execution overhead it's a
> gain of perhaps 10% to 80% on a 4-core system.

This is a good argument that it's not enough to prototype on a 4 core
system, but we really need some way to simulate a 1024 core system to
carry out meaningful benchmarks.

>
> You can do a lot better if you expect people to rewrite code,
> but "automatic threading" suggests something completely effortless.

Yes, I tend to overuse the word "automatic" ;-)

> I think you can get much better results if you work on the programming
> style in connection with a better runtime.  You can think of data parallel
> Haskell as a new programming style with more implicit parallelims,
> and the runtime support to exploit it.

Yes, you're right.

> If you have cores to waste, you might try rewrites like
>
> f x
> =>
> case x of
>  C1 a1 a2 -> f (C1 a1 a2)
>  C2 b -> f (C2 b)
>  C3 -> f C3
>
> and then speculatively execute several of the case branches.
> If you don't throw away too much type information you should
> even be able to do it at runtime.

That's a good observation.  Sortof anti-laziness :-D