[Haskell] Implicit parallel functional programming

Thu Jan 20 04:40:16 EST 2005

I'd like to add my two cents worth in this debate...

I think the original poster considered the standard multicore processors
soon to come, and which can be expected to eventually overtake the processor
market. The answer relies a lot on what shape these processors will have:

A guess is that the first generation will support a shared memory model much
like SMP:s of today (shared main memory with on-chip cache(s), or some other
kind of local memory (-ies)). Here, I think implicit parallelism in
functional languages can be a win in some situations. This is provided
spawning of parallel threads is cheap and local memory can be efficiently
utilized (so control of granularity is no problem, and dynamic parallelism
is cheap). Cheap threads is likely to be true, efficient use of local memory
is a question mark.

However, some obstacles:

Laziness, if used strictly (sorry) gets in the way for parallelism. Waiting
to spawn a parallel thread until we surely know it's result is needed, if it
is anyway needed 99% of the time, is not an optimal strategy. I think
optimistic evaluation strategies a la Robert Ennals can be helpful here. But
designing them to work for new parallel architectures may be non-trivial.
I think there can be material for PhD theses here.

If the program implements an inherently sequential algorithm then
parallelism won't help anyway. The use of lists often leads to sequential
algorithms: a prime example is list folds, which all translate the linear
spine structure of a list into a sequential dependence graph of the
resulting computation. Folds over tree-like structures are better in this
regard.  The FP community will have a problem with legacy code using
inherently sequential computations over lists.  There has been quite some
research on how to parallellize list computations (skeletons, program
transformations using BMF,...). However, they typically rely on knowing the
length of the list. In a lazy language it is common that lists are evaluated
dynamically, and then it's hard to use these techniques efficiently. Again,
speculation may help.

In the long run, as hardware dimensions shrink even more and communication
becomes relatively even more expensive, it is likely that we will have some
kind of on-chip massively parallel, mesh-connected distributed memory
machine. This will most likely favor static parallelization techniques,
which means only for certain classes of functional programs an automatic
approach with implicit parallelism will work well.

Björn Lisper