[Haskell] Implicit parallel functional programming

Thu Jan 20 05:17:34 EST 2005

Bjorn Lisper wrote:

>A guess is that the first generation will support a shared memory model much
>like SMP:s of today (shared main memory with on-chip cache(s), or some other
>kind of local memory (-ies)). Here, I think implicit parallelism in
>functional languages can be a win in some situations. This is provided
>spawning of parallel threads is cheap and local memory can be efficiently
>utilized (so control of granularity is no problem, and dynamic parallelism
>is cheap). Cheap threads is likely to be true, efficient use of local memory
>is a question mark.
>  
>
You only need one thread per CPU-core, so the spawning threads cost is
a red-herring. Just like GHC schedules haskell-threads from a single
OS-thread.

>However, some obstacles:
>
>Laziness, if used strictly (sorry) gets in the way for parallelism. Waiting
>to spawn a parallel thread until we surely know it's result is needed, if it
>is anyway needed 99% of the time, is not an optimal strategy. I think
>optimistic evaluation strategies a la Robert Ennals can be helpful here. But
>designing them to work for new parallel architectures may be non-trivial.
>I think there can be material for PhD theses here.
>  
>
I suggest speculative execution, as used by current CPUs to take advantage
of multiple microcode execution units. If the a CPU has spare capacity, 
functions
whose results might be needed are run. As soon as you know a result is not
needed the thread can be killed, and the time used for something else. A lot
of work has already been done on hardware superscalar-architectures, and
a lot of this should be applicable to the software case.

>If the program implements an inherently sequential algorithm then
>parallelism won't help anyway. The use of lists often leads to sequential
>algorithms: a prime example is list folds, which all translate the linear
>spine structure of a list into a sequential dependence graph of the
>resulting computation. Folds over tree-like structures are better in this
>regard.  The FP community will have a problem with legacy code using
>inherently sequential computations over lists.  There has been quite some
>research on how to parallellize list computations (skeletons, program
>transformations using BMF,...). However, they typically rely on knowing the
>length of the list. In a lazy language it is common that lists are evaluated
>dynamically, and then it's hard to use these techniques efficiently. Again,
>speculation may help.
>  
>
But in the worst case its just a sequential computation, so any gain from
parallelism is still a gain... also the dependant part of a loop is often
separate from an independant part... For example consider a list of 
strings, and
taking the length of each string. The next iteration does not have to wait
for the length to be computed, it can start the next length computation as
soon as the pointer for the next list cell has been dereferenced.

>In the long run, as hardware dimensions shrink even more and communication
>becomes relatively even more expensive, it is likely that we will have some
>kind of on-chip massively parallel, mesh-connected distributed memory
>machine. This will most likely favor static parallelization techniques,
>which means only for certain classes of functional programs an automatic
>approach with implicit parallelism will work well.
>  
>
The next generation of CPUs seem to be using a shared cache architecture,
so overheads are extreemely small...

Erm, by static I assume you mean programmer specified parallelism. I would
tend to think of static parallelization as something that takes place at 
compile
time (and would still count as implicit).

    Keean.