[Haskell] Implicit parallel functional programming

Thu Jan 20 10:31:02 EST 2005

Bjorn Lisper wrote:

>It depends on what you compare with. Multicore CPU:s will probably have
>cores that are simpler than current processor cores, which means you will
>want to have some parallelism. Cf. a superscalar processor, which really in
>a sense is a parallel machine but where you add some complex control
>hardware to make it emulate a sequential machine. Running a multicore CPU on
>a single core at a time corresponds to running a superscalar machine where
>you use none of the superscalarity. Or, running a pipelined CPU (another
>form of parallelism) without utilizing any of the pipelining. This is not
>what you want.
>  
>
Well, intel have 2 core now, 4 core soon. AMD have 2 core now...
IBM have 8 core Power5... These are full spec superscalar cores.

When I am thinking about exploiting parallelism, I am thinking about
2/4 cpu (4/8 with hyperthreading) Intel boxes, maybe 8/16 cpu AMD
Opteron boxes, or 8 way Power5 boxes.

>Sure you can get some overlap, which can be quite big if the task to perform
>per list element is big, but a dependence graph with O(n) nodes will still
>have O(n) depth => at maximum constant speedup.
>  
>
A constant speedup per CPU. Imagine a loop:

    loop [] = 0
    loop (a:b) = length a + loop b

we can run as many "length" calculations in parallel as we have CPUs.
(Here length may take a significant time as a could be long)

>I'm not talking about the next generation but further ahead. Maybe these
>architectures will have some kind of on-chip shared cache, who knows, but
>that memory will then still be much more expensive to access than the memory
>in a local core. So you will want to use the core-local memory all the time,
>and only communicate between processor cores which are quite close on the
>chip.
>  
>
One step at a time, are you suggesting we fail to take advantage of
architectures that are on the horizon, because things may be different
in the far future...

>No, I mean automatic techniques to detect the parallelism statically (and
>schedule and allocate work statically), which only will work well for
>restricted classes of programs.
>
>  
>
Well this is how I imagine things happening as well. As do as much static
analysis as you can at compile time. You end up with a lot of dependancies
resolved, but a few will be unknown until runtime (due to IO for example)...
These remaining few are the ones that you might want to try speculative
execution on... If you have the spare CPU cycles (and the machine would
otherwise be idle).

    Keean.