[Haskell-cafe] optimising for vector units
Jan-Willem Maessen - Sun Labs East
Janwillem.Maessen at Sun.COM
Mon Jul 26 15:20:08 EDT 2004
Ketil Malde wrote:
> I'm sure somebody, somewhere, is working on speculative execution of
> Haskell code.
I think we both graduated (myself from MIT, Robert Ennals from
Cambridge; I'm building compilers for supercomputers at Sun). If
anyone else is working seriously on speculative evaluation of Haskell
at the moment, please drop me a line; I'd love to hear of your existence.
> Now that Sun is building Niagara with IIRC 8 cores on a
> chip, Intel is rumoured to put up to 16 cores on the post-Montecito
> Tukwila, Sony/IBM's Cell is said to be some kind of parallel design,
> and every self-respecting chip vendor is putting at least two cores on
> a chip and/or adding multithreaded cores.
>
> One would expect a lazy and pure language to be excellent for
> parallelization, since the programmer is generally removed from the
> actual flow of execution anyway.
As I recall, this was one of the original goals of the GRIP project
(which built the original incarnation of GHC, if I have my history
straight).
Indeed, if you read any early paper on strictness analysis, you'd
think that it was all about parallelizing lazy code. I'm now of the
opinion that strictness analysis tells us where to *serialize* our
code, because parallelism just won't be worth it.
There are, I believe, a few major challenges:
* It's easy to identify very small pieces of parallel work, but much
harder to identify large, yet finite, pieces of work. Only the
latter are really worth parallelizing.
* If you don't compute speculatively, you'll never find enough work
to do.
* If you compute speculatively, you need some way to *stop* working
on useless, yet infinite computations. Rob and I had two rather
different takes on the same basic idea: run with some sort of
resource limits, and stop working when you exhaust them (a rough
sketch of the idea follows this list). Roughly speaking, Rob then
made sure you never speculated that bit of code again, while I just
kept relying on the bounds to kick in. On pretty eager code, I did
all right, but on code that actually needed the laziness (this was
after a good bit of compiler munging to wring out most of the
unnecessary laziness) I got killed; Rob's approach fixed the mistake
by patching the code, and did much better thereafter.
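A rough analogy for the resource-limit idea, expressed in library
code rather than in the runtime system where it really belongs: force
a value speculatively under a time budget, and abandon the attempt
when the budget runs out. This only sketches the flavor of the thing
(using timeout, evaluate, and force from today's base and deepseq
libraries); it is not how either of our implementations worked:

  import Control.DeepSeq (NFData, force)
  import Control.Exception (evaluate)
  import System.Timeout (timeout)

  -- Speculatively evaluate x to normal form, giving up after
  -- 'budget' microseconds. Nothing means the speculation was
  -- abandoned; the value can still be demanded (lazily) later.
  speculate :: NFData a => Int -> a -> IO (Maybe a)
  speculate budget x = timeout budget (evaluate (force x))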
My ultimate goal was parallelization. However, I'm now convinced it
would take about 2-3 programmer-years more of effort to realize that
goal. Parallel garbage collection (which is *not* optional on a
parallel Haskell implementation, as our experience with pH
demonstrated) and very fine-grained thunk-update protocols are both
tricky technical challenges. And that leaves aside the question of
whether there's enough coarse-grained work buried among all that
fine-grained work to make it all worthwhile. Anyone want to pay me to
do this? :-)
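For the curious, the thunk-update problem in a nutshell: two
processors may demand the same shared thunk at the same moment, and
the implementation must make sure the result is computed and written
back without races, ideally without a heavyweight lock on every
thunk. Here's a toy model in library code (real implementations do
this on raw heap words with far more care; none of GHC, pH, or Eager
Haskell looks like this):

  import Data.IORef
  import Control.Concurrent (yield)

  -- A toy shared thunk: unevaluated, claimed by some thread
  -- ("blackholed"), or already a value.
  data ThunkState a = Unevaluated (IO a) | BlackHole | Value a

  type Thunk a = IORef (ThunkState a)

  mkThunk :: IO a -> IO (Thunk a)
  mkThunk comp = newIORef (Unevaluated comp)

  -- Demand a thunk: atomically claim it, run the computation, and
  -- publish the value. A thread that finds a blackhole yields and
  -- retries, i.e. it waits for the other thread's update.
  forceThunk :: Thunk a -> IO a
  forceThunk ref = do
      st <- atomicModifyIORef' ref claim
      case st of
        Unevaluated comp -> do
          v <- comp
          atomicModifyIORef' ref (\_ -> (Value v, ()))
          return v
        BlackHole -> yield >> forceThunk ref
        Value v   -> return v
    where
      claim s@(Unevaluated _) = (BlackHole, s)
      claim s                 = (s, s)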
> At some point (for some n), being
> able to spawn n threads will gain you more than a factor c constant
> overhead, and Haskell programs, with a run-time system that can
> evaluate expressions in parallel, will outperform single-threaded C
> code.
Only if you can keep the granularity of the work large. This hasn't
been so easy.
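To put the granularity point in code: with today's
Control.Parallel.Strategies combinators, sparking one tiny task per
list element buries the useful work under spark overhead, while
sparking one task per sizable chunk gives each spark something worth
doing. The chunk size and the 'expensive' function below are made-up
placeholders, purely for illustration:

  import Control.Parallel.Strategies (parListChunk, rseq, using)

  -- A stand-in for real per-element work.
  expensive :: Double -> Double
  expensive x = iterate sqrt (x + 1) !! 1000

  -- Too fine-grained: one spark per element; overhead swamps the work.
  sumFine :: [Double] -> Double
  sumFine xs = sum (map expensive xs `using` parListChunk 1 rseq)

  -- Coarser: one spark per 10,000-element chunk, so each spark is a
  -- real unit of work and the parallelism has a chance to pay off.
  sumCoarse :: [Double] -> Double
  sumCoarse xs = sum (map expensive xs `using` parListChunk 10000 rseq)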
-Jan-Willem Maessen
Eager Haskell Implementor