[Haskell-cafe] optimising for vector units
jcast at ou.edu
Mon Jul 26 12:22:05 EDT 2004
MR K P SCHUPKE <k.schupke at imperial.ac.uk> wrote:
> As far as I understand it haskells lazy evaluation resembles dataflow
> computation - IE values are computed on demand.
> This problem has already been solved for hardware, you end up with
> what is termed a super-scalar architecture.
> Instructions read from memory fill a re-order buffer. As computations
> complete value slots in the reorder buffer are filled with the result
> values. When an instruction has all its values it can be executed, and
> will be assigned to the next available execution unit.
> I am not too sure of the internal architecture of GHC, but I can
> imagine that funtions waiting to be run will have slots for arguments,
> that will be filled in when values become available.
No. A function may not use all of its arguments, so there's no point in
evaluating all arguments---you have to find out which ones are actually
needed. Terminology note: GHC uses the STG language internally. In
STG, evaluation is always triggered by a case expression, which
evaluates precisely one expression (not necessarily an argument of
> The whole point is that a function with three parameters can spark
> three threads
This is actually a bad idea: the point of non-strict languages and lazy
evaluation is to allow lots of values which are (a) unused and (b)
expensive to evaluate. If you evaluate everything (even in parallel),
there's a lot of wasted work if the programmer is making optimal use of
the non-strictness of the language.
Furthermore, if de-forestation fails anywhere, you've likely got another
problem. Consider the following program:
> n `choose` m = product $ take m $ [n, n-1..]
Which translates (after some optimisation) to STG equivalent to:
> downFrom n = let n1 = (-) n 1
> in let xn1 = downFrom n1
> in let xn2 = n:xn1
> in xn2
> choose n m = let xn1 = downFrom n
> in let xn2 = take m xn1
> in product xn2
Now, (:) cannot `spark' a thread to evaluate xn1 in downFrom, because
that would end up sparking infinitely many threads---which may be
possible, but nothing you've said suggests that it is.
> to calculate each parameter without worrying about the side affects of
> each function. No locking is necessary because you only spark one
> thread for each argument.
This is true for tree reduction, but not for graph reduction. Consider,
e.g., the fact that the call to product in choose and the recursive call
to downFrom walk down (and hence evaluate) the same list. Since the
cons cells making up this list are evaluated by two different functions,
some sort of locking is indeed necessary to ensure that you don't end up
with two threads for each of the (infinitely many) cons cells in the
> I suspect I haven't fully understood the difficaulty in doing this,
> care to enlighten me?
Certainly. See above.
More information about the Haskell-Cafe