[Haskell-cafe] Two-iteration optimisation (was: GHC
Predictability)
Don Stewart
dons at galois.com
Wed May 14 13:20:45 EDT 2008
paul:
> We've had a big debate over
>
> > mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
>
> For anyone who didn't follow it, the problem is that "mean" needs to
> traverse its argument twice, so the entire list has to be held in
> memory. So if "xs = [1..1000000000]" then "mean xs" uses all your
> memory, but "foldl' (+) 0 xs" and "length xs" by themselves do not.
>
> The solution is for the programmer to rewrite "mean" to accumulate a
> pair containing the running total and count together, then do the
> division. This makes me wonder: could there be a compiler optimisation
> rule for this, collapsing two iterations over a list into one. I've
> never tried writing GHC rules, but something like:
>
> > f (foldl' g x) (foldl' h x) = (uncurry f) (foldl' (\i (gt,ht) -> (g
> i gt, h i ht)))
>
> The first problem that occurs to me is the number of permutations of
> fold and map functions.
You'd want a general fusion framework for this, and other accumulators,
that's let's you treat 'f' as a zip of some kind that pulls items from
its two arguments. And then require only a single rewrite rule for all
your loop forms.
Stream fusion (see "Lists to Streams to Nothing at All") at least does this for zips,
zip (foldl .. )
(foldl .. )
Will end up with a single loop, but for an arbitrary 'f' instead of zip,
seems harder.
-- Don
More information about the Haskell-Cafe
mailing list