[Haskell-cafe] Two-iteration optimisation (was: GHC Predictability)

Wed May 14 13:20:45 EDT 2008

paul:
> We've had a big debate over
> 
> > mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
> 
> For anyone who didn't follow it, the problem is that "mean" needs to 
> traverse its argument twice, so the entire list has to be held in 
> memory.  So if "xs = [1..1000000000]" then "mean xs" uses all your 
> memory, but "foldl' (+) 0 xs" and "length xs" by themselves do not.
> 
> The solution is for the programmer to rewrite "mean" to accumulate a 
> pair containing the running total and count together, then do the 
> division.  This makes me wonder: could there be a compiler optimisation 
> rule for this, collapsing two iterations over a list into one.  I've 
> never tried writing GHC rules, but something like:
> 
> >  f (foldl' g x) (foldl' h x) = (uncurry f) (foldl' (\i (gt,ht) -> (g 
> i gt, h i ht)))
> 
> The first problem that occurs to me is the number of permutations of 
> fold and map functions.

You'd want a general fusion framework for this, and other accumulators,
that's let's you treat 'f' as a zip of some kind that pulls items from
its two arguments. And then require only a single rewrite rule for all
your loop forms.

Stream fusion (see "Lists to Streams to Nothing at All") at least does this for zips,

    zip (foldl .. )
        (foldl .. )

Will end up with a single loop, but for an arbitrary 'f' instead of zip,
seems harder.

-- Don