[Haskell-cafe] Two-iteration optimisation (was: GHC
Predictability)
Spencer Janssen
sjanssen at cse.unl.edu
Mon May 19 03:22:58 EDT 2008
On Wed, May 14, 2008 at 06:08:28PM +0100, Paul Johnson wrote:
> We've had a big debate over
>
> > mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
>
> For anyone who didn't follow it, the problem is that "mean" needs to
> traverse its argument twice, so the entire list has to be held in
> memory. So if "xs = [1..1000000000]" then "mean xs" uses all your
> memory, but "foldl' (+) 0 xs" and "length xs" by themselves do not.
>
> The solution is for the programmer to rewrite "mean" to accumulate a
> pair containing the running total and count together, then do the
> division. This makes me wonder: could there be a compiler optimisation
> rule for this, collapsing two iterations over a list into one. I've
> never tried writing GHC rules, but something like:
>
> > f (foldl' g x) (foldl' h x) = (uncurry f) (foldl' (\i (gt,ht) -> (g i
> gt, h i ht)))
>
> The first problem that occurs to me is the number of permutations of
> fold and map functions.
>
> Paul.
There are two problems with this rule. The first is that the function is not
strict in 'gt' and 'ht' -- this is easily fixed with a little bit of seq. The
other problem is that 'f' must be strict in both parameters for this rule to
hold. As far as I know, there is no access to strictness information in rule
pragmas.
Cheers,
Spencer Janssen
More information about the Haskell-Cafe
mailing list