[Haskell-cafe] Safer common subexpression elimination

Sat Dec 4 12:41:12 CET 2010

Joachim Breitner schrieb:
> Hi,
> 
> although semantically it could, ghc does not do common subexpression
> elimination (CSE), at least not in every case. The canonical example
> where it would be bad is the function (with div to avoid numerical
> casting):
> 
>> avg :: [Int] -> Int
>> avg n = (sum [0..n] `div` length [0..n])
> 
> The reason why [0..n] is not shared is because then it would be kept in
> memory completely after the evaluation of sum, because length will still
> need it, possibly overflowing the memory. In the non-shared form,
> though, the garbage collector can clean up directly behind sum and
> length, and the program runs in constant space.
> 
> Note that the shared expression would be very large compared to the
> thunks.
> 
> Now consider the program:
> 
>> avg' :: [Int] -> (Int, Int)
>> avg' n = (sum [0..n] `div` length [0..n], length [0..n])

It think this is not typecorrect, since 'n' denotes a list and is used
as upper bound in [0..n]. Should it be

> avg' ns = (sum ns `div` length ns, length ns)

?

> I’d say that CSE to 
>> avg n = (sum [0..n] `div` l, l)
>>   where l = length [0..n]
> is safe, because the shared expression (an Int) is smaller than the
> thunks and the compiler can tell.

If this is meant to be

> avg n = (sum ns `div` l, l)
>   where l = length ns

according to my substitution above, then 'ns' still has to be shared and
stored completely.

I think the general problem is more complicated than common
subexpression elimination, because lazy evaluation sometimes forces an
inefficient order of computation. Other examples are

> viewR xs = (init xs, last xs)

and an elegant Haskell implementation of the Unix command 'wc' like

> wc text = (length text, length (words text), length (lines text))