[Haskell-cafe] GHC predictability
Jeff Polakow
jeff.polakow at db.com
Mon May 12 22:18:49 EDT 2008
Hello,
> I offer up the following example:
>
This is an instructive example.
> mean xs = sum xs / length xs
>
In order to type-check, I actually need to write something like:
mean xs = sum xs / fromIntegral (length xs)
There are other ways of get the numeric types to match correctly, but this
is fairly general.
Then, I immediately blow my stack if I try something like:
mean [1..1000000000].
The culprit is actually sum which is defined in the base libraries as
either a foldl or a direct recursion depending on a compiler flag. In
either case, the code is not strict enough; just trying to compute:
sum [1..10000000]
blows the stack. This can be easily fixed by defining a suitable strict
sum:
sum' = foldl' (+) 0
and now sum' has constant space. We could try to redefine mean using sum':
mean1 xs = sum' xs / fromIntegral (length xs)
but this still gobbles up memory. The reason is that xs is used twice and
cannot be discarded as it is generated. So we must move to a direct fold,
as you did, to get a space efficient mean.
> If we now rearrange this to
>
> mean = (\(s,n) -> s / n) . foldr (\x (s,n) -> let s' = s+x; n' = n+1
> in s' `seq` n' `seq` (s', n')) (0,0)
>
> and run the same example, and watch it run in constant space.
>
This code actually blows the stack on my machine just like the first naive
mean. Foldl is perhaps more intuitive to use here, since we are summing
the numbers as we encounter them while walking down the list, and there is
a strict version, foldl', provided in the base libraries.
mean2 = uncurry (/) . foldl' (\(s,n) x -> (s+x, n+1)) (0,0)
However, this still gobbles up memory... the reason is that pairs are
lazy. So we need a way to force the (s+x) and (n+1). An easy, and
unobtrusive way to do this is to use strict pattern matching:
mean2 = uncurry (/) . foldl' (\(!s, !n) x -> (s+x, n+1)) (0,0)
Now we can run:
mean2 [1..1000000000]
in constant space.
While using an explicit foldl is less readable than the direct version
(especially to a beginner), it is a standard functional idiom.
Furthermore, a good understanding of lazy evaluation should be enough to
guide you to using the strict foldl' and then then strict patterns. Is
this a reasonable analysis?
Also, we've made no attempt to address speed. However, I would argue that
the code's performance time is predictable-- it grows linearly with the
size of the list. Improving the performance time is another matter that
requires knowing about the internal representation of the various types
being used.
-Jeff
---
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20080512/bebb8e94/attachment.htm
More information about the Haskell-Cafe
mailing list