[Haskell-cafe] Ocaml for Haskellers tutorial

Sat Apr 17 08:15:39 EDT 2010

On Sat, Apr 17, 2010 at 8:22 AM, Jason Dagit <dagit at codersbase.com> wrote:
> ...
> One place where lazy accumulators is bad are the left folds.  There is the
> lazy foldl and the version which is strict in the accumulator, foldl'.  Try
> summing big lists of integers, let's use ghci and limit the heap to 1 meg:

> ghci +RTS -M1M
> Prelude> foldl (+) 0 [1..10000000]
> Heap exhausted;
> Current maximum heap size is 6291456 bytes (6 MB);
> use `+RTS -M<size>' to increase it.

If we define foldl as:

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z []     = z
foldl f z (x:xs) = let z' = f z x
                   in foldl f z' xs

and:

sum :: [Int] -> Int
sum = foldl (+) 0

Then the program: 'sum 1000000' will indeed use a lot of heap space
(and will run out of it if you limit the heap to 1MB). The reason for
this is explained in [1]. In short: foldl will start allocating thunks
on your heap which each add an element of the list to a previous thunk
as in:

let z1 =  0 + 1
    z2 = z1 + 2
    z3 = z2 + 3
    z4 = z3 + 4
    ...
    z999997 = z999996 + 999997
    z999998 = z999997 + 999998
    z999999 = z999998 + 999999
    z100000 = z999999 + 1000000
in z1000000

This can be visualized if we generate a heap profile:

$ ghc --make FoldlProfile.hs -o foldlProfile -O2 -prof
$ ./foldlProfile 1000000 +RTS -hy
Stack space overflow: current size 8388608 bytes.
$ hp2ps -c foldlProfile.hp
$ ps2pdf foldlProfile.ps

Result: http://bifunctor.homelinux.net/~bas/foldlProfile.pdf

You clearly see that this program allocates well over 22MB of heap
space! This is not a problem if you can give the program a big enough
heap. However have you noticed the more serious problem?

The problem starts when we finally evaluate z1000000:

Note that z1000000 = z999999 + 1000000. So 1000000 is pushed on the
stack. Then z999999 is evaluated.

Note that z999999 = z999998 + 999999. So 999999 is pushed on the
stack. Then z999998 is evaluated:

Note that z999998 = z999997 + 999998. So 999998 is pushed on the
stack. Then z999997 is evaluated: So ...

...your limited stack will eventually run full when you evaluate a
large enough chain of (+)s. This then triggers a stack overflow
exception!

Interestingly, when we define foldl as:

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f = foldl_f
    where
      foldl_f z []     = z
      foldl_f z (x:xs) = let z' = f z x
                         in foldl_f z' xs

then both the heap and stack overflow problems go away.

(this is how foldl is actually implemented[2] in GHC)

I don't know why the heap and stack overflow problems go away. So lets
look at the core output of the latter program:

$ ghc-core -- -O2 FoldlProfile.hs

$wsum :: [Int] -> Int#
$wsum = \ (w_s1rS :: [Int]) -> $wfoldl_f 0 w_s1rS

$wfoldl_f :: Int# -> [Int] -> Int#
$wfoldl_f =
  \ (ww_s1rK :: Int#) (w_s1rM :: [Int]) ->
    case w_s1rM of _ {
      [] -> ww_s1rK;
      : x_aeb xs_aec ->
        case x_aeb of _ { I# y_aUb ->
        $wfoldl_f (+# ww_s1rK y_aUb) xs_aec
        }
    }

Apparently, because of the latter foldl definition, GHC is able to
optimize the foldl for uboxed ints.

But why doesn't the recursive call:

 $wfoldl_f (+# ww_s1rK y_aUb) xs_aec

allocate (+# ww_s1rK y_aUb) on the heap? Are unboxed values always
evaluated strictly?

For reference, this is the core output of the former foldl:

sum :: [Int] -> Int
sum = foldl @ Int @ Int plusInt main3

main3 = I# 0

foldl :: forall b_aer a_aes.
              (b_aer -> a_aes -> b_aer) -> b_aer -> [a_aes] -> b_aer
foldl =
  \ (@ b_afF)
    (@ a_afG)
    (f_aet :: b_afF -> a_afG -> b_afF)
    (z_aeu :: b_afF)
    (ds_drS :: [a_afG]) ->
    case ds_drS of _ {
      [] -> z_aeu;
      : x_aex xs_aey ->
        foldl @ b_afF @ a_afG f_aet (f_aet z_aeu x_aex) xs_aey
    }

regards,

Bas

[1] http://haskell.org/haskellwiki/Foldr_Foldl_Foldl'
[2] http://haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.0/src/GHC-List.html#foldl