[Haskell-beginners] Stack space overflow with foldl'

Fri Sep 10 10:20:40 EDT 2010

On Friday 10 September 2010 11:29:56, Ryan Prichard wrote:
> Hi,
>
> I see a stack overflow with this code, but I don't understand why.

Neither do I, really, but ghc is being too clever for its own good - or not 
clever enough.

>
> I looked at the Core output with ghc-core, with or without
> optimizations, and I see a $wlgo recursive function that doesn't appear
> to end in a tail call.

Without optimisations, I see a nice tail-recursive lgo inside foldl'2, 
pretty much what one would like to see.
With optimisations, you get a specialised worker $wlgo:

Rec {
Main.$wlgo :: [GHC.Types.Int] -> (##)
GblId
[Arity 1
 NoCafRefs
 Str: DmdType S]
Main.$wlgo =
  \ (w_sms :: [GHC.Types.Int]) ->
    case case w_sms of _ {
           [] -> GHC.Unit.();
           : x_adq xs_adr ->
             case x_adq of _ { GHC.Types.I# _ ->
             case Main.$wlgo xs_adr of _ { (# #) -> GHC.Unit.() }
             }
         }
    of _ { () ->
    GHC.Prim.(##)
    }
end Rec }

and it's here that ghc shows the wrong amount of cleverness.

What have we? At the types () and [Int], with f = flip seq, the step in lgo 
unfolds to

lgo z (x:xs)
~> let z' = f z x in lgo z' xs
~> case f z x of
      z'@() -> lgo z' xs
~> case (case x of { I# _ -> z }) of
      z'@() -> lgo z' xs

Now flip seq returns its first argument unless its second argument is _|_ 
and () has only one non-bottom value, so the ()-argument of lgo can be 
eliminated (here, the initial ()-argument is known to be (), in general the 
wrapper checks for _|_ before entering the worker), giving

wlgo :: [Int] -> ()
wlgo [] = ()
wlgo (x:xs) =
  case (case x of { I# _ -> () }) of
    () -> wlgo xs

It would be nice if the compiler rewrote the last equation to

wlgo (x:xs) -> case x of { I# _ -> wlgo xs }

, but apparently it can't. Still, that's pretty good code.
Now comes the misplaced cleverness.
Working with unboxed types is better (faster) than working with boxed 
types, so let's use unboxed types, giving $wlgo the type

[Int] -> (##)

(unboxed 0-tuple). But it wasn't clever enough to directly return (# #) in 
the case of an empty list - that would've allowed the core to be

 \ (w_sms :: [GHC.Types.Int]) ->
    case w_sms of _ {
      [] -> GHC.Prim.(##)
      : x_adq xs_adr ->
        case x_adq of _ { GHC.Types.I# _ ->
          Main.$wlgo xs_adr }
    }

and all would've been fine and dandy. So it chose [] -> GHC.Unit.() and 
that forced the second branch to also have that type, hence you can't have 
a tail call there but have to box the result of the recursive call (only to 
unbox it again).
So you get superfluous boxing, unboxing, reboxing in addition to a stack-
eating recursion.

But you've hit a rare case here, if you use foldl'2 (flip seq) at a type 
with more than one non-bottom value, the first argument of lgo is not 
eliminated and you don't get the boxing-unboxing dance, instead you get a 
nice tail-recursion, even if foldl'2 is used only once and not exported.

Why GHC does that for (), beats me.

> I don't see any let expressions in the
> folding code, so I assume no thunks are being created.  I can make a
> tail call appear by doing either of two things:
>
> 1. Replace "lgo z []" with "lgo !z []".  This suggestion came from an
> email on haskell-beginners that I can't find right now.  It was a few
> months ago.

Perhaps http://www.haskell.org/pipermail/beginners/2010-August/005016.html 
?

>
> 2. Instead of using the __GLASGOW_HASKELL__ version of foldl', use the
> other version:
>
> foldl' f a []     = a
> foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs

But that needs to pass also the function, which is generally slower than 
having it as a static argument.

3. {-# NOINLINE foldl'2 #-}

But that's not so good for performance in general.

Option 1 gives the best Core, but it changes behaviour, with that,

foldl' (\_ _ -> 1) undefined [0] = 1
foldl'2 (\_ _ -> 1) undefined [0] = _|_

To retain the behaviour of foldl',

foldl'2 f z0 xs0 =
    case xs0 of
      [] -> z0
      (x:xs) -> lgo (f z0 x) xs
  where
    lgo !z [] = z
    lgo z (y:ys) = lgo (f z y) ys

>
> My test case is contrived.  Originally, I had a program that read
> lines from a file as Data.ByteString values, and I was trying to debug
> a stack overflow.  I added the foldl' call to force the evaluation of
> the ByteString lines, but the foldl' call itself overflowed the
> stack.

That's probably a different matter, foldl' evaluates the accumulation 
parameter only to weak head normal form, if it's not of simple type, it can 
still contain thunks that will overflow the stack when demanded.

>
> I might have fixed my original stack overflow problem.  I was applying
> sum to a large list of integers, and sum is lazy.

For [Int] and [Integer], sum is specialised, so when compiled with 
optimisations, ghc should use a strict version for those types.

> I don't think I have
> any real code anymore that overflows the stack, but I'm uncomfortable
> because I don't know why my test case doesn't work.
>
> Is the foldl' call in my test case allowed to have linear stack usage?

I don't think the language definition treats that, so technically it's 
allowed then. But it shouldn't happen.

>
> Thanks,
> -Ryan