[Haskell-beginners] Stack space overflow with foldl'

Ryan Prichard ryan.prichard at gmail.com
Sat Sep 11 17:30:50 EDT 2010


Daniel,

Thanks for your help.

On Fri, 10 Sep 2010 16:20:40 +0200
Daniel Fischer <daniel.is.fischer at web.de> wrote:

> On Friday 10 September 2010 11:29:56, Ryan Prichard wrote:
> > Hi,
> >
> > I see a stack overflow with this code, but I don't understand why.
> 
> Neither do I, really, but ghc is being too clever for its own good -
> or not clever enough.
> 
> >
> > I looked at the Core output with ghc-core, with or without
> > optimizations, and I see a $wlgo recursive function that doesn't
> > appear to end in a tail call.
> 
> Without optimisations, I see a nice tail-recursive lgo inside
> foldl'2, pretty much what one would like to see.

When I ran ghc-core, I used the command-line "ghc-core -- Test.hs".  I
thought it would compile Test.hs without optimizations, but ghc-core
actually has a default string of ghc arguments that includes -O2.  If I
run "ghc-core -- -O0 Test.hs", I also see a nice tail-recursive lgo.

> With optimisations, you get a specialised worker $wlgo:
> 
> Rec {
> Main.$wlgo :: [GHC.Types.Int] -> (##)
> GblId
> [Arity 1
>  NoCafRefs
>  Str: DmdType S]
> Main.$wlgo =
>   \ (w_sms :: [GHC.Types.Int]) ->
>     case case w_sms of _ {
>            [] -> GHC.Unit.();
>            : x_adq xs_adr ->
>              case x_adq of _ { GHC.Types.I# _ ->
>              case Main.$wlgo xs_adr of _ { (# #) -> GHC.Unit.() }
>              }
>          }
>     of _ { () ->
>     GHC.Prim.(##)
>     }
> end Rec }
> 
> and it's here that ghc shows the wrong amount of cleverness.
> 
> What have we? At the types () and [Int], with f = flip seq, the step
> in lgo unfolds to
> 
> lgo z (x:xs)
> ~> let z' = f z x in lgo z' xs
> ~> case f z x of
>       z'@() -> lgo z' xs
> ~> case (case x of { I# _ -> z }) of
>       z'@() -> lgo z' xs
> 
> Now flip seq returns its first argument unless its second argument is
> _|_ and () has only one non-bottom value, so the ()-argument of lgo
> can be eliminated (here, the initial ()-argument is known to be (),
> in general the wrapper checks for _|_ before entering the worker),
> giving
> 
> wlgo :: [Int] -> ()
> wlgo [] = ()
> wlgo (x:xs) =
>   case (case x of { I# _ -> () }) of
>     () -> wlgo xs
> 
> It would be nice if the compiler rewrote the last equation to
> 
> wlgo (x:xs) -> case x of { I# _ -> wlgo xs }
> 
> , but apparently it can't. Still, that's pretty good code.
> Now comes the misplaced cleverness.
> Working with unboxed types is better (faster) than working with boxed 
> types, so let's use unboxed types, giving $wlgo the type
> 
> [Int] -> (##)
> 
> (unboxed 0-tuple). But it wasn't clever enough to directly return (#
> #) in the case of an empty list - that would've allowed the core to be
> 
>  \ (w_sms :: [GHC.Types.Int]) ->
>     case w_sms of _ {
>       [] -> GHC.Prim.(##)
>       : x_adq xs_adr ->
>         case x_adq of _ { GHC.Types.I# _ ->
>           Main.$wlgo xs_adr }
>     }
> 
> and all would've been fine and dandy. So it chose [] -> GHC.Unit.()
> and that forced the second branch to also have that type, hence you
> can't have a tail call there but have to box the result of the
> recursive call (only to unbox it again).
> So you get superfluous boxing, unboxing, reboxing in addition to a
> stack- eating recursion.
> 
> But you've hit a rare case here, if you use foldl'2 (flip seq) at a
> type with more than one non-bottom value, the first argument of lgo
> is not eliminated and you don't get the boxing-unboxing dance,
> instead you get a nice tail-recursion, even if foldl'2 is used only
> once and not exported.
> 
> Why GHC does that for (), beats me.
> 
> > I don't see any let expressions in the
> > folding code, so I assume no thunks are being created.  I can make a
> > tail call appear by doing either of two things:
> >
> > 1. Replace "lgo z []" with "lgo !z []".  This suggestion came from
> > an email on haskell-beginners that I can't find right now.  It was
> > a few months ago.
> 
> Perhaps
> http://www.haskell.org/pipermail/beginners/2010-August/005016.html ?

Yes, that was the email.  It was more recent than I thought.

> > 2. Instead of using the __GLASGOW_HASKELL__ version of foldl', use
> > the other version:
> >
> > foldl' f a []     = a
> > foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs
> 
> But that needs to pass also the function, which is generally slower
> than having it as a static argument.
> 
> 3. {-# NOINLINE foldl'2 #-}
> 
> But that's not so good for performance in general.
> 
> Option 1 gives the best Core, but it changes behaviour, with that,
> 
> foldl' (\_ _ -> 1) undefined [0] = 1
> foldl'2 (\_ _ -> 1) undefined [0] = _|_
> 
> To retain the behaviour of foldl',
> 
> foldl'2 f z0 xs0 =
>     case xs0 of
>       [] -> z0
>       (x:xs) -> lgo (f z0 x) xs
>   where
>     lgo !z [] = z
>     lgo z (y:ys) = lgo (f z y) ys

I think I see why foldl'2 and foldl' have the same behavior.

> > My test case is contrived.  Originally, I had a program that read
> > lines from a file as Data.ByteString values, and I was trying to
> > debug a stack overflow.  I added the foldl' call to force the
> > evaluation of the ByteString lines, but the foldl' call itself
> > overflowed the stack.
> 
> That's probably a different matter, foldl' evaluates the accumulation 
> parameter only to weak head normal form, if it's not of simple type,
> it can still contain thunks that will overflow the stack when
> demanded.

I'm using Data.ByteString.ByteString:

Prelude> :info Data.ByteString.ByteString
data Data.ByteString.Internal.ByteString
  = Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr
                                    GHC.Word.Word8)
                                !Int
                                !Int

If I evaluate a value of this type into WHNF, I think the fields are
also evaluated into WHNF due to the strictness annotations.  That
should remove thunks from the Int arguments.  I wonder if the
ForeignPtr field could still have a thunk.

I suppose I really wanted an NFData instance for ByteString so I could
use rnf.

I looked at the NFData instance for a list. It's just:

instance NFData a => NFData [a] where
    rnf [] = ()
    rnf (x:xs) = rnf x `seq` rnf xs

If I just want to evaluate each list element to WHNF, I think I could
write:

forceList [] = ()
forceList (x:xs) = x `seq` forceList xs

This function is simpler than foldl'2, and it turns into a
tail-recursive function in Core.  Maybe I could also use 
"forceList = foldr seq ()".

> > I might have fixed my original stack overflow problem.  I was
> > applying sum to a large list of integers, and sum is lazy.
> 
> For [Int] and [Integer], sum is specialised, so when compiled with 
> optimisations, ghc should use a strict version for those types.

I retested my original program that used sum.  It overflows the stack
with -O0, but it doesn't overflow with -O or "-O -fno-strictness". With
-v4 -ddump-simpl-stats -O, I see

1 RuleFired
    1 SPEC Data.List.sum

I don't see this output with -O0 instead of -O.

-Ryan


More information about the Beginners mailing list