[Haskell-beginners] Stack space overflow with foldl'
Ryan Prichard
ryan.prichard at gmail.com
Sat Sep 11 17:30:50 EDT 2010
Daniel,
Thanks for your help.
On Fri, 10 Sep 2010 16:20:40 +0200
Daniel Fischer <daniel.is.fischer at web.de> wrote:
> On Friday 10 September 2010 11:29:56, Ryan Prichard wrote:
> > Hi,
> >
> > I see a stack overflow with this code, but I don't understand why.
>
> Neither do I, really, but ghc is being too clever for its own good -
> or not clever enough.
>
> >
> > I looked at the Core output with ghc-core, with or without
> > optimizations, and I see a $wlgo recursive function that doesn't
> > appear to end in a tail call.
>
> Without optimisations, I see a nice tail-recursive lgo inside
> foldl'2, pretty much what one would like to see.
When I ran ghc-core, I used the command-line "ghc-core -- Test.hs". I
thought it would compile Test.hs without optimizations, but ghc-core
actually has a default string of ghc arguments that includes -O2. If I
run "ghc-core -- -O0 Test.hs", I also see a nice tail-recursive lgo.
> With optimisations, you get a specialised worker $wlgo:
>
> Rec {
> Main.$wlgo :: [GHC.Types.Int] -> (##)
> GblId
> [Arity 1
> NoCafRefs
> Str: DmdType S]
> Main.$wlgo =
> \ (w_sms :: [GHC.Types.Int]) ->
> case case w_sms of _ {
> [] -> GHC.Unit.();
> : x_adq xs_adr ->
> case x_adq of _ { GHC.Types.I# _ ->
> case Main.$wlgo xs_adr of _ { (# #) -> GHC.Unit.() }
> }
> }
> of _ { () ->
> GHC.Prim.(##)
> }
> end Rec }
>
> and it's here that ghc shows the wrong amount of cleverness.
>
> What have we? At the types () and [Int], with f = flip seq, the step
> in lgo unfolds to
>
> lgo z (x:xs)
> ~> let z' = f z x in lgo z' xs
> ~> case f z x of
> z'@() -> lgo z' xs
> ~> case (case x of { I# _ -> z }) of
> z'@() -> lgo z' xs
>
> Now flip seq returns its first argument unless its second argument is
> _|_ and () has only one non-bottom value, so the ()-argument of lgo
> can be eliminated (here, the initial ()-argument is known to be (),
> in general the wrapper checks for _|_ before entering the worker),
> giving
>
> wlgo :: [Int] -> ()
> wlgo [] = ()
> wlgo (x:xs) =
> case (case x of { I# _ -> () }) of
> () -> wlgo xs
>
> It would be nice if the compiler rewrote the last equation to
>
> wlgo (x:xs) -> case x of { I# _ -> wlgo xs }
>
> , but apparently it can't. Still, that's pretty good code.
> Now comes the misplaced cleverness.
> Working with unboxed types is better (faster) than working with boxed
> types, so let's use unboxed types, giving $wlgo the type
>
> [Int] -> (##)
>
> (unboxed 0-tuple). But it wasn't clever enough to directly return (#
> #) in the case of an empty list - that would've allowed the core to be
>
> \ (w_sms :: [GHC.Types.Int]) ->
> case w_sms of _ {
> [] -> GHC.Prim.(##)
> : x_adq xs_adr ->
> case x_adq of _ { GHC.Types.I# _ ->
> Main.$wlgo xs_adr }
> }
>
> and all would've been fine and dandy. So it chose [] -> GHC.Unit.()
> and that forced the second branch to also have that type, hence you
> can't have a tail call there but have to box the result of the
> recursive call (only to unbox it again).
> So you get superfluous boxing, unboxing, reboxing in addition to a
> stack- eating recursion.
>
> But you've hit a rare case here, if you use foldl'2 (flip seq) at a
> type with more than one non-bottom value, the first argument of lgo
> is not eliminated and you don't get the boxing-unboxing dance,
> instead you get a nice tail-recursion, even if foldl'2 is used only
> once and not exported.
>
> Why GHC does that for (), beats me.
>
> > I don't see any let expressions in the
> > folding code, so I assume no thunks are being created. I can make a
> > tail call appear by doing either of two things:
> >
> > 1. Replace "lgo z []" with "lgo !z []". This suggestion came from
> > an email on haskell-beginners that I can't find right now. It was
> > a few months ago.
>
> Perhaps
> http://www.haskell.org/pipermail/beginners/2010-August/005016.html ?
Yes, that was the email. It was more recent than I thought.
> > 2. Instead of using the __GLASGOW_HASKELL__ version of foldl', use
> > the other version:
> >
> > foldl' f a [] = a
> > foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs
>
> But that needs to pass also the function, which is generally slower
> than having it as a static argument.
>
> 3. {-# NOINLINE foldl'2 #-}
>
> But that's not so good for performance in general.
>
> Option 1 gives the best Core, but it changes behaviour, with that,
>
> foldl' (\_ _ -> 1) undefined [0] = 1
> foldl'2 (\_ _ -> 1) undefined [0] = _|_
>
> To retain the behaviour of foldl',
>
> foldl'2 f z0 xs0 =
> case xs0 of
> [] -> z0
> (x:xs) -> lgo (f z0 x) xs
> where
> lgo !z [] = z
> lgo z (y:ys) = lgo (f z y) ys
I think I see why foldl'2 and foldl' have the same behavior.
> > My test case is contrived. Originally, I had a program that read
> > lines from a file as Data.ByteString values, and I was trying to
> > debug a stack overflow. I added the foldl' call to force the
> > evaluation of the ByteString lines, but the foldl' call itself
> > overflowed the stack.
>
> That's probably a different matter, foldl' evaluates the accumulation
> parameter only to weak head normal form, if it's not of simple type,
> it can still contain thunks that will overflow the stack when
> demanded.
I'm using Data.ByteString.ByteString:
Prelude> :info Data.ByteString.ByteString
data Data.ByteString.Internal.ByteString
= Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr
GHC.Word.Word8)
!Int
!Int
If I evaluate a value of this type into WHNF, I think the fields are
also evaluated into WHNF due to the strictness annotations. That
should remove thunks from the Int arguments. I wonder if the
ForeignPtr field could still have a thunk.
I suppose I really wanted an NFData instance for ByteString so I could
use rnf.
I looked at the NFData instance for a list. It's just:
instance NFData a => NFData [a] where
rnf [] = ()
rnf (x:xs) = rnf x `seq` rnf xs
If I just want to evaluate each list element to WHNF, I think I could
write:
forceList [] = ()
forceList (x:xs) = x `seq` forceList xs
This function is simpler than foldl'2, and it turns into a
tail-recursive function in Core. Maybe I could also use
"forceList = foldr seq ()".
> > I might have fixed my original stack overflow problem. I was
> > applying sum to a large list of integers, and sum is lazy.
>
> For [Int] and [Integer], sum is specialised, so when compiled with
> optimisations, ghc should use a strict version for those types.
I retested my original program that used sum. It overflows the stack
with -O0, but it doesn't overflow with -O or "-O -fno-strictness". With
-v4 -ddump-simpl-stats -O, I see
1 RuleFired
1 SPEC Data.List.sum
I don't see this output with -O0 instead of -O.
-Ryan
More information about the Beginners
mailing list