using nmitchell's space leak detection technique

Evan Laforge qdunkan at gmail.com
Fri Oct 2 02:02:08 UTC 2015


Neil Mitchell wrote an article about finding space leaks by limiting
the stack size:

http://neilmitchell.blogspot.fr/2015/09/detecting-space-leaks.html

I'm giving it a try, but the results don't make any sense to me.  My
understanding is that the too-deep stack indicates that someone
created too many thunks, so when they get evaluated you have to
descend too many times.  And if the stack trace seems to be a
reasonable size (say 100 stack frames somehow consuming 1mb of stack)
then it means there is a recursive call in there somewhere that ghc is
helpfully eliding.  And lastly, that the thing about "Stack space
overflow: current size 33568 bytes." always showing 33568 bytes is due
to a ghc bug and it should actually be whatever limit I gave it.  Is
all this accurate?

The stack trace jumps around a lot, to places which are not lexically
present in the caller.  I assume this is just lazy evaluation, so e.g.
maybe 'f' doesn't call 'g', but if 'f' forces a value returned from
'g' which has not yet been forced, the stack will go to 'g'.  Also
sometimes things which seem like they should be run consecutively wind
up as caller and callee... but perhaps this is due to using a monad
with a success continuation, and without the SCCs would be eliminated
since they are tail calls... or maybe they are always eliminated and I
only see them in the stack when the elimination failed for some
reason.

At first I got a stack trace that ended in a function which is
recursive, but should be productive guarded recursion:

events_of :: [LEvent d] -> [d]
events_of [] = []
events_of (Event e : rest) = e : events_of rest
events_of (Log _ : rest) = events_of rest

It's my understanding that while this isn't tail recursive, it should
require only constant space.  Of course the consumer could retain it
all, but that wouldn't show up as too-deep recursion on that
particular function, would it?

Since ghc elides self-calls from the stack, the culprit could be one
of the functions above, and the one immediately above was part of
transforming a large list which could well be insufficiently strict
somewhere.

So I tried to break that out into its own call without the distraction
of the continuation monad etc., just a function from a list to a list.
The result is even more confusing:

% build/profile/RunProfile-SpaceLeak +RTS -K2905K -xc -RTS
...
force - *** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
  Derive.PSignal.signal,
  called from Derive.DeriveTest.mkevent_scale,
  called from Derive.DeriveTest.mkevent,
  called from Derive.SpaceLeak_profile.profile_cancel.make,
  called from Derive.SpaceLeak_profile.profile_cancel.f,
  called from Derive.SpaceLeak_profile.profile_cancel,
  called from Derive.SpaceLeak_profile.CAF

Somehow these 7 calls are consuming almost 3mb of stack, so there must
be a ton of elided recursion hiding in one of them.  The confusing
thing is that the profile_cancel function is basically just:

rnf $ suspicious_function $ map make [0 .. 1024*50]

So the stack is going into the 'make' function which is just a bunch
of constructors to produce the values that suspicious_function will
transform.  As far as I can tell, the only recursion is in 'map', and
the thunks from 'mkevent' down to 'PSignal.signal' only go 3 deep.
But suspicious_function is definitely suspicious, because if I remove
it, I no longer get an exception.  But how does it manage to not be in
the stack?

If suspicious_function is not a good consumer and is accumulating
thunks, how can I get 3 stack entries consuming 3mb?  If I rnf the
result of 'make' before running suspicious_function, I get a
completely different stack trace, and this one with '--> evaluated
by:' entries, and I don't really know what those mean.  But this one
at least has some recursive calls within suspicious_function.

I can keep debugging via trial and error and successive rnf's as Neil
describes in another article, but it would be nice to know what's
going on with the stack.  The stack seems fairly straightforward in
strict languages, is there some documentation on how to understand the
ghc version?

Any clues appreciated!


More information about the Glasgow-haskell-users mailing list