heap profiling

Evan Laforge qdunkan at gmail.com
Fri Jun 18 03:29:55 EDT 2010


>> locking up.  Has anyone else seen this?  Any tips on how to
>> troubleshoot where it's getting stuck, doing what?  If it sounds like
>> a ghc bug I can try to trim down the size and submit a ticket.  GHC
>> 6.12.1 on OSX.
>
> Please submit a ticket, and try 6.12.3 if you can (we did fix some deadlock
> bugs in 6.12.2 and 6.12.3).

Ok, looks like it's still happening, so I suppose a ticket will be in
order.  I'm going to try to cut it down a little first, though.  At
least it's 100% reproducible.  I can try to get a stack trace with gdb
or the OS X process tracing feature.

In one instance I got a crash with no message (not even a segfault or
SIGABRT).  The OS X backtrace says:

Thread 2 Crashed:
0   elaforge.seq.seq                    0x02a2d74d LDV_recordDead + 301

The profiling files (.prof, .hp, etc.) are all empty.

>> Does this mean that 'st' will be dragging through 'x2' as it would in
>> an imperative language?
>
> Probably not.  It depends on what x1 does with st of course, but assuming
> when x1 returns all references to st have been dropped, then st is no longer
> reachable and will not be retained by the GC.

I suppose static analysis can figure out pretty easily that 'st' is
not referenced even though it's in scope, so the closure doesn't count
as a retainer.
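
Just to pin down what I mean, here's a made-up shape (not my real
code) where 'st' is only touched on the first line; it stays lexically
in scope after that, but nothing refers to it any more:

    import Control.Exception (evaluate)

    process :: [Int] -> IO Int
    process st = do
        x1 <- evaluate (sum st)  -- the only use of st
        -- st is still in scope here, but nothing references it, so it
        -- should not be reachable and the GC is free to collect it
        let x2 = x1 * 2
        return x2

    main :: IO ()
    main = print =<< process [1 .. 1000000]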

> The original paper about this should help:
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.1219

Thanks, just read it now.  I still don't have an intuitive feel for
lag and drag, but sometimes I need to read something a few times for
it to sink in.

>> Lag I'm not so sure about.  How is something created before it's used?
>
> oh, that happens a lot.  e.g. in  f (g x), if f doesn't demand the value of
> its argument for a long time, then the heap closure for (g x) is lagging.
>  If the value of (g x) is never demanded, but is nevertheless stored in some
> data structure for a while, then it is VOID.

Ok... so a (g x) closure is constructed, and it will keep 'x' alive
when it shouldn't, but what's the actual data constructor here?  If
'x' is [1, 2], will these two cons cells be counted as being in LAG
stage?  Or is it just the closure, which will show up as a
sat_something?
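
To make sure I have the right picture, here's the sort of thing I
imagine (a made-up example, so the names are mine): the (g x) thunk is
allocated up front but only demanded at the very end, so it (and
whatever it captures) should show up in the LAG stage in between:

    import Data.List (foldl')

    g :: [Int] -> Int
    g = sum

    main :: IO ()
    main = do
        let x = [1, 2]
            lagging = g x                           -- allocated now
        print (foldl' (+) 0 [1 .. 1000000 :: Int])  -- unrelated work
        print lagging                               -- demanded only now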

> The Monad instance for Writer looks like this:
>
> instance (Monoid w) => Monad (Writer w) where
>    return a = Writer (a, mempty)
>    m >>= k  = Writer $ case runWriter m of
>                            (a, w) -> case runWriter (k a) of
>                                (b, w') -> (b, w `mappend` w')
>
> I expect sat_s0a1 is the closure for (w `mappend` w').  If that is causing
> your space leak, then maybe you need a Control.Monad.Writer.Stricter in
> which the written value is forced strictly by >>=.

So I copied Writer/Strict.hs and made a new one with a slightly
stricter >>=, note the ! on w':

    -- (the ! pattern needs {-# LANGUAGE BangPatterns #-})
    m >>= k  = WriterT $ do
        (a, w)  <- runWriterT m
        (b, !w') <- runWriterT (k a)
        let ws = w `mappend` w'
        return (b, ws)

Initially this had no effect.  Then I put a ! on the first 'w' as
well, and now the StricterWriter.sat_* closure is replaced by PAP as
the #2 consumer.  What *is* PAP?  Memory usage is exactly the same, I
just have more PAP now.  So I tried switching from DList and
DList.append to [] and (++) and now... the lag is gone!  Only the !
on w' is needed.

The quadratic behaviour of (++) is not a practical worry in my case,
since I log rarely so most of the appends are with []s on one side or
both, but this is a bit troubling.  Writer + DList was specifically
recommended as an efficient way to log things, but if a giant space
leak is the result, then it's not such a great idea after all.  I'm
guessing it has something to do with how DList is implemented in terms
of function compositions, but since I can't see inside PAP I have to
guess.
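
If I'm reading the dlist source right, the representation is roughly
this (paraphrasing from memory, so the details may be off):

    newtype DList a = DL { unDL :: [a] -> [a] }

    fromList :: [a] -> DList a
    fromList xs = DL (xs ++)

    append :: DList a -> DList a -> DList a
    append xs ys = DL (unDL xs . unDL ys)

    toList :: DList a -> [a]
    toList xs = unDL xs []

So the thing the ! lands on is a function, not a list, which is
probably relevant to where all those PAPs come from.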

'xs ++ []', when forced, can be immediately reduced to 'xs'.  The bang
forces the right hand side to either [] or (_ : _) which is enough for
(++) to reduce out of existence... I suppose?  Meanwhile, 'append' for
DList is (xs . ys) which becomes ((xs++) . ([]++)), and forcing
that... well I suppose it has to get to a constructor to pattern match
on, so that should force the ([]++) out of existence since it reduces
to the same code as in the plain list part... or does it?
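
Something like this should answer that (untested, just a sketch): if
forcing the appended DList really reduced anything inside it, the
traces would fire at the seq; my guess is they only fire once toList's
result is demanded:

    import qualified Data.DList as DList
    import Debug.Trace (trace)

    main :: IO ()
    main = do
        let a = DList.fromList (trace "a demanded" [1, 2 :: Int])
            b = DList.fromList (trace "b demanded" [3])
            c = DList.append a b
        c `seq` putStrLn "forced c to WHNF"  -- traces here?
        print (DList.toList c)               -- or only here?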

To try to get a better picture of what was going on, I created a
simpler profile with a simplified monad stack: just logging and state,
doing some stylized actions... and now the results are reversed!  The
stricter version on lists produces an enormous amount of lag, and the
stock Writer.Strict is much better, but still produces quite a bit of
garbage (perhaps from the State, though I can't see what I'm doing
wrong there).  The DList + normal strictness version is so fast the GC
can't even measure it!  But if I bump the 'depth' from 9 to 10 (which
means factorially more calls), I suddenly get stack overflows... but
initially only in -hb mode.

So the more I look at it, the less I understand.  Clearly something is
different between my production profile and the test profile, but it
seems baffling that the difference is enough to reverse the results.
I will try to reduce the production version element by element until
the results switch around.

I put the simple version at
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=26329#a26329

This one shows much better performance with DList + Writer.Strict
than with lists + StricterWriter, so I guess it's not too surprising.
However, *something* is still generating a lot of lag, and -hb causes
a stack overflow, so I'm still doing it wrong somehow.

I also have a situation where -hb shows about 4 MB of drag at the most,
but when I run with '-hc -hbdrag', I only see peaks of around 10 KB.
Shouldn't filtering the graph by drag add up to as much drag as when
the graph isn't filtered?

> You see "*" when the type of a closure is polymorphic (as it would be in the
> case of w `mappend` w').  stg_ap_2_upd_info is a generic thunk
> implementation in the RTS - perhaps when profiling we should avoid using
> these to give you more information.

Well, it's eating a lot of space... so it must be something.  I guess
if it's impossible to tell what it is, then it's just needlessly
confusing there.



And thanks so much for the responses... space usage gets more
confusing the more I look at it, but I hope it will become clear
eventually, given enough time.

