heap profiling

Fri Jun 18 04:10:49 EDT 2010

On 18/06/2010 08:29, Evan Laforge wrote:

>> oh, that happens a lot.  e.g. in  f (g x), if f doesn't demand the value of
>> its argument for a long time, then the heap closure for (g x) is lagging.
>>   If the value of (g x) is never demanded, but is nevertheless stored in some
>> data structure for a while, then it is VOID.
>
> Ok... so a (g x) closure is constructed, and it will keep 'x' alive
> when it shouldn't, but what's the actual data constructor here?  If
> 'x' is [1, 2], will these two cons cells be counted as being in LAG
> stage?  Or is it just the closure, which will show up as a
> sat_something?

I think it depends whether the [1,2] has been "used" or not.  If it has 
been used, then the closures are in the USE state until the last use, 
otherwise they are in the LAG state.

>> The Monad instance for Writer looks like this:
>>
>> instance (Monoid w) =>  Monad (Writer w) where
>>     return a = Writer (a, mempty)
>>     m>>= k  = Writer $ case runWriter m of
>>                             (a, w) ->  case runWriter (k a) of
>>                                 (b, w') ->  (b, w `mappend` w')
>>
>> I expect sat_s0a1 is the closure for (w `mappend` w').  If that is causing
>> your space leak, then maybe you need a Control.Monad.Writer.Stricter in
>> which the written value is forced strictly by>>=.
>
> So I copied Writer/Strict.hs and made a new one with a slightly
> stricter>>=, note the ! on w':
>
>      m>>= k  = WriterT $ do
>          (a, w)<- runWriterT m
>          (b, !w')<- runWriterT (k a)
>          let ws = w `mappend` w'
>          return (b, ws)
>
> Initially this had no effect.  I put a ! on the first 'w', and now the
> StricterWriter.sat_* closure is replaced by the #2 consumer, which is
> PAP.  What *is* PAP?  Memory usage is exactly the same, I just have
> more PAP now.  So I tried switching from DList and DList.append to []
> and (++) and now... the lag is gone!  Only the ! on w' is needed.
>
> The quadratic behaviour of (++) is not a practical worry in my case,
> since I log rarely so most of the appends are with []s on one side or
> both, but this is a bit troubling.  Writer + DList was specifically
> recommended as an efficient way to log things, but if a giant space
> leak is the result, then it's not such a great idea after all.  I'm
> guessing it has something to do with how DList is implemented in terms
> of function compositions, but since I can't see inside PAP I have to
> guess.

Right, I wouldn't use DList for this.  Here's an alternative I use:

data AList a = ANil | ASing a | Append (AList a) (AList a)

lenA :: AList a -> Int
lenA ANil          = 0
lenA (ASing _)     = 1
lenA (Append l r)  = lenA l + lenA r

appendA ANil r = r
appendA l ANil = l
appendA l r    = Append l r

Note how appendA is strict(ish) and eliminates ANils, so in a writer 
monad it shouldn't build up a space leak.  I'm sure you can write toList 
(don't use (++) though).

I'd put the bang on ws:

       m>>= k  = WriterT $ do
           (a, w)  <- runWriterT m
           (b, w') <- runWriterT (k a)
           let !ws = w `mappend` w'
           return (b, ws)

The problem with this monad is that >>= isn't tail-recursive, so it will 
cause stack overflows on recursive monadic functions.  I suspect that a 
better alternative to the strict writer monad is the strict state monad 
in most cases, because its bind is tail-recursive.

> To try to get a better picture of what was going on, I created a
> simpler profile with a simplified monad stack: just logging and state
> and doing some stylized actions... and now the results are the
> reverse!  The stricter version on lists produces an enormous amount of
> lag, the stock Writer.Strict is much better, but still produces quite
> a bit of garbage (perhaps from the State, though I can't see what I'm
> doing wrong there).  The dlist + normal strictness version is so fast
> the gc can't even measure it!  But if I bump the 'depth' from 9 to 10
> (which means factorially more calls), suddenly I'm getting stack
> overflows... but initially only in hb mode.
>
> So the more I look at it, the less I understand.  Clearly something is
> different from my production profile and the test profile, but it
> seems baffling that it makes such a difference as to reverse the
> results.  I will try to reduce the production version element by
> element until the results switch around.
>
> I put the simple version at
> http://hpaste.org/fastcgi/hpaste.fcgi/view?id=26329#a26329
>
> This one displays much better performance with DList + Writer.Strict
> than List + StrictWriter so I guess it's not too surprising.  However,
> *something* is still generating a lot of lag and -hb causes a stack
> trace, so I'm still doing it wrong somehow.
>
> I also have a situation where -hb shows about 4mb of drag at the most,
> but when I run with '-hc -hbdrag', I only see 10k peaks.  Shouldn't
> filtering the graph by drag add up to as much drag as when the graph
> isn't filtered?

That sounds suspicious.  If you can make a self-contained example that 
demonstrates it and create a ticket, that would be a great help.

Cheers,
	Simon

>> You see "*" when the type of a closure is polymorphic (as it would be in the
>> case of w `mappend` w').  stg_ap_2_upd_info is a generic thunk
>> implementation in the RTS - perhaps when profiling we should avoid using
>> these to give you more information.
>
> Well, it's eating a lot of space... so it must be something.  I guess
> if it's impossible to tell what it is, then it's just being needlessly
> confusing there.
>
>
>
> And thanks so much for the responses... space usage gets more
> confusing the more I look at it, but I hope it will come clear
> eventually given enough time.