Looking for advice on profiling

Simon Marlow simonmar at microsoft.com
Tue Nov 9 09:45:58 EST 2004

On 09 November 2004 12:54, Duncan Coutts wrote:

> When I do time profiling, the big cost centres come up as putByte and
> putWord. When I profile for space it shows the large FiniteMaps
> dominating most everything else. I originally guessed from that that
> the serialisation must be forcing loads of thunks which is why it
> shows up so highly on the profile. However even after doing the
> deepSeq before serialisation, it takes a great deal of time, so I'm
> not sure what's going on.

Let's get the simple things out of the way first: make sure you're
compiling Binary with -O -funbox-strict-fields (very important).  When
compiling for profiling, don't compile Binary with -auto-all, because
that will add cost centres to all the small functions and really skew
the profile.  I find this is a good rule of thumb when profiling: avoid
-auto-all on your low-level libraries that you hope to be inlined a lot.
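
As a rough illustration of that rule of thumb, here is a minimal sketch of a
low-level module with strict fields, a small helper left free of cost centres
so it can be inlined, and one hand-placed SCC annotation on the coarse-grained
work. The names Span, spanLen and totalLen are hypothetical and have nothing to
do with the Binary library; OPTIONS_GHC is the current spelling of the pragma,
and -O would normally be passed on the command line.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields #-}
-- Hypothetical example module; none of these names come from Binary.
-- Build for profiling with something like: ghc -O -prof Example.hs
-- (no -auto-all, so the small helpers get no automatic cost centres).

import Data.List (foldl')

-- Strict fields: with -funbox-strict-fields the two Ints are stored
-- directly in the constructor rather than as pointers to boxed values.
data Span = Span !Int !Int

-- Small helper we hope gets inlined; deliberately no cost centre here.
spanLen :: Span -> Int
spanLen (Span lo hi) = hi - lo
{-# INLINE spanLen #-}

-- One manually placed cost centre around the coarse-grained work.
totalLen :: [Span] -> Int
totalLen spans =
  {-# SCC "totalLen" #-} (foldl' (\acc s -> acc + spanLen s) 0 spans)
```

Loaded in GHCi, totalLen [Span 0 3, Span 10 14] sums the two lengths. In a
profiled build only the "totalLen" cost centre shows up for this module, and
spanLen is charged to whoever calls it.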

You say your instances are created using DrIFT - I don't think we ever
modified DrIFT to generate the right kind of instances for the Binary
library in GHC, so are you using the instances designed for the nhc98
binary library?  If so, make sure your instances are using put_ rather
than put, because the former will allow binary output to run in constant
stack space.
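
A minimal sketch of the put_/put distinction, using a toy in-memory handle
rather than the real Binary API. ToyHandle, ToyBinary, offset and contents are
all made up for illustration; only the idea that put_ returns () while put
returns the position of the value written is taken from the libraries
discussed here. The point is that an instance written with put_ can finish
with a plain call to the next put_, whereas a put-style writer has to hold on
to the start position so it can return it once everything else is written.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import Data.Word (Word8)

-- Hypothetical stand-in for a binary handle: a byte count plus the
-- bytes written so far, newest first.
newtype ToyHandle = ToyHandle (IORef (Int, [Word8]))

newToyHandle :: IO ToyHandle
newToyHandle = ToyHandle <$> newIORef (0, [])

-- Current write position.
offset :: ToyHandle -> IO Int
offset (ToyHandle ref) = fst <$> readIORef ref

-- Bytes written so far, oldest first.
contents :: ToyHandle -> IO [Word8]
contents (ToyHandle ref) = reverse . snd <$> readIORef ref

putByte :: ToyHandle -> Word8 -> IO ()
putByte (ToyHandle ref) b =
  modifyIORef' ref (\(n, bs) -> let n' = n + 1 in n' `seq` (n', b : bs))

class ToyBinary a where
  -- Write the value and return nothing.
  put_ :: ToyHandle -> a -> IO ()
  -- Write the value and return the position where it starts.
  put :: ToyHandle -> a -> IO Int
  put h x = do
    pos <- offset h
    put_ h x
    return pos          -- work remains after the write, so not a tail call

instance ToyBinary Word8 where
  put_ = putByte

instance ToyBinary Bool where
  put_ h b = putByte h (if b then 1 else 0)

instance (ToyBinary a, ToyBinary b) => ToyBinary (a, b) where
  put_ h (a, b) = do
    put_ h a
    put_ h b            -- last action: nothing is left to do afterwards

instance ToyBinary a => ToyBinary [a] where
  put_ h xs = do
    putByte h (fromIntegral (length xs))  -- toy format: lengths above 255 wrap
    mapM_ (put_ h) xs
```

Instances generated for nested data would call put_ on each field in the same
way as the pair instance above, and reserve put for the top-level call where
the start position is actually wanted.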

Are you using BinMem, or BinIO?

> The retainer profiling again shows that the FiniteMaps are holding on
> to most stuff.
> A major problem no doubt is space use. For the large gtk/gtk.h, when I
> run with +RTS -B to get a beep every major garbage collection, the
> serialisation phase beeps continuously while the file grows.
> Occasionally it seems to freeze for 10s of seconds, not doing any
> garbage collection and not doing any file output but using 100% CPU,
> then it carries on outputting and garbage collecting furiously. I
> don't know how to work out what's going on when it does that.

I agree with Malcolm's conjecture: it sounds like a very long major GC.

> I don't understand how it can be generating so much garbage when it is
> doing the serialisation stuff on a structure that has already been
> fully deepSeq'ed.

Yes, binary output *should* do zero allocation, and binary input should
only allocate the structure being created.  The Binary library is quite
heavily tuned so that this is the case (if you compile with profiling
and -auto-all, it will almost certainly break this property, though).

