Fwd: Removing latency spikes. Garbage collector related?

Will Sewell me at willsewell.com
Tue Sep 29 09:03:58 UTC 2015


Thanks for the reply Greg. I have already tried tweaking these values
a bit, and this is what I found:

* I first tried -A256k because the L2 cache is that size (Simon Marlow
mentioned this can lead to good performance
http://stackoverflow.com/a/3172704/1018290)
* I then tried a value of -A2048k because he also said "using a very
large young generation size might outweigh the cache benefits". I
don't exactly know what he meant by "a very large young generation
size", so I guessed at this value. Is it in the right ballpark?
* With -H, I tried values of -H8m, -H32m, -H128m, -H512m, -H1024m

But all of them led to worse performance than the defaults (and -H
didn't really have much effect at all).
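
For comparing settings like the ones above, here is a minimal sketch of
reading GC counters from inside the program. It assumes a GHC new enough
to ship getRTSStats in GHC.Stats (base 4.10 or later) and that the binary
is run with +RTS -T. The workload and the meanPauseMs helper are
hypothetical stand-ins, not part of our system, and a mean will of course
hide individual spikes.

```haskell
module Main (main) where

import Data.Int (Int64)
import Data.Word (Word32)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Mean elapsed time per collection in milliseconds (0 if no GC has run).
-- Hypothetical helper, takes total GC nanoseconds and the collection count.
meanPauseMs :: Int64 -> Word32 -> Double
meanPauseMs totalNs n
  | n == 0    = 0
  | otherwise = fromIntegral totalNs / fromIntegral n / 1e6

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "GC statistics are off; run with +RTS -T"
    else do
      -- Stand-in workload that allocates plenty of short-lived data.
      print (sum [1 .. 5000000 :: Integer])
      s <- getRTSStats
      putStrLn ("collections:        " ++ show (gcs s))
      putStrLn ("major collections:  " ++ show (major_gcs s))
      putStrLn ("bytes allocated:    " ++ show (allocated_bytes s))
      putStrLn ("max live bytes:     " ++ show (max_live_bytes s))
      putStrLn ("mean GC pause (ms): "
                ++ show (meanPauseMs (gc_elapsed_ns s) (gcs s)))
```

Built with something like "ghc -O2 -rtsopts GcStats.hs" (the file name is
arbitrary), it can then be run once per setting, e.g.
"./GcStats +RTS -T -A256k" and "./GcStats +RTS -T -A2048k".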

I will try your suggestion of setting -A to the L3 cache size.

Are there any other values I should try setting these to?

As for your final point, I have run space profiling, and it looks like
>90% of the memory is used for our message index, which is a temporary
store of messages that have gone through the system. These messages
are stored in aligned chunks in memory that are merged together. I
initially thought this was causing the spikes, but they were still
there even after I removed the component. I will try and run space
profiling in the build with the message index.

Thanks again.

On 28 September 2015 at 19:02, Gregory Collins <greg at gregorycollins.net> wrote:
>
> On Mon, Sep 28, 2015 at 9:08 AM, Will Sewell <me at willsewell.com> wrote:
>>
>> If it is the GC, then is there anything that can be done about it?
>
> Increase value of -A (the default is too small) -- best value for this is L3
> cache size of the chip
> Increase value of -H (total heap size) -- this will use more ram but you'll
> run GC less often
> This will sound flip, but: generate less garbage. Frequency of GC runs is
> proportional to the amount of garbage being produced, so if you can lower
> mutator allocation rate then you will also increase net productivity.
> Built-up thunks can transparently hide a lot of allocation so fire up the
> profiler and tighten those up (there's an 80-20 rule here). Reuse output
> buffers if you aren't already, etc.
>
> G
>
> --
> Gregory Collins <greg at gregorycollins.net>
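
To illustrate the point about built-up thunks quoted above, here is a
small sketch, not taken from our code base. The names meanLazy, meanStrict
and Acc are hypothetical. The lazy version accumulates a chain of
unevaluated (+) thunks inside the tuple, while the strict version forces
the accumulator at every step and so retains far less on the heap.

```haskell
module Main (main) where

import Data.List (foldl')

-- | Lazy accumulator: the tuple fields are not forced until the end,
-- so each step adds another suspended computation to the heap.
meanLazy :: [Double] -> Double
meanLazy xs =
  let (s, n) = foldl (\(a, c) x -> (a + x, c + 1)) (0, 0 :: Int) xs
  in if n == 0 then 0 else s / fromIntegral n

-- | Strict accumulator: the strict fields are evaluated on construction.
data Acc = Acc !Double !Int

-- | Same result as meanLazy, but foldl' plus strict fields keeps the
-- running sum and count fully evaluated.
meanStrict :: [Double] -> Double
meanStrict xs =
  case foldl' step (Acc 0 0) xs of
    Acc _ 0 -> 0
    Acc s n -> s / fromIntegral n
  where
    step (Acc s n) x = Acc (s + x) (n + 1)

main :: IO ()
main = do
  let xs = map fromIntegral [1 .. 1000000 :: Int]
  print (meanLazy xs)
  print (meanStrict xs)
```

Running each variant separately with +RTS -s should show the difference
in maximum residency and GC time, though how large it is will depend on
the GHC version and optimisation flags.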

