Removing latency spikes. Garbage collector related?

Thu Oct 1 12:19:56 UTC 2015

Thank you both for the suggestions.

After some more profiling, I've come to realise that most of the
garbage is created from allocations when serialising messages to write
to the socket. I am going to try and reduce this next. If that does
not help reduce the latency spikes, I will work through each of your
suggestions.
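
For reference, here is a minimal sketch of the kind of eventlog annotation
John describes in the quoted text, so that the outliers can be looked up by
message id later. It assumes each message carries an Int id and that the
handler is an ordinary IO action; withMessageSpan, labelCurrentThread and the
"msg-" label format are hypothetical names, not something from my actual code.
The START/STOP prefixes follow the convention ghc-events-analyze uses for
user events.

```haskell
import Control.Concurrent (myThreadId)
import Control.Exception (bracket_)
import Debug.Trace (traceEventIO)
import GHC.Conc (labelThread)

-- | Give the current thread a readable name so it can be identified
-- when following thread states in the eventlog.
labelCurrentThread :: String -> IO ()
labelCurrentThread name = do
  tid <- myThreadId
  labelThread tid name

-- | Run an action between START/STOP user events that carry the
-- message id. The id type and label format are assumptions made
-- for illustration only.
withMessageSpan :: Int -> IO a -> IO a
withMessageSpan msgId =
  bracket_ (traceEventIO ("START " ++ label))
           (traceEventIO ("STOP " ++ label))
  where
    label = "msg-" ++ show msgId
```

The events are only recorded if the program is linked with -eventlog and run
with +RTS -l; otherwise traceEventIO does nothing, so the wrapper can stay in
place.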

On 29 September 2015 at 16:47, John Lato <jwlato at gmail.com> wrote:
> By dumping metrics, I mean essentially the same as the ghc-events-analyze
> annotations, plus any additional information that is useful for the
> investigation. In particular, if you have a message id, include that. You
> may also want to annotate thread names with GHC.Conc.labelThread, and to
> add more annotations to drill down if you uncover a problem area.
>
> If I were investigating, I would take e.g. the five largest outliers, then
> look in the (text) eventlog for those message ids, and see what happened
> between the start and stop.  You'll likely want to track the thread states
> (which is why I suggested you annotate the thread names).
>
> I'm not convinced it's entirely the GC; the latencies are larger than I
> would expect from a GC pause (although lots of factors can affect that). I
> suspect that either you have something causing abnormal GC spikes, or
> there's a different cause.
>
>
> On 04:15, Tue, Sep 29, 2015 Will Sewell <me at willsewell.com> wrote:
>>
>> Thanks for the reply John. I will have a go at doing that. What do you
>> mean exactly by dumping metrics, do you mean measuring the latency
>> within the program, and dumping it if it exceeds a certain threshold?
>>
>> And from the answers I'm assuming you believe it is the GC that is
>> most likely causing these spikes. I've never profiled Haskell code, so
>> I'm not used to seeing what the effects of the GC actually are.
>>
>> On 28 September 2015 at 19:31, John Lato <jwlato at gmail.com> wrote:
>> > Try Greg's recommendations first.  If you still need to do more
>> > investigation, I'd recommend that you look at some samples with either
>> > threadscope or dumping the eventlog to text.  I really like
>> > ghc-events-analyze, but it doesn't provide quite the same level of
>> > detail.
>> > You may also want to dump some of your metrics into the eventlog, because
>> > then you'll be able to see exactly how high latency episodes line up with
>> > GC pauses.
>> >
>> > On Mon, Sep 28, 2015 at 1:02 PM Gregory Collins <greg at gregorycollins.net> wrote:
>> >>
>> >>
>> >> On Mon, Sep 28, 2015 at 9:08 AM, Will Sewell <me at willsewell.com> wrote:
>> >>>
>> >>> If it is the GC, then is there anything that can be done about it?
>> >>
>> >> - Increase value of -A (the default is too small) -- best value for
>> >>   this is L3 cache size of the chip
>> >> - Increase value of -H (total heap size) -- this will use more RAM but
>> >>   you'll run GC less often
>> >> - This will sound flip, but: generate less garbage. Frequency of GC runs
>> >>   is proportional to the amount of garbage being produced, so if you can
>> >>   lower mutator allocation rate then you will also increase net
>> >>   productivity. Built-up thunks can transparently hide a lot of
>> >>   allocation so fire up the profiler and tighten those up (there's an
>> >>   80-20 rule here). Reuse output buffers if you aren't already, etc.
>> >>
>> >> G
>> >>
>> >> --
>> >> Gregory Collins <greg at gregorycollins.net>
>> >> _______________________________________________
>> >> Glasgow-haskell-users mailing list
>> >> Glasgow-haskell-users at haskell.org
>> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users