<p dir="ltr">By dumping metrics, I mean essentially the same thing as the ghc-events-analyze annotations, but with whatever extra information is useful for the investigation. In particular, if you have a message id, include it. You may also want to label threads with GHC.Conc.labelThread, and to add more annotations to drill down once you uncover a problem area.</p>
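<p dir="ltr">As a rough sketch of what I have in mind (the MessageId type, the "handle-message" label and the "message-worker" thread name are made up for illustration, so substitute your own), something along these lines would put START/STOP user events carrying the message id into the eventlog and label the worker thread:</p>

```haskell
import Control.Concurrent (forkIO, myThreadId, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (bracket_)
import Debug.Trace (traceEventIO)
import GHC.Conc (labelThread)

-- Hypothetical message identifier; substitute whatever your system uses.
type MessageId = Int

-- Event text for one phase ("START" or "STOP") of handling a message.
-- The "handle-message" label is an assumption made for this sketch.
eventLabel :: String -> MessageId -> String
eventLabel phase mid = phase ++ " handle-message id=" ++ show mid

-- Bracket an action with START/STOP user events that carry the message id.
-- bracket_ ensures the STOP event is emitted even if the action throws.
withMessageTrace :: MessageId -> IO a -> IO a
withMessageTrace mid =
  bracket_ (traceEventIO (eventLabel "START" mid))
           (traceEventIO (eventLabel "STOP" mid))

main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    -- Label the thread so it is recognisable in the eventlog.
    tid <- myThreadId
    labelThread tid "message-worker"
    -- threadDelay stands in for the real message handling.
    withMessageTrace 42 (threadDelay 1000)
    putMVar done ()
  takeMVar done
```

<p dir="ltr">Run the program with +RTS -l to get the eventlog (older GHCs also need -eventlog at link time), and "ghc-events show" will dump it as text. Bear in mind that putting the id in the label makes every message distinct, which is handy for searching the text dump but probably not what you want for aggregate charts.</p>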

<p dir="ltr">If I were investigating, I would take, say, the five largest outliers, look in the (text) eventlog for those message ids, and see what happened between the start and stop events. You'll likely want to track the thread states (which is why I suggested labelling the threads).</p>
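<p dir="ltr">Picking the outliers is easy to script once you have the start and stop timestamps per message id. A minimal sketch, assuming you have already extracted (message id, start, stop) triples from the eventlog by some means (the Span type and the sample data are invented for illustration):</p>

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical message identifier.
type MessageId = Int

-- (message id, start timestamp, stop timestamp), timestamps in nanoseconds.
type Span = (MessageId, Integer, Integer)

-- Time between the start and stop events of one message.
latency :: Span -> Integer
latency (_, start, stop) = stop - start

-- The n spans with the largest latency, worst first.
worstSpans :: Int -> [Span] -> [Span]
worstSpans n = take n . sortOn (Down . latency)

main :: IO ()
main = mapM_ print (worstSpans 5 samples)
  where
    -- Made-up sample data, purely for illustration.
    samples =
      [ (1, 0, 1200), (2, 100, 90100), (3, 200, 1500)
      , (4, 300, 45300), (5, 400, 1700), (6, 500, 2500)
      ]
```

<p dir="ltr">The ids it prints are the ones to search for in the text eventlog.</p>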

<p dir="ltr">I'm not convinced it's entirely the GC: the latencies are larger than I would expect from a GC pause (although lots of factors can affect that). I suspect that either something is causing abnormal GC spikes, or there's a different cause.</p>
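<p dir="ltr">One cheap way to check how much of a slow operation is actually GC is to compare the RTS statistics before and after it. This is only a sketch: it assumes a base with GHC.Stats.getRTSStats (4.10 or later) and that the program is run with +RTS -T, and withGcReport and gcShare are names I made up. The statistics are process-wide, so in a concurrent program they include GC triggered by other threads.</p>

```haskell
import Data.Int (Int64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Fraction of an interval spent in GC, given GC time and wall-clock time
-- in nanoseconds. Returns 0 for an empty or negative interval.
gcShare :: Int64 -> Int64 -> Double
gcShare gcNs wallNs
  | wallNs <= 0 = 0
  | otherwise   = fromIntegral gcNs / fromIntegral wallNs

-- Run an action and report how many collections happened during it and
-- how much of its wall-clock time was GC. If RTS statistics are not
-- enabled (no +RTS -T), the action runs without any report.
withGcReport :: String -> IO a -> IO a
withGcReport label action = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then action
    else do
      before <- getRTSStats
      result <- action
      after  <- getRTSStats
      let gcNs   = gc_elapsed_ns after - gc_elapsed_ns before
          wallNs = elapsed_ns after - elapsed_ns before
      putStrLn (label ++ ": gcs=" ++ show (gcs after - gcs before)
                ++ " gc_ns=" ++ show gcNs
                ++ " gc_share=" ++ show (gcShare gcNs wallNs))
      pure result

main :: IO ()
main = withGcReport "demo" (print (sum [1 .. 1000000 :: Int]))
```

<p dir="ltr">If the slow requests show a small GC share, that points towards a different cause.</p>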


<br><div class="gmail_quote"><div dir="ltr">On 04:15, Tue, Sep 29, 2015 Will Sewell <<a href="mailto:me@willsewell.com" target="_blank">me@willsewell.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks for the reply John. I will have a go at doing that. What do you<br>

mean exactly by dumping metrics, do you mean measuring the latency<br>

within the program, and dumping it if it exceeds a certain threshold?<br>

<br>

And from the answers I'm assuming you believe it is the GC that is<br>

most likely causing these spikes. I've never profiled Haskell code, so<br>

I'm not used to seeing what the effects of the GC actually are.<br>

<br>

On 28 September 2015 at 19:31, John Lato <<a href="mailto:jwlato@gmail.com" target="_blank">jwlato@gmail.com</a>> wrote:<br>

> Try Greg's recommendations first.  If you still need to do more<br>

> investigation, I'd recommend that you look at some samples with either<br>

> threadscope or dumping the eventlog to text.  I really like<br>

> ghc-events-analyze, but it doesn't provide quite the same level of detail.<br>

> You may also want to dump some of your metrics into the eventlog, because<br>

> then you'll be able to see exactly how high latency episodes line up with GC<br>

> pauses.<br>

><br>

> On Mon, Sep 28, 2015 at 1:02 PM Gregory Collins <<a href="mailto:greg@gregorycollins.net" target="_blank">greg@gregorycollins.net</a>><br>

> wrote:<br>

>><br>

>><br>

>> On Mon, Sep 28, 2015 at 9:08 AM, Will Sewell <<a href="mailto:me@willsewell.com" target="_blank">me@willsewell.com</a>> wrote:<br>

>>><br>

>>> If it is the GC, then is there anything that can be done about it?<br>

>><br>

>> Increase value of -A (the default is too small) -- best value for this is<br>

>> L3 cache size of the chip<br>

>> Increase value of -H (total heap size) -- this will use more ram but<br>

>> you'll run GC less often<br>

>> This will sound flip, but: generate less garbage. Frequency of GC runs is<br>

>> proportional to the amount of garbage being produced, so if you can lower<br>

>> mutator allocation rate then you will also increase net productivity.<br>

>> Built-up thunks can transparently hide a lot of allocation so fire up the<br>

>> profiler and tighten those up (there's an 80-20 rule here). Reuse output<br>

>> buffers if you aren't already, etc.<br>

>><br>

>> G<br>

>><br>

>> --<br>

>> Gregory Collins <<a href="mailto:greg@gregorycollins.net" target="_blank">greg@gregorycollins.net</a>><br>

>> _______________________________________________<br>

>> Glasgow-haskell-users mailing list<br>

>> <a href="mailto:Glasgow-haskell-users@haskell.org" target="_blank">Glasgow-haskell-users@haskell.org</a><br>

>> <a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users</a><br>

</blockquote></div>