Removing latency spikes. Garbage collector related?
Neil Davies
semanticphilosopher at gmail.com
Tue Sep 29 10:45:23 UTC 2015
Will
I was trying to get a feel for what those coloured squares actually
denoted. Typically we examine this sort of performance information
as CDFs (cumulative distribution functions [1]), trying to pull apart
the issues that affect the “mean” (i.e. the typical path through the
code/system) from those that affect the “tail” (i.e. exceptions - and
GC running can be seen as an “exception”, one that you can manage and
time-shift relative to the rest of the work).
I’m assuming that messages have a similar “cost” (i.e. similar work
to complete), so that a uniform arrival rate equates to a uniform
rate of arriving work.
Neil
[1] We plot the CDFs in two ways: the “usual” way for the major part
of the probability mass, and then as (1 - CDF) on a log-log scale to
expose the tail behaviour.
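
For concreteness, a minimal sketch of how one can derive those two
views from raw response-time samples (the code below is illustrative,
not our actual tooling):

    import Data.List (sort)

    -- Empirical CDF: for the i-th smallest sample x, P(X <= x) = i/n.
    empiricalCdf :: [Double] -> [(Double, Double)]
    empiricalCdf xs =
      let sorted = sort xs
          n      = fromIntegral (length sorted)
      in  [ (x, fromIntegral i / n) | (i, x) <- zip [1 :: Int ..] sorted ]

    -- Tail view: 1 - CDF, meant to be plotted on log-log axes, where
    -- tail (e.g. GC-induced) behaviour becomes visible.
    tailComplement :: [(Double, Double)] -> [(Double, Double)]
    tailComplement = map (\(x, p) -> (x, 1 - p))

    main :: IO ()
    main = do
      samples <- map read . lines <$> getContents  -- one latency per line
      mapM_ print (tailComplement (empiricalCdf samples))

Piping the output into something like gnuplot with both axes set to
log scale gives the second of the two plots described above.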
On 29 Sep 2015, at 10:35, Will Sewell <me at willsewell.com> wrote:
> Thank you for the reply Neil.
>
> The spikes are in response time. The graph I linked to shows the
> distribution of response times in a given window of time (darkness of
> the square is the number of messages in a particular window of
> response time). So the spikes are in the mean and also the max
> response time. Having said that, I'm not exactly sure what you mean
> by "mean values".
>
> I will have a look into -I0.
>
> Yes the arrival of messages is constant. This graph shows the number
> of messages that have been published to the system:
> http://i.imgur.com/ADzMPIp.png
>
> On 29 September 2015 at 10:16, Neil Davies
> <semanticphilosopher at gmail.com> wrote:
>> Will
>>
>> Is your issue with the spikes in response time, rather than the mean values?
>>
>> If so, once you’ve reduced the amount of unnecessary mutation, you might want
>> to take more control over when the GC takes place: disable the idle-GC timer
>> (-I0) and force GC to occur at points you select - we found this useful.
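>>
>> As a sketch of what that looks like (the batching structure here is
>> made up for illustration; -I0 and System.Mem.performGC are the real
>> flag and function):
>>
>>     import Control.Monad (forM_)
>>     import System.Mem (performGC)
>>
>>     -- Stand-in for the real per-message work.
>>     handleMessage :: Int -> IO ()
>>     handleMessage _ = return ()
>>
>>     main :: IO ()
>>     main = forM_ [1 .. 100 :: Int] $ \_batch -> do
>>       forM_ [1 .. 1000 :: Int] handleMessage  -- process a batch
>>       performGC  -- force a major collection at a point we chose;
>>                  -- run with +RTS -I0 -RTS so the idle-GC timer
>>                  -- doesn't also kick in at arbitrary moments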
>>
>> Lastly, is the arrival pattern (and distribution pattern) of messages constant or
>> variable? Just making sure that you are not trying to fight basic queueing theory here.
>>
>>
>> Neil
>>
>> On 29 Sep 2015, at 10:03, Will Sewell <me at willsewell.com> wrote:
>>
>>> Thanks for the reply Greg. I have already tried tweaking these values
>>> a bit, and this is what I found:
>>>
>>> * I first tried -A256k because the L2 cache is that size (Simon Marlow
>>> mentioned this can lead to good performance
>>> http://stackoverflow.com/a/3172704/1018290)
>>> * I then tried a value of -A2048k because he also said "using a very
>>> large young generation size might outweigh the cache benefits". I
>>> don't exactly know what he meant by "a very large young generation
>>> size", so I guessed at this value. Is it in the right ballpark?
>>> * With -H, I tried values of -H8m, -H32m, -H128m, -H512m, -H1024m
>>>
>>> But all led to worse performance compared with the defaults (and -H
>>> didn't really have much effect at all).
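>>>
>>> (For the record, I'm passing these at run time, roughly as below -
>>> the binary name is a placeholder, it's built with -rtsopts, and -s
>>> prints a GC summary so the effect of each setting can be compared:)
>>>
>>>     ./server +RTS -A2048k -H512m -s -RTS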
>>>
>>> I will try your suggestion of setting -A to the L3 cache size.
>>>
>>> Are there any other values I should try setting these at?
>>>
>>> As for your final point, I have run space profiling, and it looks like
>>> >90% of the memory is used for our message index, which is a temporary
>>> store of messages that have gone through the system. These messages
>>> are stored in aligned chunks in memory that are merged together. I
>>> initially thought this was causing the spikes, but they were still
>>> there even after I removed the component. I will try to run space
>>> profiling in the build with the message index.
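>>>
>>> (The profiling itself was the standard GHC workflow, roughly the
>>> following - the module name is a placeholder:)
>>>
>>>     ghc -O2 -prof -fprof-auto -rtsopts Main.hs
>>>     ./Main +RTS -hc -RTS   # heap profile by cost centre, writes Main.hp
>>>     hp2ps -c Main.hp       # renders the heap graph to Main.ps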
>>>
>>> Thanks again.
>>>
>>> On 28 September 2015 at 19:02, Gregory Collins <greg at gregorycollins.net> wrote:
>>>>
>>>> On Mon, Sep 28, 2015 at 9:08 AM, Will Sewell <me at willsewell.com> wrote:
>>>>>
>>>>> If it is the GC, then is there anything that can be done about it?
>>>>
>>>> * Increase the value of -A (the default is too small) -- the best
>>>> value for this is the L3 cache size of the chip.
>>>> * Increase the value of -H (total heap size) -- this will use more
>>>> RAM, but you'll run GC less often.
>>>> * This will sound flip, but: generate less garbage. The frequency of
>>>> GC runs is proportional to the amount of garbage being produced, so if
>>>> you can lower the mutator allocation rate then you will also increase
>>>> net productivity. Built-up thunks can transparently hide a lot of
>>>> allocation, so fire up the profiler and tighten those up (there's an
>>>> 80-20 rule here). Reuse output buffers if you aren't already, etc.
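>>>>
>>>> A tiny sketch of the thunk point (illustrative numbers): a lazy left
>>>> fold builds a chain of suspended (+) applications that the GC then
>>>> has to chase, while the strict version keeps the accumulator
>>>> evaluated and allocates far less:
>>>>
>>>>     import Data.List (foldl')
>>>>
>>>>     -- Builds ~10M thunks before anything is forced: lots of garbage.
>>>>     lazySum :: [Int] -> Int
>>>>     lazySum = foldl (+) 0
>>>>
>>>>     -- Forces the accumulator at each step: runs in constant space.
>>>>     strictSum :: [Int] -> Int
>>>>     strictSum = foldl' (+) 0
>>>>
>>>>     main :: IO ()
>>>>     main = print (strictSum [1 .. 10000000])
>>>>
>>>> Running each variant with +RTS -s shows the allocation difference
>>>> (though -O2 can sometimes rescue the lazy version via strictness
>>>> analysis, which is why profiling the real program beats guessing).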
>>>>
>>>> G
>>>>
>>>> --
>>>> Gregory Collins <greg at gregorycollins.net>
>>