Removing latency spikes. Garbage collector related?

Tue Sep 29 12:37:33 UTC 2015

That's interesting. I have not done this kind of work before, and had
not come across CDFs. I can see why it makes sense to look at the mean
and the tail separately.

Your assumption is correct. The messages have a similar cost, which is
why the graph I posted is relatively flat most of the time. The spikes
suggest to me that it is a tail-affecting issue, because the messages
follow the same code path as they do when it is running normally.

On 29 September 2015 at 11:45, Neil Davies
<semanticphilosopher at gmail.com> wrote:
> Will
>
> I was trying to get a feeling for what those coloured squares actually
> denoted. Typically we examine this sort of performance information
> as CDFs (cumulative distribution functions[1]), trying to pull apart the
> issues that are “mean” affecting (i.e. the typical path through the
> code/system) from those that are “tail” affecting (i.e. exceptions; GC
> running could be seen as an “exception”, one that you can manage and
> time-shift in the relative timing).
>
> I’m assuming that messages have a similar “cost” (i.e. similar work
> to complete), so that a uniform arrival rate equates to a uniform
> rate of work to be done arriving.
>
> Neil
> [1] We plot the CDFs in two ways: the “usual” way for the major part
> of the probability mass, and then as (1-CDF) on a log-log scale to
> expose the tail behaviour.
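A minimal Haskell sketch of the two plots described in the footnote, assuming response times have already been collected as a list of `Double`s (for example, in milliseconds). `empiricalCdf`, `ccdf` and `tailPoints` are hypothetical helper names, not part of any library; the actual plotting is left to whatever tool you already use.

```haskell
import Data.List (sort)

-- Empirical CDF: for each distinct observed value x (ascending), the
-- fraction of samples that are <= x.
empiricalCdf :: [Double] -> [(Double, Double)]
empiricalCdf xs = go 1 (sort xs)
  where
    n = fromIntegral (length xs) :: Double
    go :: Double -> [Double] -> [(Double, Double)]
    go _ [] = []
    -- Skip repeated values so each x is reported once, at its highest rank.
    go i (y : rest@(z : _)) | y == z = go (i + 1) rest
    go i (y : rest) = (y, i / n) : go (i + 1) rest

-- Complementary CDF (1 - CDF), used to look at the tail.
ccdf :: [Double] -> [(Double, Double)]
ccdf xs = [(x, 1 - p) | (x, p) <- empiricalCdf xs]

-- Points for a log-log tail plot. The largest sample has 1 - CDF = 0 and
-- non-positive latencies have no logarithm, so both are dropped.
tailPoints :: [Double] -> [(Double, Double)]
tailPoints xs =
  [(logBase 10 x, logBase 10 q) | (x, q) <- ccdf xs, x > 0, q > 0]

main :: IO ()
main = do
  let samples = [1, 2, 2, 4]
  print (empiricalCdf samples)
  print (ccdf samples)
  print (tailPoints samples)
```

On a log-log plot of `tailPoints`, a GC-induced tail would typically show up as a long, shallow slope at the right-hand end rather than as a shift of the whole curve.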
>
> On 29 Sep 2015, at 10:35, Will Sewell <me at willsewell.com> wrote:
>
>> Thank you for the reply, Neil.
>>
>> The spikes are in response time. The graph I linked to shows the
>> distribution of response times in a given window of time (the darkness
>> of a square is the number of messages in a particular window of
>> response time). So the spikes are in the mean and also in the max
>> response time. Having said that, I'm not exactly sure what you mean by
>> "mean values".
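The graph described above groups response times into time windows. A rough sketch of computing the per-window mean and max in Haskell, assuming each sample is a hypothetical `(arrivalSeconds, latencyMs)` pair; all names here are illustrative. Roughly speaking, a window whose max jumps while its mean barely moves points at a tail effect, whereas a shift in both suggests the typical path is affected too.

```haskell
import qualified Data.Map.Strict as Map

-- Summary of one time window of (hypothetical) latency samples.
data WindowStats = WindowStats
  { count :: !Int
  , total :: !Double
  , maxLatency :: !Double
  } deriving (Show, Eq)

meanLatency :: WindowStats -> Double
meanLatency ws = total ws / fromIntegral (count ws)

-- Combine two partial summaries of the same window.
combine :: WindowStats -> WindowStats -> WindowStats
combine a b = WindowStats
  { count = count a + count b
  , total = total a + total b
  , maxLatency = max (maxLatency a) (maxLatency b)
  }

-- Bucket (arrivalSeconds, latencyMs) samples into windows of `width`
-- seconds, keyed by window index.
windowed :: Double -> [(Double, Double)] -> Map.Map Int WindowStats
windowed width samples =
  Map.fromListWith combine
    [ (floor (t / width), WindowStats 1 l l) | (t, l) <- samples ]

main :: IO ()
main = do
  let samples = [(0.1, 10), (0.5, 30), (1.2, 5)]
      report (w, s) =
        putStrLn (show w ++ ": mean " ++ show (meanLatency s)
                  ++ " ms, max " ++ show (maxLatency s) ++ " ms")
  mapM_ report (Map.toList (windowed 1 samples))
```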
>>
>> I will have a look into -I0.
>>
>> Yes, the arrival of messages is constant. This graph shows the number
>> of messages that have been published to the system:
>> http://i.imgur.com/ADzMPIp.png
>>
>> On 29 September 2015 at 10:16, Neil Davies
>> <semanticphilosopher at gmail.com> wrote:
>>> Will
>>>
>>> Is your issue with the spikes in response time, rather than the mean values?
>>>
>>> If so, once you’ve reduced the amount of unnecessary mutation, you might want
>>> to take more control over when the GC is taking place. You might want to disable
>>> GC on timer (-I0) and force GC to occur at points you select - we found this useful.
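A minimal sketch of forcing collections at points you choose, using `performMinorGC` from `System.Mem` in `base` (`performMajorGC` is the full-collection counterpart). The batch size and the message handler are hypothetical placeholders. One caveat: `-I0` turns off the RTS's idle-time GC, but collections are still triggered whenever the allocation area (`-A`) fills up, so this moves some of the GC work to chosen points rather than all of it.

```haskell
import Control.Monad (forM_, when)
import System.Mem (performMinorGC)

-- Process messages with a (hypothetical) handler, forcing a minor GC
-- after every `batch` messages, i.e. at a point we pick rather than
-- wherever the RTS happens to decide. Returns the number of forced GCs.
processWithForcedGC :: Int -> (msg -> IO ()) -> [msg] -> IO Int
processWithForcedGC batch handler msgs = do
  forM_ (zip [1 :: Int ..] msgs) $ \(i, m) -> do
    handler m
    when (i `mod` batch == 0) performMinorGC
  pure (length msgs `div` batch)

main :: IO ()
main = do
  n <- processWithForcedGC 1000 (\_ -> pure ()) [1 .. 10000 :: Int]
  putStrLn ("forced collections: " ++ show n)
```

To pass `-I0`, build with `ghc -O2 -rtsopts` and run with `+RTS -I0 -RTS`, or bake it in at link time with `-with-rtsopts=-I0`.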
>>>
>>> Lastly, is the arrival pattern (and distribution pattern) of messages constant or
>>> variable? Just making sure that you are not trying to fight basic queueing theory here.
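To make the queueing point concrete, a sketch of the textbook M/M/1 result (Poisson arrivals, exponential service times, a single server). These assumptions are for illustration only; nothing in this thread establishes that the system behaves like an M/M/1 queue. With arrival rate λ and service rate μ, the mean time a message spends in the system is 1/(μ − λ), which grows sharply as λ approaches μ. When a system already runs close to capacity, even a brief drop in service rate (for example, during a GC pause) can therefore produce a large latency spike.

```haskell
-- Mean response time (waiting + service) of an M/M/1 queue.
-- lambda: arrival rate, mu: service rate, in the same units (e.g. msgs/s).
-- Returns Nothing when lambda >= mu: the queue then grows without bound.
mm1ResponseTime :: Double -> Double -> Maybe Double
mm1ResponseTime lambda mu
  | lambda >= mu = Nothing
  | otherwise    = Just (1 / (mu - lambda))

main :: IO ()
main =
  -- Hypothetical service rate of 1000 msgs/s under increasing load.
  mapM_ (\l -> putStrLn (show l ++ " msgs/s -> " ++ show (mm1ResponseTime l 1000)))
        [500, 900, 990, 1000]
```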
>>>
>>>
>>> Neil
>>>
>>> On 29 Sep 2015, at 10:03, Will Sewell <me at willsewell.com> wrote:
>>>
>>>> Thanks for the reply, Greg. I have already tried tweaking these values
>>>> a bit, and this is what I found:
>>>>
>>>> * I first tried -A256k because the L2 cache is that size (Simon Marlow
>>>> mentioned this can lead to good performance
>>>> http://stackoverflow.com/a/3172704/1018290)
>>>> * I then tried a value of -A2048k because he also said "using a very
>>>> large young generation size might outweigh the cache benefits". I
>>>> don't exactly know what he meant by "a very large young generation
>>>> size", so I guessed at this value. Is it in the right ballpark?
>>>> * With -H, I tried values of -H8m, -H32m, -H128m, -H512m, -H1024m
>>>>
>>>> But all led to worse performance than the defaults (and -H didn't
>>>> really have much effect at all).
>>>>
>>>> I will try your suggestion of setting -A to the L3 cache size.
>>>>
>>>> Are there any other values I should try setting these at?
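One way to compare -A and -H settings more directly than by eyeballing latency graphs is to have the program report its own GC statistics. A rough sketch using `GHC.Stats.getRTSStats`, which needs GHC 8.2 or later and the `+RTS -T` flag (GHC 7.10, current when this thread was written, offered the older `getGCStats` instead). `workload` is a hypothetical stand-in for the real message processing.

```haskell
import GHC.Stats

-- Hypothetical stand-in for the real message processing.
workload :: IO Int
workload = pure $! sum (map (* 2) [1 .. 1000000 :: Int])

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  r <- workload
  putStrLn ("workload result: " ++ show r)
  if not enabled
    then putStrLn "GC statistics unavailable: rerun with +RTS -T"
    else do
      s <- getRTSStats
      -- All figures are cumulative since program start.
      putStrLn ("collections: " ++ show (gcs s)
                ++ " (major: " ++ show (major_gcs s) ++ ")")
      putStrLn ("bytes allocated: " ++ show (allocated_bytes s))
      putStrLn ("max live bytes: " ++ show (max_live_bytes s))
      putStrLn ("GC elapsed: "
                ++ show (fromIntegral (gc_elapsed_ns s) / 1e6 :: Double) ++ " ms")
```

For example, compile with `ghc -O2 -rtsopts Stats.hs`, run `./Stats +RTS -T -A2m -RTS`, then repeat with other `-A` or `-H` values and compare collection counts and GC time.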
>>>>
>>>> As for your final point, I have run space profiling, and it looks like
>>>> 90% of the memory is used for our message index, which is a temporary
>>>> store of messages that have gone through the system. These messages
>>>> are stored in aligned chunks in memory that are merged together. I
>>>> initially thought this was causing the spikes, but they were still
>>>> there even after I removed the component. I will try to run space
>>>> profiling in the build with the message index.
>>>>
>>>> Thanks again.
>>>>
>>>> On 28 September 2015 at 19:02, Gregory Collins <greg at gregorycollins.net> wrote:
>>>>>
>>>>> On Mon, Sep 28, 2015 at 9:08 AM, Will Sewell <me at willsewell.com> wrote:
>>>>>>
>>>>>> If it is the GC, then is there anything that can be done about it?
>>>>>
>>>>> * Increase the value of -A (the default is too small); the best value
>>>>>   for this is the L3 cache size of the chip.
>>>>> * Increase the value of -H (total heap size); this will use more RAM,
>>>>>   but you'll run GC less often.
>>>>> * This will sound flip, but: generate less garbage. The frequency of GC
>>>>>   runs is proportional to the amount of garbage being produced, so if
>>>>>   you can lower the mutator allocation rate, you will also increase net
>>>>>   productivity. Built-up thunks can transparently hide a lot of
>>>>>   allocation, so fire up the profiler and tighten those up (there's an
>>>>>   80-20 rule here). Reuse output buffers if you aren't already, etc.
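As an illustration of the thunk point, a small sketch with a hypothetical `Msg` type. Running totals kept with `foldl'` and strict (`!`) fields are evaluated at each step; the same code with lazy `foldl` and lazy fields would instead build a chain of unevaluated `(+)` thunks, one per message, which is the kind of hidden allocation described above.

```haskell
import Data.List (foldl')

-- Hypothetical message type: only its payload size matters here.
newtype Msg = Msg { msgSize :: Int }

-- Running totals. The strict fields mean that forcing a Totals value
-- (which foldl' does at every step) also forces both counters, so no
-- thunk chain builds up across messages.
data Totals = Totals { msgCount :: !Int, byteCount :: !Int }
  deriving (Show, Eq)

step :: Totals -> Msg -> Totals
step (Totals c b) (Msg s) = Totals (c + 1) (b + s)

totals :: [Msg] -> Totals
totals = foldl' step (Totals 0 0)

main :: IO ()
main = print (totals (map Msg [1 .. 1000000]))
```

Note that `foldl'` alone only evaluates the accumulator to its outermost constructor; without the strict fields, the counters inside `Totals` would still accumulate thunks.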
>>>>>
>>>>> G
>>>>>
>>>>> --
>>>>> Gregory Collins <greg at gregorycollins.net>
>>>> _______________________________________________
>>>> Glasgow-haskell-users mailing list
>>>> Glasgow-haskell-users at haskell.org
>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users
>>>
>