[Haskell-cafe] Real-time garbage collection for Haskell

Curt Sampson cjs at starling-software.com
Thu Mar 4 04:14:13 EST 2010


On 2010-03-02 11:33 +0900 (Tue), Simon Cranshaw wrote:

> I can confirm that without tweaking the RTS settings we were seeing
> over 100ms GC pauses.

Actually, we can't quite confirm that yet. We're seeing large amounts
of time go by in our main trading loop, but I'm still building the
profiling tools to see what exactly is going on there. However, GC is
high on our list of suspects, since twiddling the GC parameters can
improve things drastically.

On 2010-03-02 00:06 -0500 (Tue), John Van Enk wrote:

> Would a more predictable GC or a faster GC be better in your case? (Of
> course, both would be nice.)

Well, as on 2010-03-01 17:18 -0600 (Mon), Jeremy Shaw wrote:

> For audio apps, there is a callback that happens every few milliseconds. As
> often as 4ms. The callback needs to finish as quickly as possible to avoid a
> buffer underruns.

I think we're in about the same position. Ideally we'd never have to
stop for GC, but that's obviously not practical; what will hurt pretty
badly, and we should be able to prevent, is us gathering up a bunch
of market data, making a huge pause for a big GC, and then generating
orders based on that now oldish market data. We'd be far better off
doing the GC first, and then looking at the state of the market and
doing our thing, because though the orders will still not get out as
quickly as they would without the GC, at least they'll be using more
recent data.

I tried invoking System.Mem.performGC at the start of every loop, but
that didn't help. Now that I know it was invoking a major GC, I can see
why. :-) But really, before I go much further with this:

On 2010-03-01 14:41 +0100 (Mon), Peter Verswyvelen wrote:

> Sounds like we need to come up with some benchmarking programs so we
> can measure the GC latency and soft-realtimeness...

Exactly. Right now I'm working from logs made by my own logging and
profiling system. These are timestamped, and they're "good enough"
to get a sense of what's going on, but incomplete. I also have the
information from the new event logging system, which is excellent in
terms of knowing exactly when things are starting and stopping, but
doesn't include information about my program, nor does it include any
sort of GC stats. Then we have the GC statistics we get with -S, which
don't have timestamps.

My plan is to bring all of this together. The first step was to fix
GHC.Exts.traceEvent so that we can use that to report information about
what the application is doing. In 6.12.1 it segfaults, but we have a fix
(see http://hackage.haskell.org/trac/ghc/ticket/3874) and it looks as if
it will go into 6.12.2, even. The next step is to start recording the
information generated by -S in the eventlog as well, so that not only
do we know when a GC started or stopped in relation to our application
code, but we know what generation it was, how big the heap was at the
time, how much was collected, and so on and so forth. Someone mentioned
that there were various other stats that were collected but not printed
by -S; we should probably throw those in too.

With all of that information it should be much easier to figure
out where and when GC behaviour is causing us pain in low-latency
applications.

However: now that Simon's spent a bunch of time experimenting with the
runtime's GC settings and found a set that's mitigated much of our
problem, other things are pushing their way up my priority list. Between
that and an upcoming holiday, I'm probably not going to get back to this
for a few weeks. But I'd be happy to discuss my ideas with anybody else
who's interested in similar things, even if just to know what would be
useful to others.

What do you guys think about setting up a separate mailing list for
this? I have to admit, I don't follow haskell-cafe much due to the high
volume of the list. (Thus my late presence in this thread.) I would be
willing to keep much closer track of a low-volume list that dealt with
only GC stuff.

I'd even be open to providing hosting for the list, using my little baby
mailing list manager written in Haskell (mhailist). It's primitive, but
it does handle subscribing, unsubscribing and forwarding of messages.

cjs
-- 
Curt Sampson         <cjs at cynic.net>         +81 90 7737 2974
             http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.    --George Bernard Shaw


More information about the Haskell-Cafe mailing list