[Haskell-cafe] Re: Real-time garbage collection for Haskell
marlowsd at gmail.com
Tue Mar 2 08:50:31 EST 2010
On 01/03/2010 00:04, Luke Palmer wrote:
> On Sun, Feb 28, 2010 at 2:06 AM, Pavel Perikov<perikov at gmail.com> wrote:
>> Did you really see 100ms pauses?! I never did extensive research on this, but my numbers are rather in the microsecond range (below 1ms). What causes such a long garbage collection? Lots of allocated and long-living objects?
> This is all hypothetical right now. I heard some horror stories in
> which people had to switch to the main game loop in C++ and only do
> the AI logic in Haskell because of pauses. I would rather not do
> that, especially because this project is *about* proving Haskell as a
> viable game development platform. So I am trying to be prepared if I
> do see something like that, so that it doesn't put the show on hold
> for a few months.
> Presumably large, long-lived objects would cause the generation 0
> collections to take a long time. I am not sure if we will have any
> such objects, but we can't rule it out...
> Thanks for the positive reassurances, at least. I'd like to hear from
> people with the opposite experience, if there are any.
By default GHC uses a generational GC with a 512KB nursery size, which
means that most GC pauses will be very short, but just occasionally you
have to do a major GC, and there's no upper bound on the pause time for
that collection, because the whole live heap is traversed. The pause
time for a major collection is proportional to the amount of live data,
so the people who are experiencing very short pauses probably have
little live data and/or have allocation patterns that mean the old
generation is collected very infrequently.
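(A later note for readers wondering how to check this for their own
program: modern GHC exposes GC statistics through GHC.Stats, an API
that post-dates this email, so you can observe collection counts and
the pause of the most recent GC from inside the program. A minimal
sketch, assuming GHC >= 8.4; compile with -rtsopts and run with
"+RTS -T" so the RTS gathers statistics:

```haskell
import Control.Monad (unless)
import GHC.Stats (getRTSStatsEnabled, getRTSStats,
                  RTSStats (..), GCDetails (..))

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  unless enabled $ putStrLn "run with: ./prog +RTS -T"
  -- Force enough allocation that at least one minor GC happens.
  let xs = [1 .. 1000000] :: [Int]
  print (sum xs)
  s <- getRTSStats
  putStrLn $ "collections (all gens): " ++ show (gcs s)
  putStrLn $ "major collections:      " ++ show (major_gcs s)
  putStrLn $ "max live bytes:         " ++ show (max_live_bytes s)
  -- 'gc' holds details of the most recent collection, including
  -- which generation it collected and how long it paused for.
  let d = gc s
  putStrLn $ "last GC generation:     " ++ show (gcdetails_gen d)
  putStrLn $ "last GC pause (ns):     " ++ show (gcdetails_elapsed_ns d)
```

The nursery size mentioned above is also tunable: running with e.g.
"+RTS -A64m" trades fewer, longer minor pauses for less frequent
collection, and "+RTS -s" prints a GC summary on exit.)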
Monolithic major GC is a big stinking scalability problem, and the only
solutions are concurrent GC, incremental GC, or region-based GC. Both
concurrent GC and incremental GC tend to add overhead to the mutator,
because they need a read barrier. There was an incremental GC for GHC
once, taking advantage of the built-in read barrier that we have
whereby most closures are "entered"; it more-or-less worked, but was
quite complicated and had some code-size overhead. Nowadays part of
this built-in read barrier has been eroded by pointer tagging, which
makes things a bit more tricky.
Region-based GC is a generalisation of generational GC, where you divide
the heap into regions and track all pointers between regions, so a given
collection can collect any subset of the regions. This basic idea is
quite popular amongst the Java people these days; Thomas mentioned the
G1 collector, which does this and uses a concurrent mark phase to
identify which regions to collect. Regardless of how you figure out
which regions to collect, the basic idea of breaking up the old
generation like this is a good way to reduce pause times.
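(To make the "track all pointers between regions" part concrete, here
is a toy model of the idea in Haskell. Each region keeps a "remembered
set" of its objects that are pointed at from outside; collecting one
region then only needs the mutator roots plus that set, not a walk of
the whole heap. All names here are made up for illustration; this is
not GHC's implementation, just a sketch of the bookkeeping:

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type ObjId    = Int
type RegionId = Int

data Heap = Heap
  { regionOf   :: Map.Map ObjId RegionId           -- object -> its region
  , pointsTo   :: Map.Map ObjId (Set.Set ObjId)    -- object -> referents
  , remembered :: Map.Map RegionId (Set.Set ObjId) -- region -> its objects
  }                                                --   referenced externally

-- The write barrier: whenever we store a pointer src -> dst and the
-- two objects live in different regions, record dst in its region's
-- remembered set.
writePointer :: ObjId -> ObjId -> Heap -> Heap
writePointer src dst h =
  let h' = h { pointsTo = Map.insertWith Set.union src
                            (Set.singleton dst) (pointsTo h) }
  in case (Map.lookup src (regionOf h), Map.lookup dst (regionOf h)) of
       (Just rs, Just rd) | rs /= rd ->
         h' { remembered = Map.insertWith Set.union rd
                             (Set.singleton dst) (remembered h') }
       _ -> h'

-- Roots for collecting a single region: the mutator roots that live in
-- it, plus everything recorded in its remembered set.
collectionRoots :: Set.Set ObjId -> RegionId -> Heap -> Set.Set ObjId
collectionRoots mutatorRoots r h =
  Set.filter inRegion mutatorRoots `Set.union`
    Map.findWithDefault Set.empty r (remembered h)
  where inRegion o = Map.lookup o (regionOf h) == Just r
```

Generational GC falls out as the special case of two regions, old and
young, where only old-to-young pointers need remembering.)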
At the moment I'm focussed on something different: making individual
processors able to collect their own heaps independently, thus removing
the stop-the-world requirement from minor GCs. This should help
parallel throughput a lot, but won't fix the major GC pause times. I am
slightly worried that the GC can't bear the extra complexity of adding
both independent per-processor GC *and* region-based GC or some other
pause-time-reducing technique. But we'll see. I'll happily
supervise/mentor anyone who wants to tackle this.