GC trouble [was Re: Binary IO]
jmaessen at alum.mit.edu
Sat Apr 23 15:12:12 EDT 2005
On Apr 23, 2005, at 2:54 PM, Duncan Coutts wrote:
> On Sat, 2005-04-23 at 14:10 -0400, Jan-Willem Maessen wrote:
>> So I wouldn't worry about having your huge binary objects walked by the
>> garbage collector. Whatever GC may do to a heap chock-full of tiny
>> objects, a single large pointer-free object should be left alone.
> Sadly the case I had in mind is exactly the former, of large syntax
> trees and large symbol tables. About 400Mb of seldom accessed mostly
> read-only and yet unpageable data.
Ah. Now that's another kettle of fish entirely... However,
generational GC *ought* to help here. If you're using GHC, I assume
you've turned on compacting GC to avoid doubling your memory, and have
set an appropriate upper bound on the heap size.
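For what it's worth, here is a minimal sketch of how one might check that those two settings actually took effect. It assumes a recent GHC whose base library ships GHC.RTS.Flags (newer than this thread); the file name Flags.hs and the helper heapLimit are made up for illustration, and I have not checked what units the RTS uses for the stored limit, so the value is printed raw.

```haskell
-- Flags.hs (hypothetical file name): report the GC settings in effect.
-- Build and run, assuming a recent GHC:
--   ghc -rtsopts Flags.hs
--   ./Flags +RTS -c -M500m -RTS
-- where -c asks for compacting collection of the oldest generation and
-- -M sets the maximum heap size.
import Data.Word (Word32)
import GHC.RTS.Flags (GCFlags (..), getGCFlags)

-- Illustrative helper: the RTS stores 0 when no heap limit was given.
heapLimit :: Word32 -> Maybe Word32
heapLimit 0 = Nothing
heapLimit n = Just n

main :: IO ()
main = do
  gc <- getGCFlags
  putStrLn ("compacting oldest generation: " ++ show (compact gc))
  case heapLimit (maxHeapSize gc) of
    Nothing -> putStrLn "no upper bound on the heap (-M not set)"
    -- The value is printed as the RTS stores it; units are not converted.
    Just n  -> putStrLn ("heap bound as stored by the RTS: " ++ show n)
```

If the first line prints False or the second reports no bound, the flags never reached the runtime (for instance because the program was linked without -rtsopts).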
> Then to make things worse we've got some nasty little piece of code
> which walks the AST and for some inexplicable reason generates vast
> amounts of garbage. To make things work on normal machines we have to
> set the heap limit as low as possible and so the garbage collector has
> to run very frequently reclaiming very little each time and yet it has
> to touch all of the rest of the 400Mb dataset which prevents it being
> paged out. My tests indicate that 3/4 of the running time is spent in
> GC. </grumble> :-)
Hmm; this sounds like a lot of full-heap collections, which is exactly
what generational GC is trying to avoid. A very large old generation
(like, say, 500+Mb) might help a lot in this instance; I have no idea
how GHC decides generation sizes. It might also help to set a very
large allocation area to reduce the promotion rate to the second
generation, and give the gobs of transient data some time to die---or,
similarly, to increase the number of generations to increase the time
it takes things to get to the old generation. Fundamentally, though,
when you run really close to your memory limits GC tends to be unhappy.
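To see whether a larger allocation area or more generations helps, one could measure the GC share of CPU time and the number of full collections before and after changing the flags. The following is only a sketch, assuming a recent GHC with GHC.Stats in base; the file name GcShare.hs, the helper gcFraction, the stand-in workload, and the particular -A and -G values are illustrative, not recommendations.

```haskell
-- GcShare.hs (hypothetical file name): report how much CPU time went to GC.
-- Build and run, assuming a recent GHC:
--   ghc -rtsopts GcShare.hs
--   ./GcShare +RTS -T -A64m -G3 -RTS
-- where -T enables statistics collection, -A sets the allocation area
-- size and -G sets the number of generations.
import Data.Int (Int64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Fraction of total CPU time spent in the collector; 0 if nothing measured.
gcFraction :: Int64 -> Int64 -> Double
gcFraction gcNs totalNs
  | totalNs <= 0 = 0
  | otherwise    = fromIntegral gcNs / fromIntegral totalNs

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "statistics are off; rerun with +RTS -T"
    else do
      -- Stand-in workload that allocates a lot of short-lived data;
      -- replace it with the real AST walk.
      let digits = length (show (product [1 .. 20000 :: Integer]))
      putStrLn ("workload result: " ++ show digits)
      s <- getRTSStats
      putStrLn ("collections: " ++ show (gcs s)
                ++ ", of which major: " ++ show (major_gcs s))
      putStrLn ("GC share of CPU time: "
                ++ show (gcFraction (gc_cpu_ns s) (cpu_ns s)))
```

The counters are only refreshed when a collection happens, so the figures describe the run up to the most recent GC. A high count of major collections relative to the total would match the full-heap-collection pattern described above.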