Binary IO

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Sat Apr 23 14:54:12 EDT 2005


On Sat, 2005-04-23 at 14:10 -0400, Jan-Willem Maessen wrote:
> On Apr 22, 2005, at 5:33 PM, Duncan Coutts wrote:
> > Though aren't there some issues with the fact that regular garbage
> > collection touches most of the heap (even if it doesn't modify it), and
> > so very little of it can be paged out of physical RAM?
> 
> This is a common misconception about garbage collection in general.
> 
> There are only two reasons for a garbage collector to walk through a 
> given piece of memory:
> * The memory is live, and may contain pointers; those pointers must be 
> found and traced.
> * A copying/compacting collector needs to move the data.
> 
> Most collectors keep a special large object area which contains big 
> arrays.  Even if copying collection is used for other objects, these 
> large objects never move.

Yes, indeed.
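The two reasons above, plus the rule that large objects never move, fit into a toy model. This is a hypothetical Haskell sketch, not the code of any real collector. `HeapObj` and `mustVisit` are invented names, and the sketch assumes a copying collector with a separate large object area.

```haskell
-- Toy model only: not taken from any real garbage collector.
module Main (main) where

-- Hypothetical summary of one heap object, for illustration.
data HeapObj = HeapObj
  { objLive        :: Bool  -- still reachable from the roots
  , objHasPointers :: Bool  -- may contain pointers that must be traced
  , objLarge       :: Bool  -- allocated in the large object area
  }

-- Must a copying collector touch this object's contents during a collection?
mustVisit :: HeapObj -> Bool
mustVisit o
  | not (objLive o)  = False             -- dead data is simply not copied or traced
  | objHasPointers o = True              -- reason 1: its pointers must be traced
  | otherwise        = not (objLarge o)  -- reason 2: small objects get copied; large ones never move

main :: IO ()
main = do
  let byteArray  = HeapObj { objLive = True, objHasPointers = False, objLarge = True }
      syntaxNode = HeapObj { objLive = True, objHasPointers = True,  objLarge = False }
  print (mustVisit byteArray)   -- False: large and pointer-free, so left alone
  print (mustVisit syntaxNode)  -- True: live and full of pointers
```

A large object that does contain pointers still has to be scanned under this model; it just never gets moved.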

> Furthermore, if an array contains no pointers (because, for example, 
> it's a byte array read from a file) it does not need to be scanned by 
> the garbage collector.

Like the unboxed array types under discussion.
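As I understand GHC's implementation, a `UArray` of `Word8` is stored as one flat block of raw bytes with no pointers inside it, whereas a boxed `Array` holds one pointer per element. Below is a minimal sketch of the two using the `array` package that ships with GHC. The names `unboxedBytes`, `boxedBytes` and `sumBytes` are illustrative only.

```haskell
module Main (main) where

import Data.Array.IArray (Array, elems, listArray, (!))
import Data.Array.Unboxed (UArray)
import Data.Word (Word8)

-- Unboxed: the bytes sit inline in one pointer-free block,
-- so the collector finds nothing inside it to trace.
unboxedBytes :: UArray Int Word8
unboxedBytes = listArray (0, 4) [1, 2, 3, 4, 0]

-- Boxed: each slot holds a pointer to a Word8 (or to an unevaluated thunk),
-- so the collector must scan the array to follow those pointers.
boxedBytes :: Array Int Word8
boxedBytes = listArray (0, 4) [1, 2, 3, 4, 0]

-- Widen to Int before summing so the Word8 values cannot overflow.
sumBytes :: UArray Int Word8 -> Int
sumBytes = sum . map fromIntegral . elems

main :: IO ()
main = do
  print (sumBytes unboxedBytes)  -- 10
  print (boxedBytes ! 2)         -- 3
```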

> So I wouldn't worry about having your huge binary objects walked by the 
> garbage collector.  Whatever GC may do to a heap chock-full of tiny 
> objects, a single large pointer-free object should be left alone.

Sadly, the case I had in mind is exactly the former: large syntax
trees and large symbol tables. That is about 400MB of seldom-accessed,
mostly read-only, and yet unpageable data.

Then, to make things worse, we've got a nasty little piece of code
which walks the AST and, for some inexplicable reason, generates vast
amounts of garbage. To make things work on normal machines we have to
set the heap limit as low as possible, so the garbage collector has to
run very frequently, reclaiming very little each time. Yet every run
still has to touch all of the rest of the 400MB dataset, which prevents
it from being paged out. My tests indicate that 3/4 of the running time
is spent doing GC. </grumble> :-)
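The effect of a low heap limit can be estimated with a simplified model. It assumes every collection is a full collection that traverses all live data. Real collectors, GHC's included, are generational, so this overstates the cost of minor collections. With L MB of live data and a heap limit of H MB, a collection is triggered after roughly H - L MB of allocation and traverses L MB. The collector therefore does about L / (H - L) MB of work per MB allocated. The function name `gcWorkPerAlloc` is hypothetical.

```haskell
-- Back-of-envelope model only. Assumes every collection is a full one that
-- traverses all live data; ignores generations and copying overhead.
module Main (main) where

-- Hypothetical helper: MB of live data traversed per MB allocated.
--   liveMB: live data each collection must traverse
--   heapMB: heap limit; a collection runs once the free space is used up
gcWorkPerAlloc :: Double -> Double -> Double
gcWorkPerAlloc liveMB heapMB
  | heapMB <= liveMB = error "heap limit must exceed the live data"
  | otherwise        = liveMB / (heapMB - liveMB)

main :: IO ()
main =
  -- 400MB of live data, as described above, under three heap limits.
  mapM_ report [450, 800, 1600]
  where
    report heapMB =
      putStrLn (show heapMB ++ "MB heap: " ++ show (gcWorkPerAlloc 400 heapMB)
                ++ " MB traversed per MB allocated")
```

With 400MB live and a 450MB limit, that is 8MB traversed for every 1MB allocated. Raising the limit to 800MB brings it down to 1MB. In GHC, the maximum heap size is set with the `+RTS -M` flag, and `+RTS -s` prints a summary that includes the time spent in GC.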

Duncan


