garbage collection

Thu Apr 21 05:57:20 EDT 2005

On 20 April 2005 15:56, Bulat Ziganshin wrote:

> Tuesday, April 19, 2005, 4:15:53 PM, you wrote:
> 
>>> 1) can you add disableGC and enableGC procedures? this can
>>> significantly improve performance in some cases
> 
>> Sure.  I imagine you want to do this to avoid a major collection
>> right at the peak of a residency spike.
> 
>> You probably only want to disable major collections though: it's safe
>> for minor collections to happen.
> 
> no, in that particular case i have a very simple and fast algorithm
> which allocates plenty of memory. minor GCs in such a situation are just
> a waste of time. so i want to do:
> 
> disableGC
> result <- eatMemory
> enableGC
> 
> with the effect that all memory allocated in the 'eatMemory' procedure will
> be garbage collected only after return from this procedure. currently
> i have these stats:
> 
>   INIT  time    0.01s  (  0.00s elapsed)
>   MUT   time    0.57s  (  0.60s elapsed)
>   GC    time    1.41s  (  1.41s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time    1.99s  (  2.01s elapsed)
> 
>   %GC time      70.8%  (70.1% elapsed)
> 
>   Alloc rate    171,249,142 bytes per MUT second
> 
>   Productivity  28.7% of total user, 28.4% of total elapsed
> 
> as you see, it is very inefficient

I see (I think).  Unfortunately, the size of the allocation area is
currently fixed after a GC, so you would have to change the code in the
runtime to keep allocating more blocks for the nursery.

>> I guess you're proposing using madvise(MADV_FREE) (or whatever the
>> equivalent is on your favourite OS).  This would certainly be a good
>> idea if the program is swapping, but might impose an overhead when
>> running in memory.  I don't know, I haven't tried.
> 
> i don't see reasons why this can be slower. we will be "good
> citizens" - return memory that is not used at the current moment and
> reallocate memory when needed.

It might be slower because it involves extra calls to the kernel to
free/allocate memory, and the kernel has to update its page tables.

I mentioned madvise() above: this is a compromise solution which
involves telling the kernel that the data in memory is not relevant, but
doesn't actually free the memory.  The kernel is free to discard the
pages if memory gets tight, without actually swapping them to disk.
When the memory is faulted in again, it gets filled with zeros.  This is
ideal for copying GC: you madvise() the semispace you just copied from,
because it contains junk.

IIRC, madvise() is a BSD-ish interface, but other OSs probably have
similar facilities.
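
To make the madvise() idea concrete, here is a minimal sketch in C.  It
is not code from the GHC runtime: the helper names semispace_alloc and
semispace_discard are made up for illustration, and an anonymous mmap()
region stands in for a semispace.  It assumes a POSIX-ish system, and it
falls back to MADV_DONTNEED where MADV_FREE is missing or rejected,
which discards the pages eagerly rather than lazily.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose madvise() and MAP_ANONYMOUS on glibc */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous, zero-filled region that stands
 * in for a semispace.  Returns NULL on failure. */
static void *semispace_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: tell the kernel that the contents of the region
 * are junk.  The mapping itself stays valid, so the region can be
 * written to again without another mmap().  Returns 0 on success and
 * -1 on failure. */
static int semispace_discard(void *base, size_t len)
{
#ifdef MADV_FREE
    /* Lazy variant: the kernel may drop the pages under memory pressure. */
    if (madvise(base, len, MADV_FREE) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;
    /* EINVAL: this kernel does not accept MADV_FREE, try the fallback. */
#endif
#ifdef MADV_DONTNEED
    /* Eager variant (assumption: acceptable as a substitute here). */
    return madvise(base, len, MADV_DONTNEED);
#else
    return -1;
#endif
}
```

A collector would call something like semispace_discard(from_space,
size) right after copying the live data out, and simply start writing
into the region again when it next becomes the to-space.  Whether this
is a net win when the program fits in memory would still need to be
measured.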

We could also consider really returning memory to the OS.  This requires
more work in the runtime, though.

> current implementation only allows memory usage to
> grow and that is not perfect either. imho it will be better to release
> unneeded memory after major GC and perform the next major GC after
> allocating a fixed amount of memory or, say, after the used memory
> area has doubled

GHC has quite a sophisticated block-based storage manager.  It's not
obvious how to understand your comments in the context of GHC - I
suggest you take a look at the source code.

Cheers,
	Simon