How unsafe are unsafeThawArray# and unsafeFreezeArray#

Fri Mar 9 13:26:34 CET 2012

On 09/03/2012 11:44, John Meacham wrote:
> Out of curiosity, Is the reason you keep track of mutable vs not
> mutable heap allocations in order to optimize the generational garbage
> collector? as in, if a non-mutable value is placed in an older
> generation you don't need to worry about it being updated with a link
> to a newer one or is there another reason you keep track of it? Is it
> a pure optimization or needed for correctness?

It's for correctness - this is how we track all the pointers from the 
old generation to the new generation.

There are lots of different ways of doing generational GC write 
barriers, and indeeed GHC uses a mixture of different methods: a 
remembered set for updated thunks and mutated IORefs, and a card table 
for arrays.  Mutable arrays normally stay in the remembered set 
continuously, so that the write barrier for array mutation doesn't have 
to worry about adding the array to the remembered set, but it does 
modify the card table.  An immutable array can be dropped from the 
remembered set, but only once all its pointers are safely pointing to 
the old generation.  But then, unsafeThawArray has to put it back in the 
remembered set again.

Once upon a time all the mutable objects were in the remembered set, but 
gradually we've been moving away from this and adding them to the 
remembered set only when they get modified.  There are a few objects 
that still use the old method: TVars, for example.

> A weakness of jhc right now is its stop the world garbage collector,
> so far, I have been mitigating it by not creating garbage whenever
> possible, I do an escape analysis and allocate values on the stack
> when possible, and recognize linear uses of heap value in some
> circumstances and re-use heap locations directly (like when a cons
> cell is deconstructed and another is constructed right away I can
> reuse the spacue in certain cases) but eventually a full GC needs to
> run and it blocks the whole runtime which is particularly not good for
> embedded targets (where jhc seems to be thriving at the moment.). My
> unsafeFreeze and unsafeThaw are currently NOPs. frozen arrays are just
> a newtype of non-frozen ones (see
> http://repetae.net/dw/darcsweb.cgi?r=jhc;a=headblob;f=/lib/jhc-prim/Jhc/Prim/Array.hs)

Interesting - we do generational GC instead of escape analysis and other 
"compile-time GC" techniques.  IMO, generational GC is a necessity, and 
once you have it, you don't need to worry so much about short-lived 
allocation because it all gets recycled in the cache.  (try +RTS -G1 
sometime to see the difference between generational GC and ordinary 
2-space copying).

On the other hand, generational GC does give rise to some strange 
performance characteristics, especially with mutable objects and 
sometimes laziness.  And it doesn't solve the pause time problem, except 
that the long pauses are less frequent.  Improving pause times is high 
up my list of things we want to do in the GC.

Cheers,
	Simon