IORefs and garbage collection

Jan-Willem Maessen jmaessen at alum.mit.edu
Tue Apr 18 20:38:10 EDT 2006


On Apr 18, 2006, at 4:05 PM, Ravi Nanavati wrote:

> I recently discovered that I'm running into the IORef / garbage
> collection issue described in the ghc user guide (at the bottom of the
> following page):
>
> http://www.haskell.org/ghc/docs/6.4/html/users_guide/faster.html

I did a bit of back-and-forth with Simon M. on this when I was  
fiddling with Data.HashTable.  It's especially bad for IOArrays.
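
As a rough illustration of why (the program below is mine, not Ravi's, and the sizes are made up): one big mutable array, updated in place, is the worst case, because a naive collector rescans the entire array on every minor GC, so GC time tracks the array's size rather than the amount of mutation actually done.

```haskell
import Data.Array.IO

-- One large in-place-updated array: the pattern that stresses a
-- naive barrier, since every minor GC rescans all million slots.
stress :: IO Int
stress = do
  arr <- newArray (0, 999999) 0 :: IO (IOArray Int Int)
  -- Touch only a handful of slots; a precise write barrier would
  -- need to remember just these writes.
  mapM_ (\i -> writeArray arr i i) [0, 100000 .. 900000]
  readArray arr 500000

main :: IO ()
main = stress >>= print
```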

> Increasing the heap and allocation area size (with -H, -M and -A)
> helped improve my runtimes considerably (by preventing all of my
> execution time from being sucked up by the garbage collector), but
> I have some questions:
>
> 1. What "goes wrong" with mutable data and generational garbage
> collection to make this sort of workaround necessary? The issue  
> surprised me because I thought there was plenty of satisfied  
> experience with generational gc for languages with many more  
> destructive updates (e.g. Java) than Haskell. Is there some  
> optimization that the ghc runtime is not implementing?

There are a number of known optimizations.  Basically, an efficient  
generational GC needs either a read barrier or a write barrier;  
since the write barrier only fires when we actually mutate  
something, it's the usual solution.  It lets the collector keep  
track of IORefs in old-space which point into new-space.

GHC is naive and just assumes all IORefs point into new space.  But  
after one GC pass, if you haven't touched an IORef, that assumption  
is certainly false (if I understand the promotion policy correctly).

Note that a read or write barrier is necessary for efficient  
implementation of almost *any* GC algorithm except naive "stop and  
collect the entire world".
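
To make the remembered-set idea concrete, here's a toy model in plain Haskell.  This is illustration only: GHC's real barrier lives in the RTS, and every name below is invented.

```haskell
-- Toy model of a generational write barrier (invented names;
-- GHC's actual mechanism is implemented in the runtime system).
import Control.Monad (when)
import Data.IORef

data Gen = Old | New deriving Eq

-- A mutable cell tagged with the generation it lives in.
data Ref a = Ref { gen :: Gen, cell :: IORef a }

-- Old-generation refs that may now point into the new generation.
type RememberedSet a = IORef [Ref a]

-- Every write goes through the barrier.  Only writes to old-space
-- cells get recorded, so a minor GC scans just the remembered set
-- instead of every mutable object on the heap.
writeRef :: RememberedSet a -> Ref a -> a -> IO ()
writeRef remSet r x = do
  writeIORef (cell r) x
  when (gen r == Old) $ modifyIORef remSet (r :)
```

The point of the barrier is the `when`: mutation of new-space refs costs nothing extra, and only old-to-new pointers are tracked.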

> 2. If there is a known fix for this issue, what would it involve  
> (and, if there are any guesses, how much work might it be)?

I had thought this was on the list of things fixed in 6.4.  Simon?

> 3. What is the best workaround for this issue if you *don't* know  
> the maximum amount of memory available for your program? Would be  
> it be best to fall back copying collection if you want your program  
> to consume and release memory "as needed" or is there a cleverer  
> trick?

Hmm, it'd be nice if the allocation area dynamically scaled up based  
on GC time.  But those calculations are awfully hard to get right  
(you need to build a pretty good cost model of GC, and the cost  
must include things like chasing down all those IORefs).
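
For what it's worth, the knobs Ravi mentions are ordinary RTS flags on the command line.  The program name and sizes below are made-up starting points, not recommendations:

```shell
# Illustrative only -- tune these per program:
#   -A  allocation area (nursery) size per minor GC
#   -H  suggested heap size
#   -M  hard upper bound on heap size
#   -s  print GC statistics to stderr
./myprog +RTS -A16m -H256m -M1g -sstderr
```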

-Jan

>
> Thanks,
>
>  - Ravi
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


