IORefs and garbage collection

Bulat Ziganshin bulat.ziganshin at gmail.com
Wed Apr 19 05:56:25 EDT 2006


Hello Ravi,

Wednesday, April 19, 2006, 12:05:28 AM, you wrote:

> 1. What "goes wrong" with mutable data and generational garbage
> collection to make this sort of workaround necessary?

The generational GC (at least the one used in GHC) is better suited
to "functional" (immutable) data. The scheme rests on the assumption
that newly allocated data can contain references to old data but not
vice versa. A minor GC scans only the new data; if nothing in the new
data references a given new cell, then nothing in the whole heap
references it and the cell can be freed. Most of the data in a
"normal" Haskell program is immutable, so this works fine. But when
there is mutable data, the old part of the heap can contain references
to new data, so on each minor GC we must scan not only the new data
but also all those mutable arrays/references, and this becomes a
problem when the number of such references is large. Raising -A/-H
just dramatically reduces the number of minor GCs.
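To make the pattern concrete, here is a hypothetical sketch (not from
the original thread) of the kind of program that stresses minor GC: a
large set of long-lived IORefs that keep being mutated after they have
been promoted to the old generation.

```haskell
import Control.Monad (forM, forM_)
import Data.IORef

main :: IO ()
main = do
  -- Allocate many IORefs; after a few minor GCs they live in the
  -- old generation.
  refs <- forM [1 :: Int .. 100000] newIORef
  -- Each write may create an old-to-new reference, so a minor GC
  -- cannot ignore the old generation: it has to scan these refs
  -- (all of them, before the GHC 6.6 improvement discussed below).
  forM_ refs $ \r -> modifyIORef r (+ 1)
  total <- fmap sum (mapM readIORef refs)
  print total
```

Each ref i ends up holding i + 1, so the program prints the sum of
2 .. 100001.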

> The issue
> surprised me because I thought there was plenty of satisfied experience
> with generational gc for languages with many more destructive updates 
> (e.g. Java) than Haskell. Is there some optimization that the ghc 
> runtime is not implementing?

it seems that this issue was much less important for Haskell because
many programs are written in a functional style and don't use mutable
variables very much. One program that does run into this problem is
GHC itself :)  Without -A it can spend 50% of its runtime in GCs :)

> 2. If there is a known fix for this issue, what would it involve (and,
> if there are any guesses, how much work might it be)?

this is partially fixed in 6.6 (AFAIK, large IOArrays will still have
the problem). Try to find the GHC error ticket about this; it has the
complete story.

in short, GHC 6.6 keeps a list of all arrays/references that were
updated since the last GC, so it can scan only the updated ones
instead of all mutable objects. On the other hand, this list contains
an entire array even if only one of its elements was updated, and that
entire array is scanned on the next minor GC. This can create problems
for really large arrays (with millions of elements)

you can also use unboxed arrays to avoid this scanning; that works in
any GHC version
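A minimal sketch of that workaround, assuming the standard
Data.Array.IO module: an unboxed mutable array (IOUArray) stores raw
Ints rather than pointers, so its contents can never refer to newer
heap objects and the GC has nothing to scan inside it.

```haskell
import Data.Array.IO

main :: IO ()
main = do
  -- A million unboxed Ints, initialised to 0.
  arr <- newArray (0, 999999) 0 :: IO (IOUArray Int Int)
  -- Writes store plain machine words; no old-to-new pointers can be
  -- created, so these updates never force extra minor-GC scanning.
  mapM_ (\i -> writeArray arr i i) [0 .. 999999]
  s <- fmap sum (getElems arr)
  print s
```

The trade-off is that unboxed arrays only work for element types like
Int, Double, Char, etc., not for arbitrary boxed values.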

> 3. What is the best workaround for this issue if you *don't* know the 
> maximum amount of memory available for your program? Would be it be best
> to fall back copying collection if you want your program to consume and
> release memory "as needed" or is there a cleverer trick?

use "-A10m", for example (I do this). You can also gain a bit more by
setting "-H" to the amount of memory that will be used anyway (or will
be free anyway) while the program runs. So you can use something like
this:

"-H300m -A10m"

In this case the program will allocate 300 MB at startup and run a
minor GC only when all of this memory has been used. If, for example,
30 MB remain in use after this GC, the next GC will run only after the
remaining 270 MB of free space have been allocated again. Once memory
usage grows beyond 300 MB, minor GCs will be performed after each
10 MB allocated ("-A")


-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com


