Effect of large binaries on garbage collection

Adrian Hey ahey@iee.org
Tue, 4 Mar 2003 11:05:02 +0000


Hello,

I'm writing a library which will require many blocks of binary data
of various sizes (some very large) to be stored in the heap. I'm a
little worried about the effect this will have on the efficiency of
garbage collection. I'm not sure how GHC's GC works these days, but
I seem to remember it's a copying collector by default. If so, it
seems a pity to waste time copying tens of megabytes of binaries at
each collection.

The options I'm considering (with rough sketches of each after the
list) are:

(1) Use Haskell heap space
    Pros: Easy for me.
    Cons: May slow down GC.
          AFAICS I can't use anything like realloc.
          Current FFI proposals seem to prevent me from directly
          accessing Haskell heap objects from C land (or have I
          misunderstood?).

(2) Use C heap space
    Pros: Easy(ish) to use from both C and the Haskell FFI.
    Cons: Unless C heap allocators have improved a lot since I last
          looked (which I doubt), it seems likely I will suffer from
          slow allocation and fragmentation problems.

(3) Write my own "sliding" heap manager and use finalisers for
    garbage collection.
    Pros: Can tailor it to work exactly the way I want.
    Cons: More work for me, especially if I want the result to be
          portable across OSes.
          Might be a complete waste of time if my worries about GHC
          heap management are groundless :-)
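
Here are the sketches. They're untested, just to show what I mean.

For option (1), I believe mallocForeignPtrBytes allocates on the GHC
heap as a pinned object (so at least those blocks shouldn't be moved
or copied), and withForeignPtr lets C land see the address. The
fill_block import is made up purely for illustration:

import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Data.Word (Word8)

-- Hypothetical C routine that fills a buffer; the name is invented
-- for this sketch.
foreign import ccall unsafe "fill_block"
  c_fill_block :: Ptr Word8 -> Int -> IO ()

-- Allocate a block in the Haskell heap. I believe GHC pins these
-- blocks, so the GC should neither move nor copy them, and the
-- address stays valid while C writes into it.
newBlock :: Int -> IO (ForeignPtr Word8)
newBlock size = do
  fp <- mallocForeignPtrBytes size
  withForeignPtr fp $ \p -> c_fill_block p size
  return fp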
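
For option (2), the obvious thing seems to be mallocBytes plus the
standard free finaliser, so the Haskell GC never touches the block
itself:

import Foreign.ForeignPtr (ForeignPtr, newForeignPtr)
import Foreign.Marshal.Alloc (mallocBytes, finalizerFree)
import Data.Word (Word8)

-- Allocate on the C heap. The block is invisible to the Haskell GC;
-- when the ForeignPtr becomes unreachable, the standard 'free'
-- finaliser releases it.
newCBlock :: Int -> IO (ForeignPtr Word8)
newCBlock size = do
  p <- mallocBytes size
  newForeignPtr finalizerFree p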
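
For option (3), finalisers written in Haskell (Foreign.Concurrent)
would let the GC hand blocks back to my own allocator. myAlloc and
myFree below just stand for the entry points of the custom heap
manager; they don't exist yet. One obvious catch: if the manager
really slides blocks around, raw pointers get invalidated, so some
kind of handle or indirection would be needed on top of this:

import Foreign.Ptr (Ptr)
import Foreign.ForeignPtr (ForeignPtr)
import qualified Foreign.Concurrent as FC
import Data.Word (Word8)

-- Placeholders for the custom heap manager's entry points; these
-- are assumptions, not existing functions.
myAlloc :: Int -> IO (Ptr Word8)
myAlloc = error "custom allocator not written yet"

myFree :: Ptr Word8 -> IO ()
myFree _ = error "custom allocator not written yet"

-- Wrap a block from the custom heap so that the GC runs myFree as a
-- finaliser once the ForeignPtr becomes unreachable.
newManagedBlock :: Int -> IO (ForeignPtr Word8)
newManagedBlock size = do
  p <- myAlloc size
  FC.newForeignPtr p (myFree p)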

Any advice?

Thanks
--
Adrian Hey