Minor tweaks to ffi addendum

Wed Jun 5 11:41:59 EDT 2002

>> On two separate occasions, I've added interfaces to image
>> processing libraries where pretty much every function was pure (it
>> either generated a fresh image or returned an Int, Float or
>> whatever).  Images tend to be big (640x400x24bits is 3/4 Mbyte) so
>> a few dozen image processing steps might quickly use up a lot of
>> memory - and in all that time you don't go anywhere near the IO
>> monad.

> Sure, but relying on ForeignPtrs and the garbage collector is quite
> dodgy.  To support this kind of use we'd have to be much stricter
> about when finalizers are run: currently there's no guarantee about
> the promptness of running the finalizer.

We got round this by using allocators that looked like this:

 static int allocated = 0;

 void* mymalloc(int sz)
 {
   void* r = malloc(sz);
   allocated += sz;
   if (allocated > threshold) {
     hugs->garbageCollect();
   }
   return r;
 }

 void myfree(void* x, int sz) 
 {
   allocated -= sz;
   free(x);
 }

This let us force Haskell to give us what it could.

(The threshold mechanism was a bit more sophisticated than shown - a
range of soft and hard limits and some estimate of how many objects we
thought Haskell might be willing to give back played a part.)
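
For anyone who wants to try the idea, here is a minimal self-contained
sketch of the single-threshold version.  It is not the code we actually
used: the names tracked_malloc, tracked_free, set_gc_hook and
counting_gc_stub are made up for illustration, the hook is only a
stand-in for whatever call your Haskell runtime provides to force a
collection, and the 64 Mbyte default is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook standing in for hugs->garbageCollect(). */
typedef void (*gc_hook_fn)(void);

static size_t allocated = 0;           /* bytes currently held by C objects */
static size_t threshold = 64u << 20;   /* arbitrary default limit (assumption) */
static gc_hook_fn gc_hook = NULL;

/* Install the collector hook and the limit that triggers it. */
void set_gc_hook(gc_hook_fn hook, size_t limit)
{
    gc_hook = hook;
    threshold = limit;
}

/* Allocate, account for the bytes, and ask for a GC once over the limit. */
void *tracked_malloc(size_t sz)
{
    void *r = malloc(sz);
    if (r == NULL)
        return NULL;                   /* nothing allocated, nothing to count */
    allocated += sz;
    if (allocated > threshold && gc_hook != NULL)
        gc_hook();                     /* finalizers are expected to call tracked_free */
    return r;
}

/* Free a block; the caller passes the size it originally asked for. */
void tracked_free(void *x, size_t sz)
{
    assert(sz <= allocated);           /* catches mismatched sizes in debug builds */
    allocated -= sz;
    free(x);
}

size_t tracked_bytes(void)
{
    return allocated;
}

/* Stub collector for experiments: it only counts how often it was asked. */
unsigned gc_calls = 0;
void counting_gc_stub(void)
{
    ++gc_calls;
}
```

In a real binding the finalizer attached to each ForeignPtr would call
tracked_free, so a collection triggered from tracked_malloc is what
brings the count back down.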

Of course, with a generational collector you really want to be able to
specify how many levels to collect:

  for(int depth=0; allocated > threshold; ++depth) {
    ghc->garbageCollect(depth);
  }
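
The loop above never stops if the collector cannot free enough, so a
usable version needs an upper bound on the depth.  A rough sketch,
assuming a hypothetical collect(depth) entry point (GHC does not offer
this interface as far as I know) and a simulated heap so the thing can
be run on its own:

```c
#include <stddef.h>

/* Hypothetical runtime entry point: collect generations 0..depth and
   run any finalizers that this releases before returning. */
typedef void (*collect_fn)(int depth);

/* Collect progressively deeper until the C-side byte count drops to the
   threshold or max_depth has been tried.  Returns the deepest generation
   collected, or -1 if no collection was needed. */
int collect_until_under(const size_t *allocated, size_t threshold,
                        int max_depth, collect_fn collect)
{
    int depth = -1;
    while (*allocated > threshold && depth < max_depth) {
        ++depth;
        collect(depth);
    }
    return depth;
}

/* Simulated runtime for experiments: demo_garbage[g] is the number of
   C bytes whose finalizers run when generation g is collected. */
size_t demo_allocated = 100;
size_t demo_garbage[3] = {10, 30, 50};

void demo_collect(int depth)
{
    for (int g = 0; g <= depth; ++g) {  /* depth d includes younger generations */
        demo_allocated -= demo_garbage[g];
        demo_garbage[g] = 0;
    }
}
```

With the simulated numbers, a threshold of 65 is reached after
collecting generations 0 and 1, so the oldest generation is never
touched.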

and you want GHC to have the property that executing the sequence

  garbageCollect(0);
  garbageCollect(1);
  garbageCollect(2);
  garbageCollect(3);
  garbageCollect(4);
  garbageCollect(5);

costs little more than 

  garbageCollect(5);

Note that an interface based on asking GHC to keep GCing until it
recovers X Mb of space is no use - we're interested in freeing C
objects not Haskell objects.  

Note too that I'm assuming that garbageCollect will wait until any
finalizers released by the GC have had a chance to run.  (I have no
idea how this would be achieved in GHC.)

> Not to mention the fact that there's no guarantee at all about
> whether the garbage collector will detect that the ForeignPtr is
> unreachable when you think it should be.

That's true - but I have to performance tune my Haskell code just to
make sure it releases my Haskell objects, so it's no hardship to have
to tune it to be sure it releases the C objects too.  The reason this
approach works (and it was very successful both times I did it) is
that memory is quite abundant and virtual memory adds a useful safety
margin.  If my program uses 5 times as much memory as it has to, it
will probably still work ok.  I am wary of using this approach with
less abundant resources - as people using lazy file IO have found out
to their cost (when they run out of file descriptors).

> I think for these kind of applications using ForeignPtrs isn't the
> right thing - you really want to do the allocation/deallocation
> explicitly in the IO monad.

Those kinds of applications exist - but there are many pure programs
for which ForeignPtrs work extremely well and for which it would be
painful to have to insert frequent calls to 'runFinalizers'.

-- 
Alastair Reid        reid at cs.utah.edu        http://www.cs.utah.edu/~reid/