Updates to FFI spec

Tue Aug 13 05:24:20 EDT 2002

> > System.Mem.performGC does a major GC.  When would a partial GC be
> > enough?
> 
> I've described the image-processing example a bunch of times.
> 
> We have an external resource (e.g., memory used to store images) which
> is somewhat abundant and cheap but not completely free (e.g.,
> eventually you start to swap).  It is used up at a different rate than
> the Haskell heap, so Haskell GCs don't occur at the right times to keep
> the cost low, and we want to trigger GCs ourselves.

Hmmm, the garbage collector is a black box and has its own complicated
heuristics for managing memory usage, but you are describing a mechanism
that depends rather heavily on certain assumed behaviours.  At the
least, that gives the garbage collector less flexibility to change its
own behaviour, lest it invalidate the assumptions made by the external
allocator.

> (In the image
> processing example, images were megabytes in size, and an expression like
> (x + (y * mask)) would generate 2 intermediate images (several megabytes)
> while doing just 2 reductions in Haskell.)

Given the size of the objects involved, I'd be tempted to use a more
predictable allocation scheme.  Perhaps arenas?
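One reading of the arena suggestion is sketched below in C. It assumes that all intermediate images for one top-level expression can be freed together. The arena is a bump-pointer allocator that hands out image buffers and releases them all with a single `arena_reset`. The `Arena` type and function names are illustrative only.

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct {
    unsigned char *base;   /* start of the reserved block */
    size_t capacity;       /* total bytes reserved */
    size_t used;           /* bytes handed out so far */
} Arena;

/* Reserve one large block up front; returns 0 on failure. */
int arena_init(Arena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Bump-allocate nbytes, rounded up to a multiple of 16 bytes.
   Returns NULL if the arena is exhausted. */
void *arena_alloc(Arena *a, size_t nbytes)
{
    size_t rounded = (nbytes + 15) & ~(size_t)15;
    if (rounded < nbytes || rounded > a->capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Release every buffer at once, e.g. after an expression such as
   (x + (y * mask)) has been evaluated and its result copied out. */
void arena_reset(Arena *a)
{
    a->used = 0;
}

void arena_destroy(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = a->used = 0;
}
```

Memory use is bounded by the arena's capacity and released at a known point, independent of when the Haskell GC happens to run.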

> How often and how hard should we GC?  We can't do a full GC too often,
> or we'll spend a lot of time GCing, destroy our cache and cause
> premature promotion of Haskell objects into the old generation which
> will make the GC behave poorly.  So if all we can do is a full GC,
> we'll GC rarely and use a lot of the external resource.
> 
> Suppose we could collect just the allocation arena.  That would be
> much less expensive (time taken, effect on caches, confusion of object
> ages) but not always effective.  It would start out cheap and
> effective but more and more objects would slip into older generations
> and have to wait for a full GC.
> 
> To achieve any desired tradeoff between GC cost and excess resource
> usage, we want a number of levels of GC: gc1, gc2, gc3, gc4, ..., each
> one more effective and more expensive than the last.  We'll use gc1
> most often, gc2 less often, gc3 occasionally, gc4 rarely, ...
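The schedule described above can be sketched as follows. It assumes hypothetical collection levels 1 to `MAX_LEVEL`, where level k is requested on every `RATIO`^(k-1)-th trigger. Both constants are illustrative choices, not tuned values. The actual collection call is left out.

```c
/* Number of collection levels (gc1 .. gc4 in the text). */
#define MAX_LEVEL 4
/* Each level is requested RATIO times less often than the one below. */
#define RATIO 4u

static unsigned long trigger_count = 0;

/* Called each time the external allocator decides a collection is due.
   Returns the level to use: 1 on most triggers, at least 2 on every
   4th, at least 3 on every 16th, and 4 on every 64th. */
int choose_gc_level(void)
{
    unsigned long n = ++trigger_count;
    int level = 1;
    while (level < MAX_LEVEL && n % RATIO == 0) {
        n /= RATIO;
        level++;
    }
    return level;
}
```

With `RATIO` = 4, 64 consecutive triggers yield 48 gc1, 12 gc2, 3 gc3 and 1 gc4. Picking `RATIO` (or a different ratio per level) is where the real tuning lies.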

But there seems to be no way to reasonably decide how often one should
call these.  Doesn't it depend on the garbage collector's own parameters
too?

> > I think the spec should be clarified along these lines:
> 
> >   Header files have no impact on the semantics of a foreign call,
> > and whether an implementation uses the header file or not is
> > implementation-defined.  Some implementations may require a header
> > file which supplies a correct prototype for the function in order to
> > generate correct code.
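The following example (chosen here for illustration, not taken from the spec) shows why a prototype can matter for code generation. In C, a call made with a prototype in scope converts each argument to the declared parameter type. A call made without one applies only the default argument promotions. The function `half` is hypothetical.

```c
/* What a header would supply: the prototype tells the caller that the
   argument must be passed as a float. */
float half(float x);

float half(float x)
{
    return x / 2.0f;
}

/* With the prototype in scope, the int 3 is converted to 3.0f before
   the call.  Without it, the int would be passed unconverted and the
   callee would misread it, which is undefined behaviour. */
float call_half_with_int(void)
{
    return half(3);
}
```

A compiler that does not read the header must rely solely on the types given in the foreign declaration to get this conversion right.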
> 
> I still don't like the fact that compilers are free to ignore header
> files.  Labelling it an error instead of a change in semantics doesn't
> affect the fact that portability is compromised.

I don't see any alternative: would you require a compiler that has only
a native code generator to read header files, even when there's no C
compiler on the system?  (This is realistic: at some point we'd like to
make the via-C route in GHC completely optional, so that we can ship a
compiler on Windows that doesn't need to be bundled with GCC.)

> > Perhaps on GHC you should be required to "register" the top module
> > in your program first, maybe something like
> 
> > 	registerModule(__stginit_Main);
> 
> > That way you can register multiple modules (which isn't possible at
> > the moment: you have to have another module that imports all the
> > others).
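Below is a minimal sketch of what the C side of such a registration API might look like. It assumes each module's initialisation fragment can be called as a plain C function. `registerModule` borrows the name from the proposal, but its signature here is a guess. `runRegisteredModules`, `MAX_ROOTS` and the two stand-in module inits are hypothetical. None of this is GHC's actual `__stginit_` interface.

```c
#include <stddef.h>

typedef void (*ModuleInit)(void);

#define MAX_ROOTS 16

static ModuleInit roots[MAX_ROOTS];
static size_t n_roots = 0;

/* Record a top-level module; returns 0 if the table is full. */
int registerModule(ModuleInit init)
{
    if (n_roots == MAX_ROOTS)
        return 0;
    roots[n_roots++] = init;
    return 1;
}

/* Run every registered module's initialisation, in registration order.
   Several roots are possible without a dummy module importing them all. */
void runRegisteredModules(void)
{
    for (size_t i = 0; i < n_roots; i++)
        roots[i]();
}

/* Stand-ins for two independent top modules. */
static int main_ready = 0, plugin_ready = 0;
static void init_Main(void)   { main_ready = 1; }
static void init_Plugin(void) { plugin_ready = 1; }
```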
> 
> What does that do?  Is it for threading, GC, profiling, ...?

Each module has a little initialisation fragment that calls all the
initialisation fragments for the modules it imports.
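The chaining can be sketched as follows, using three hypothetical modules. `Main` imports `A` and `B`, and `A` also imports `B`. Each module has a "done" flag so that `B` is initialised only once even though two importers call it. This flag is an assumption about how repeated calls are avoided, not a description of GHC's generated code.

```c
static int b_inits = 0;

static void init_B(void)
{
    static int done = 0;
    if (done) return;      /* already initialised via another importer */
    done = 1;
    b_inits++;             /* module B's own initialisation goes here */
}

static void init_A(void)
{
    static int done = 0;
    if (done) return;
    done = 1;
    init_B();              /* initialise imports first */
    /* module A's own initialisation goes here */
}

void init_Main(void)
{
    static int done = 0;
    if (done) return;
    done = 1;
    init_A();
    init_B();
    /* module Main's own initialisation goes here */
}
```

Calling only `init_Main` is enough to reach every module in the import graph, which is why one root currently has to import all the others.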

At the moment, there are two kinds of initialisation done for each
module:

  - each foreign export is registered as a stable pointer.  This
    prevents the garbage collector from collecting any CAFs which
    might be required (indirectly) by a foreign export.

  - when profiling, all the cost centres in the current module
    are initialised.
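Here is what one module's fragment might do for the two items above. The sketch uses a toy root table to stand in for the stable pointer table. `Closure`, `stable_roots`, `make_stable`, `CostCentre` and the `PROFILING` macro are all hypothetical illustrations. The real runtime structures differ.

```c
#include <stddef.h>

/* Toy heap object; a real closure looks nothing like this. */
typedef struct Closure { const char *name; } Closure;

/* Toy stable-pointer table: everything listed here is treated as a GC
   root, so whatever it reaches (including CAFs) stays alive. */
#define MAX_STABLE 64
static Closure *stable_roots[MAX_STABLE];
static size_t n_stable = 0;

/* Returns the new entry's index, or -1 if the table is full. */
static int make_stable(Closure *c)
{
    if (n_stable == MAX_STABLE)
        return -1;
    stable_roots[n_stable] = c;
    return (int)n_stable++;
}

#ifdef PROFILING
typedef struct { const char *label; unsigned long entries; } CostCentre;
static CostCentre cc_foo = { "ThisModule.foo", 0 };
static CostCentre cc_bar = { "ThisModule.bar", 0 };
static void init_cost_centres(void)
{
    cc_foo.entries = 0;
    cc_bar.entries = 0;
}
#endif

/* Hypothetical closures behind this module's two foreign exports. */
static Closure export_foo = { "foo" };
static Closure export_bar = { "bar" };

void init_ThisModule(void)
{
    /* 1. Keep each foreign export reachable from the GC roots. */
    make_stable(&export_foo);
    make_stable(&export_bar);
#ifdef PROFILING
    /* 2. Profiling builds only: set up this module's cost centres. */
    init_cost_centres();
#endif
}
```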

It might be possible to do this using linker sets, but I haven't tried
(and it would probably be highly non-portable too).
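For comparison, a linker-set version might look like the sketch below. It relies on the GCC/Clang `section` and `used` attributes. It also relies on GNU-style linkers defining `__start_SECNAME` and `__stop_SECNAME` symbols for sections whose names are valid C identifiers. That limits it to ELF targets such as Linux, which bears out the portability concern. The section name `modinit` and the module functions are hypothetical.

```c
#include <stddef.h>

typedef void (*ModuleInit)(void);

/* Place a pointer to fn in the "modinit" section.  Every object file
   that uses this macro contributes one entry; the linker concatenates
   them, so no module needs to know about the others. */
#define REGISTER_MODULE_INIT(fn) \
    static ModuleInit modinit_entry_##fn \
        __attribute__((section("modinit"), used)) = fn

/* Defined by the linker (GNU ld, gold, lld) for the "modinit" section. */
extern ModuleInit __start_modinit[];
extern ModuleInit __stop_modinit[];

static int inits_run = 0;
static void init_Main(void)  { inits_run++; }
static void init_Other(void) { inits_run++; }

REGISTER_MODULE_INIT(init_Main);
REGISTER_MODULE_INIT(init_Other);

/* Run every registered initialiser, in whatever order the linker
   placed them. */
void run_all_module_inits(void)
{
    for (ModuleInit *p = __start_modinit; p < __stop_modinit; p++)
        (*p)();
}
```

The order of entries is up to the linker, so any dependency between modules would still need the import-driven chaining.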

Cheers,
	Simon