Finalizers: conclusion?

Thu Jan 23 11:29:52 EST 2003

Hi Manuel,

Sorry for the delay in replying.  I'll quote a little more of the 
previous messages than usual to help refresh context.

Manuel M T Chakravarty wrote:
> Antony Courtney <antony at apocalypse.org> wrote,
> 
> 
>>You indicated that you were somewhat unclear why we need liveness 
>>dependencies.  I'll attempt to clarify by sketching some of the details 
>>of the particular C library for which I am writing FFI wrappers.
>>
>>I have a C library for 2D vector graphics.  Two of the abstract types 
>>provided by this C library are:
>>    Pixmap -- A handle to an actual buffer of raster data
>>    RenderContext -- A handle that encapsulates all state associated 
>>with rendering, such as the current color, current font, target pixmap, etc.
>>
>>Note that it is possible to create many RenderContexts that all 
>>render on to the same underlying Pixmap.
>>
>>To see why we need liveness dependencies, consider the following typical 
>>usage scenario in Haskell:
>>    do pm <- createPixmap               -- 1
>>       rc <- createRenderContext pm     -- 2
>>       drawBox rc                       -- 3
>>       ...
>>
>>Note that, in the above, it's possible that the call to 
>>createRenderContext in line 2 could be the last Haskell reference to pm, 
>>making it a candidate for collection.  But we don't actually want the 
>>Pixmap to be collected (and its finalizer invoked) until both the Pixmap 
>>  *and* all associated rendering contexts which refer to the Pixmap 
>>become unreachable.
>>
>>The reason we need liveness dependencies is because, internally, the 
>>RenderContext maintains a pointer to the target Pixmap.  But because 
>>this pointer exists only in the C heap, we need some way to inform 
>>Haskell's garbage collector that whenever a particular RenderContext is 
>>reachable, then its target pixmap is also reachable.
> 
> 
> IMHO you are trying to compensate for a flaw in the whole
> setup:
> 
> * Line 1: You get a pointer to a C object assuming it is the
>     last reference to that C object.
> 
> * Line 2: You pass this pointer back to C without copying
>     it; ie, the only reference to the C object is in C land.
> 
> At this moment, the pointer obtained on Line 1 is no longer
> the business of the Haskell system.  It is a pointer in C
> land to a C object;   so, memory management of that structure
> should be left to the C library.

I'm sorry, but I simply don't agree with your rationale here (nor do I 
see a "flaw in the whole setup").

Yes, your observations about when references are live in C and when
references are live in Haskell in the above code fragment are correct. 
However, in my opinion, this is an implementation detail.  The user of 
my Haskell library should not know or care whether the library is 
implemented in Haskell, in C, or in some combination of the two.

In this case, Pixmap and RenderContext could very easily be implemented 
entirely in Haskell (i.e. just make Pixmap a byte array, and 
RenderContext a record type that maintains a Pixmap in one of its 
fields).  If it were implemented this way, then of course any live 
reference to a RenderContext will ensure that the Pixmap it refers to 
will not be GC'ed, since the field of the RenderContext record would 
contain a reference to the Pixmap.  I see liveness dependencies as a way 
for me (as a library implementor) to use an external (C language) 
representation for a Haskell data structure, whilst retaining one of the 
most important benefits of programming in Haskell (garbage collection).

> Assume the following C function
> 
>   RenderContext *createPixmapWithContext ()
>   {
> 
>     Pixmap *pm = createPixmap ();
>     return createRenderContext (pm);
>   }
> 
> in conjunction with
> 
>   do
>     rc <- createPixmapWithContext
>     drawBox rc
> 
> How is this different from your Haskell code in a way that
> requires a foreign pointer dependency in one case, but not
> in the other?

For starters, I would never, ever write the createPixmapWithContext() 
function in C because it is an obvious memory leak.  It allocates two 
objects (via createPixmap and createRenderContext), but returns a 
pointer to only one of them.  You could potentially get away with this 
if you happen to use some reference counting scheme in C, but I never 
suggested I was doing any such thing (more on this below).
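
To make the leak concrete, here is a minimal C sketch.  The struct 
layouts, the destroy functions and the live_objects counter are 
assumptions made up for illustration; only the names createPixmap, 
createRenderContext and createPixmapWithContext come from the messages 
above.  It assumes, as I describe below, that a RenderContext merely 
borrows its target Pixmap.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the library's abstract types. */
typedef struct Pixmap { unsigned char *data; } Pixmap;
typedef struct RenderContext { Pixmap *target; } RenderContext;

static int live_objects = 0;   /* objects allocated and not yet freed */

Pixmap *createPixmap(void) {
    Pixmap *pm = malloc(sizeof *pm);
    if (pm == NULL) return NULL;
    pm->data = NULL;           /* raster buffer omitted in this sketch */
    live_objects++;
    return pm;
}

void destroyPixmap(Pixmap *pm) {          /* hypothetical name */
    if (pm != NULL) { free(pm); live_objects--; }
}

RenderContext *createRenderContext(Pixmap *pm) {
    RenderContext *rc = malloc(sizeof *rc);
    if (rc == NULL) return NULL;
    rc->target = pm;           /* borrowed pointer: the context does not own it */
    live_objects++;
    return rc;
}

void destroyRenderContext(RenderContext *rc) {   /* hypothetical name */
    if (rc != NULL) { free(rc); live_objects--; }
}

/* The function from the quoted message: two allocations, one pointer returned. */
RenderContext *createPixmapWithContext(void) {
    Pixmap *pm = createPixmap();
    return createRenderContext(pm);
}

/* Returns how many objects remain allocated once the caller has released
   the only handle it was given. */
int leaked_after_cleanup(void) {
    RenderContext *rc = createPixmapWithContext();
    if (rc == NULL) return -1;
    Pixmap *hidden = rc->target;  /* visible only because this sketch is not opaque */
    destroyRenderContext(rc);
    int leaked = live_objects;    /* what an ordinary caller is left with */
    destroyPixmap(hidden);        /* tidy up so the sketch itself does not leak */
    return leaked;
}
```

With an opaque RenderContext the caller has no way to reach the 
Pixmap, so the count reported by leaked_after_cleanup() would stay 
allocated for the life of the process.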

To be honest, I don't really see your point here.  I am implementing a 
Haskell library that happens to use some external (C language) 
representations for some data structures, and I would like to use 
Haskell's garbage collector to ensure that this Haskell library works as 
a Haskell programmer would expect.  What you have presented above is an 
arbitrary C function that does some heap allocation that is never 
visible to the Haskell runtime.  I would never expect liveness 
dependencies (or anything else) to enable the Haskell runtime to track 
heap allocation in arbitrary C code.

> [...]
> As `createPixmapWithContext()' demonstrates, C land
> must free `pm' when the last render context referring to
> `pm' dies.

Not necessarily true!  What you are suggesting (reference counting) is 
one possible memory management strategy in C, but is by no means the 
only option.

Another possibility (the one I actually use) is simply for 
RenderContexts to do absolutely no memory management of the underlying 
Pixmaps whatsoever.  Then it is up to whoever created the Pixmap to free 
the Pixmap, and up to whoever created a RenderContext to ensure that the 
RenderContext will not be used after its underlying Pixmap has been 
freed.  This is relatively easy to document in prose in a library manual 
page.  I see liveness dependencies as a way of exporting exactly such 
informal requirements to a high-level language's garbage collector.
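
As a rough sketch of that contract in C (the struct fields, the fixed 
4x4 size, the destroy functions and the C-side drawBox are hypothetical 
and chosen only for illustration; they are not the real library's 
interface):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PIXMAP_W 4   /* arbitrary size for the sketch */
#define PIXMAP_H 4

typedef struct Pixmap { unsigned char *data; } Pixmap;
typedef struct RenderContext { Pixmap *target; unsigned char color; } RenderContext;

Pixmap *createPixmap(void) {
    Pixmap *pm = malloc(sizeof *pm);
    if (pm == NULL) return NULL;
    pm->data = calloc(PIXMAP_W * PIXMAP_H, 1);
    if (pm->data == NULL) { free(pm); return NULL; }
    return pm;
}

/* Contract: call only after every RenderContext targeting pm is finished. */
void destroyPixmap(Pixmap *pm) {
    if (pm != NULL) { free(pm->data); free(pm); }
}

/* The context borrows pm; it never frees it. */
RenderContext *createRenderContext(Pixmap *pm) {
    RenderContext *rc = malloc(sizeof *rc);
    if (rc == NULL) return NULL;
    rc->target = pm;
    rc->color = 0;
    return rc;
}

/* Frees the context only; the target Pixmap is left untouched. */
void destroyRenderContext(RenderContext *rc) {
    free(rc);
}

/* Fills the whole target with the context's current color. */
void drawBox(RenderContext *rc) {
    memset(rc->target->data, rc->color, PIXMAP_W * PIXMAP_H);
}

/* Two contexts render onto one Pixmap; the creator tears everything down
   in an order that honours the contract.  Returns the last color drawn. */
int demo_explicit_ownership(void) {
    Pixmap *pm = createPixmap();
    if (pm == NULL) return -1;
    RenderContext *a = createRenderContext(pm);
    RenderContext *b = createRenderContext(pm);
    if (a == NULL || b == NULL) {
        destroyRenderContext(a);
        destroyRenderContext(b);
        destroyPixmap(pm);
        return -1;
    }
    a->color = 1;
    b->color = 2;
    drawBox(a);
    drawBox(b);                   /* overwrites a's box on the shared Pixmap */
    int last = pm->data[0];
    destroyRenderContext(a);      /* contexts first ... */
    destroyRenderContext(b);
    destroyPixmap(pm);            /* ... then the Pixmap they borrowed */
    return last;
}
```

The ordering at the end of demo_explicit_ownership() is exactly the 
informal rule that a liveness dependency would hand over to the 
collector: the Pixmap outlives every RenderContext that targets it.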

For C libraries I prefer this kind of explicit memory management scheme 
over reference counting because:
   (a) it is simpler to implement,
   (b) it provides a foundation for implementing higher level memory 
management schemes in C (it is easy to wrap these primitive objects in 
higher level constructs that provide reference counting or arena-based 
allocation if you want that)
and
   (c) the library can be exported to a garbage collected language, 
without any "impedance mismatch" between reference counting on the C 
side (with its known flaws collecting cyclic structures) and the calling 
language's GC scheme.
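
Point (b) can be sketched as follows.  SharedPixmap and its three 
functions are hypothetical names invented for this example, as are the 
Pixmap layout and the pixmaps_alive counter; the point is only that 
reference counting can be layered on top of the primitive objects 
without changing them.

```c
#include <assert.h>
#include <stdlib.h>

/* Primitive layer: explicit create/destroy, no reference counting. */
typedef struct Pixmap { unsigned char *data; } Pixmap;

static int pixmaps_alive = 0;   /* instrumentation for the sketch only */

Pixmap *createPixmap(void) {
    Pixmap *pm = malloc(sizeof *pm);
    if (pm == NULL) return NULL;
    pm->data = NULL;            /* raster buffer omitted in this sketch */
    pixmaps_alive++;
    return pm;
}

void destroyPixmap(Pixmap *pm) {
    if (pm != NULL) { free(pm); pixmaps_alive--; }
}

/* Higher layer: an optional reference-counted wrapper (hypothetical). */
typedef struct SharedPixmap { Pixmap *pm; int refs; } SharedPixmap;

SharedPixmap *sharedPixmapCreate(void) {
    SharedPixmap *sp = malloc(sizeof *sp);
    if (sp == NULL) return NULL;
    sp->pm = createPixmap();
    if (sp->pm == NULL) { free(sp); return NULL; }
    sp->refs = 1;               /* the creator holds the first reference */
    return sp;
}

void sharedPixmapRetain(SharedPixmap *sp) {
    sp->refs++;
}

/* Drops one reference; destroys the Pixmap when the last one goes. */
void sharedPixmapRelease(SharedPixmap *sp) {
    if (--sp->refs == 0) {
        destroyPixmap(sp->pm);
        free(sp);
    }
}

/* Returns (pixmaps alive after the first release) * 10
         + (pixmaps alive after the last release). */
int demo_shared_pixmap(void) {
    SharedPixmap *sp = sharedPixmapCreate();
    if (sp == NULL) return -1;
    sharedPixmapRetain(sp);               /* a second holder appears */
    sharedPixmapRelease(sp);              /* first holder done; Pixmap survives */
    int alive_midway = pixmaps_alive;
    sharedPixmapRelease(sp);              /* last holder done; Pixmap is freed */
    return alive_midway * 10 + pixmaps_alive;
}
```

Clients that want counting use SharedPixmap; a binding for a garbage 
collected language can ignore it and call createPixmap and 
destroyPixmap directly.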

> IMO the only clean way to approach this problem is to add a
> reference counting scheme to `pm' in C land.  

Obviously I disagree.  I hope I've clearly articulated why, and that 
there is a reasonable, simple alternative.

> BTW,
> this is exactly how this problem is solved in the GTK+ GUI
> toolkit.

Reference counting is a decent crutch (with some known flaws) for those 
who are stuck programming in C.  But for those of us working with a true 
high level language, I think using foreign pointers, finalizers and 
liveness dependencies to enable use of the high level language's 
collector when programming in the high level language is a far better 
alternative.

	-antony

-- 
Antony Courtney
Grad. Student, Dept. of Computer Science, Yale University
antony at apocalypse.org          http://www.apocalypse.org/pub/u/antony