Weak pointers and STM

Thu Dec 4 04:09:06 EST 2008

John Meacham wrote:
> Well, the actual problem I am trying to solve involves properly
> reclaiming elements in a  circularly linked list (next and prev pointers
> are TVars). I have a linked list and I need to be able to remove values
> from the list when all references to the node no longer exist, not
> counting the linked list references themselves.
> 
> Using Weak pointers inside the list itself doesn't work, since if an
> element is collected, you also lose the information needed to stitch up
> the list.
> 
> Originally, I had a wacky plan involving weak pointers in the nodes
> themselves pointing to sentinal nodes, when the sentinal was collected,
> I then know I can delete the node. The idea was that I can lazily delete
> entire chains of nodes rather than one by one. I gave up on that idea.
> (deRefWeak not working in STM was sort of a show stopper, and it was
> overly complicated)
> 
> So now I have a scheme whereby I attach a finalizer to a proxy thunk.
> 
> 
>> data TimeStamp = TimeStamp TS
>>
>> data TS = TS {
>>     tsWord :: TVar Word64,
>>     tsPrev :: TVar TS,
>>     tsNext :: TVar TS
>>     }
> 
> so, the finalizer attached to 'TimeStamp' values simply deletes the
> value it points to from the linked list.

What you want here is to attach the finalizer to one of the TVars, probably 
tsWord.  Attaching the finalizer to Timestamp is very risky: the compiler 
is free to optimise the Timestamp wrapper away, regardless of how much you 
hide in the module API.  For example, consider an operation on Timestamp: 
once the operation has looked inside the Timestamp wrapper, it no longer 
holds a pointer to it, so the finalizer might run, even though the 
operation is still working on the TS.  A function that is strict in 
Timestamp will have a worker that takes the unboxed TS, and might even 
re-wrap it in a new Timestamp (with no finalizer, of course).

You can work around this using touch#, but that's a bit inelegant, and I'm 
not certain it solves all the problems.

This is why we have mkWeakIORef and addMVarFinalizer - they attach 
finalizers to the primitive objects inside the IORef/MVar respectively, so 
you can be sure that compiler optimisations aren't going to affect the 
finalization properties you want.

Adding finalizers to arbitrary objects was useful for the memo table 
application we had in mind when weak pointers were introduced, but for all 
the other applications I've come across since then, we really want to add 
finalizers to objects whose lifetimes are under programmer control.  Notice 
how ForeignPtrs attach the finalizer carefully to the MutVar# inside the 
ForeignPtr, not the ForeignPtr itself.

> The module interface ensures
> that only 'TimeStamp' values can escape and each has a finalizer
> attached. the interface is quite simple:
> 
>> newTimeStamp :: IO TimeStamp
>> insertAfter :: TimeStamp -> IO TimeStamp
> 
> now, the problem is that I want to export insertAfter in the 'STM'
> monad, not the IO one. however, I am not sure how to pull it off. I
> cannot ensure the finalizer is only attached once to the node, if I use
> 'unsafeIOToSTM' then STM retrying could end up created multiple finalized
> nodes, possibly prematurely deleting the element from the linked list.
> 
> basically, what would be really nice is if there were something like
> 
>> registerCommitIO :: IO () -> STM ()

Yes, we ought to have this.  As others have pointed out, you can do this by 
adding another monad on top of STM, but you can't really do registerRetry 
that way.

> where all IO actions registered with this function (within an atomically
> block) are executed exactly once if and only if the atomically block
> commits. The IO action is not run within the STM block, notably
> 
>         atomically $ do foo; registerCommitIO bar; baz
> 
> is equivalent to 
> 
>         atomically (do foo; baz) >> bar
> 
> 
> I found I needed the equivalent of 'touchForeignPtr' for arbitrary
> objects (which I was able to do with the touch# primitive, but
> 
>> touch :: a -> IO ()
>> touchSTM :: a -> STM ()
> 
> would both be at home in System.Mem.Weak.

with appropriate caveats, of course!

> While I am wishing for things,
> 
>> unsafePerformSTM :: STM a -> a
> 
> would be really handy too :)

The trouble with that is that it can lead to nested transactions, and the 
RTS isn't set up to handle that.  It's probably a fair bit of work.

 > insertAfter :: TimeStamp -> IO TimeStamp
 > insertAfter t@(TimeStamp ts) = do
 >     (tts,ts) <- atomically $ insertAfter' ts
 >     touchTS t
 >     addFinalizer ts (deleteTimeStamp tts)
 >     return ts

ah, I see you're adding the finalizer to the TS, not the Timestamp.  Same 
arguments apply, though.

 > touchTS :: TimeStamp -> IO ()
 > touchTS ts =  touchForeignPtr (unsafeCoerce# ts)

*blink*  that can't possibly work!

Cheers,
	Simon