Weak reference semantics - why does a dead weak ref keep its value alive?

Fri May 23 23:02:43 UTC 2014

Hello Luite,

GHC's separation of weak references into keys and values is a
generalization which can be useful to avoid space leaks; the motivation
for the design is described in "Stretching the storage manager: weak
pointers and stable names in Haskell".

In particular, the variant of weak reference you suggest is the
/ephemeron/ semantics in Hayes.  Their reachability rule is:

    The value field of an ephemeron is reachable if both (a) the
    ephemeron (weak pointer object) is reachable, and (b) the key is
    reachable.

The paper goes into more detail why our semantics might be preferred,
but it boils down to:

(1) Our semantics is simpler,
(2) In this semantics, it is not clear when to run finalizers (if
you garbage collect the weak pointer objects early, you won't be able
to run their finalizers!)
(3) These semantics can be simulated

Cheers,
Edward

Excerpts from Luite Stegeman's message of 2014-05-23 06:40:07 -0700:
> Hi all,
> 
> I'm reviewing and improving my weak references implementation for GHCJS,
> among other things to make sure that the profiling/stack trace support,
> currently being implemented by Ömer Sinan Ağacan as a GSoC project has the
> correct heap information to work with.
> 
> JavaScript does not have weak references with 'observable deadness', so we
> have to walk the heap data structures from time to time to test
> reachability, schedule finalizers and break the weak links. I'm trying to
> find a design that keeps GHC's semantics, but where JavaScript's own
> garbage collector can get rid of as much data as possible, even before we
> run our own heap scan.
> 
> Now I ran into the following peculiarity, from the semantics
> (System.Mem.Weak documentation):
> 
> A heap object is reachable if:
> - It is a member of the root set.
> - It is directly pointed to by a reachable object, other than a weak
> pointer object.
> - It is a weak pointer object whose key is reachable.
> - It is the value or finalizer of a weak pointer object whose key is
> reachable.
> 
> This says that even if nothing has a reference to a weak pointer, as long
> as there are references to its key, the value is considered to be
> reachable. For example in this program
> 
> import Data.Maybe
> import System.Mem.Weak
> import System.Mem
> import Control.Exception
> import System.IO
> import Control.Concurrent
> 
> gc = performGC >> threadDelay 1000000
> 
> main = do
>   hSetBuffering stdout NoBuffering
>   k <- evaluate "k"
>   v <- evaluate "v"
>   addFinalizer v (putStrLn "fv")
>   w <- evaluate =<< mkWeak k v (Just $ putStrLn "fkv")
>   putStrLn =<< fmap (fromMaybe ".") (deRefWeak w)
>   addFinalizer w (putStrLn "fw")
>   gc
>   putStrLn k
>   gc
> 
> there is no way to reach 'v' after `w` has been finalized, but still it's
> kept alive. This agrees with the semantics in the documentation, but what's
> the reason for this?
> 
> luite