[Haskell-cafe] A GC'd heap using weak pointers

Tue Oct 20 00:41:19 UTC 2015

On 6 October 2015 at 21:25, Thomas Koster <tkoster at gmail.com> wrote:
> I am implementing a simple, interpreted language that needs a garbage-
> collected heap for storage. Like most discussions on memory management,
> I use the term "heap" by analogy - the actual data structure is not
> necessarily a true heap.
>
> My Heap type is basically a map of "addresses" (Int) to values. Values
> may themselves contain addresses, perhaps recursively to each other
> (like letrec). The extra field in Heap stores the next unused address,
> used to allocate fresh slots, as you can see in "heapAlloc".
>
>> type Address = Int
>> data Value = ...things with Addresses...
>> data Heap = Heap !Address (Map Address Value)
>>
>> heapAlloc :: Heap -> (Address, Heap)
>> heapAlloc (Heap nextFreeAddr binds) =
>>   (nextFreeAddr, Heap (succ nextFreeAddr) binds)
>
> There is also a stack of roots and an expression under evaluation.
>
>> type Stack = [...things with Addresses...]
>> data Expr = ...things with Addresses...
>
> The function I need to write is:
>
>> heapGC :: [Address] -> Heap -> Heap
>> heapGC roots heap = ...collect unreachable values from heap...
>
> Rather than re-invent this particularly complex wheel, I am wondering
> if I could use GHC weak pointers and have the GHC runtime collect values
> in my heap for me. Something like this:
>
>> type Address = Int
>> data Reference = Reference Address
>> data Value = ...things with References...
>> data Heap = Heap !Address (Map Address (Weak Value))
>>
>> heapAlloc :: Heap -> (Reference, Heap)
>> heapAlloc (Heap nextFreeAddr binds) =
>>   (Reference nextFreeAddr, Heap (succ nextFreeAddr) binds)
>>
>> heapStore :: Reference -> Value -> Heap -> IO Heap
>> heapStore ref@(Reference addr) val (Heap nextFreeAddr binds) = do
>>   weakVal <- mkWeak ref val Nothing
>>   return $ Heap nextFreeAddr (Map.insert addr weakVal binds)
>
> Note the new type, Reference. This replaces Address everywhere except
> in the heap's internal structure, which continues to use Address.
> Reference values are opaque, unique and obtained from the heap using
> heapAlloc.
>
> The idea is that only two things should cause Values to stay alive:
> 1. reachable holders of Reference (by GHC's definition of "reachable"),
> 2. ordinary Haskell closures arising from stepping the interpreter.
> "Being in the Heap" should not be enough to keep a Value alive. Then,
> my heapGC function should only have to prune dead addresses from the
> map.

For those who were following this, the technique described above
(using System.Mem.Weak to emulate a Heap) does appear to work
correctly. You do have to periodically prune the dead pointers from
it, of course.

However, after a brainwave while in the shower, I realized I could
remove the Heap emulation from my interpreter completely and use the
Haskell heap directly with IORefs. What used to be "Address" in the
old graph is now "IORef Value". Where let/letrec expressions in my
interpreted language previously stored fresh closures in the emulated
"Heap", they now create fresh closures using "newIORef", and
substitute an "IORef Value" for the free variable just bound.

It's always nice when you can solve a problem by deleting code!

--
Thomas Koster