storing highly shared data structures

Simon Marlow simonmar at microsoft.com
Fri Dec 23 10:18:20 EST 2005


On 22 December 2005 13:43, Christian Maeder wrote:

> for storing highly shared data structures we use so-called Annotated
> Terms (ATerms for short, details below).
> 
>    http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/ATerm
> 
> In contrast to the Binary (or GhcBinary) class we compute the sharing,
> which saves a lot of space for data types that keep redundant
> information.
> 
> With this we can store some of our data structures (of course only
> non-cyclic and finite ones) in a few KBs that need MBs if stored
> without sharing (as when using the Binary or the Show/Read
> classes).
> 
> So far so good. The problem remaining is that an object is _traversed_
> as if being unshared and thus the _time_ for the ATermTable
> construction becomes too long for us.
> 
> GHC's internal data structures (on the heap) are in many cases shared,
> by pointer references. I.e. if I add a (single) symbol table to every
> symbol that I use, then the symbol table will not be copied, but only
> a reference added to my symbol.
> 
> How can I detect this sharing in order to avoid traversing the very
> same symbol table for every symbol?
> 
> I've tried to use a "Map (Ptr ()) ShATerm". So before traversing an
> object I look up its address and check if it was traversed before (if
> not, I traverse it and store it in my map for future lookups).
> 
> 1.) I'm not sure if it is safe to use "Ptr ()" (garbage collection,
> heap compaction and so on could happen in the meantime).

Right - Ptr isn't the right thing here, because GC will move objects
around.  That's why we have StablePtr and StableName.  In fact, what you
really want here is the pointer-equality memo table implementation in
the Memo module (package util).  This is scheduled to be removed in 6.6,
but it will be available as a Cabal package.
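
To make the StableName idea concrete, here is a minimal sketch of a
pointer-equality memo table. It is not the Memo module's actual API:
the names MemoTable, newMemoTable, memoByIdentity and demo are made up
for illustration, and only the System.Mem.StableName, Data.IORef and
Data.IntMap calls are real library functions. A StableName stays valid
when the GC moves the object, so it can serve as a lookup key where a
raw address cannot.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.IntMap.Strict as IntMap
import System.Mem.StableName (StableName, makeStableName, hashStableName)

-- Hypothetical memo table: buckets are indexed by hashStableName, and
-- entries inside a bucket are matched exactly via Eq on StableName.
type MemoTable a r = IORef (IntMap.IntMap [(StableName a, r)])

newMemoTable :: IO (MemoTable a r)
newMemoTable = newIORef IntMap.empty

-- Run the traversal 'f' on 'x' unless the very same heap object has
-- already been seen, in which case return the stored result.
memoByIdentity :: MemoTable a r -> (a -> IO r) -> a -> IO r
memoByIdentity table f x = do
  -- Force x to WHNF first: a thunk and its evaluated value may
  -- otherwise be given different stable names.
  name <- makeStableName $! x
  let key = hashStableName name
  seen <- readIORef table
  case lookup name (IntMap.findWithDefault [] key seen) of
    Just r  -> pure r
    Nothing -> do
      r <- f x
      modifyIORef' table (IntMap.insertWith (++) key [(name, r)])
      pure r

-- Usage: one shared list stands in for the symbol table. It is looked
-- up twice; the result is the number of times it was really traversed.
demo :: IO Int
demo = do
  table <- newMemoTable
  calls <- newIORef (0 :: Int)
  let symtab = [1 .. 1000] :: [Int]
      traverseOnce xs = do
        modifyIORef' calls (+ 1)
        pure (sum xs)
  _ <- memoByIdentity table traverseOnce symtab
  _ <- memoByIdentity table traverseOnce symtab
  readIORef calls
```

Equal stable names always mean the same object, but the same object is
not strictly guaranteed to get equal names every time, so a miss only
costs a redundant traversal, never a wrong result. This sketch also
keeps every result alive for the lifetime of the table; a production
memo table would presumably add weak pointers so entries can be
collected.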

Cheers,
	Simon

