storing highly shared data structures
Simon Marlow
simonmar at microsoft.com
Fri Dec 23 10:18:20 EST 2005
On 22 December 2005 13:43, Christian Maeder wrote:
> for storing highly shared data structures we use so called Annotated
> Terms (shortly ATerms, details below).
>
> http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/ATerm
>
> In contrast to the Binary (or GhcBinary) class we compute the sharing,
> which saves a lot of space for data types that keep redundant
> information.
>
> With this we can store some of our data structures (of course only
> non-cyclic and finite ones) in a few KBs that need MBs if stored
> without sharing (as when using the Binary or the Show/Read
> classes).
>
> So far so good. The problem remaining is that an object is _traversed_
> as if being unshared and thus the _time_ for the ATermTable
> construction becomes too long for us.
>
> GHC's internal data structures (on the heap) are in many cases shared,
> by pointer references. I.e. if I add a (single) symbol table to every
> symbol that I use, then the symbol table will not be copied, but only
> a reference added to my symbol.
>
> How can I detect this sharing in order to avoid traversing the very
> same symbol table for every symbol?
>
> I've tried to use a "Map (Ptr ()) ShATerm". So before traversing an
> object I look up its address and check if is was traversed before (if
> not I traverse it and store it in my map for future lookups).
>
> 1.) I'm not sure if it is safe to use "Ptr ()" (meanwhile garbage
> collection, heap compaction and what not could happen).
Right - Ptr isn't the right thing here, because GC will move objects
around. That's why we have StablePtr and StableName. In fact, what you
really want here is the pointer-equality memo table implementation in
the Memo module (package util). This is scheduled to be removed in 6.6,
but it will be available as a Cabal package.
Cheers,
Simon
More information about the Glasgow-haskell-users
mailing list