AVL trees for Data.Map and Data.Set

Adrian Hey ahey at iee.org
Fri Mar 3 09:30:01 EST 2006

On Thursday 02 Mar 2006 10:10 pm, John Meacham wrote:
> On Thu, Mar 02, 2006 at 09:13:11PM +0000, Adrian Hey wrote:
> > I think what we should be aiming for in the long term is
> > automatically derived Generalised Tries (as discussed
> > recently).
> Would it be possible to mock up an implementation of this using
> Data.Generics by chance? so we can use a generalized tree for anything
> in Data and Ord? I find the idea of a derived generalized trie quite
> intriguing. would associated types be the best way to express such a
> thing (so you can use 'Trie a' everywhere as a type rather than some
> complicated class predicate).

I'm afraid I'm not sufficiently well versed in bleedin' edge type theory
and type system extensions to comment. But I would say that my preferance
would be for some compile time solution, rather than Data.Generics (because
of the runtime cost which I suspect this may involve). This exercise is
all about performance after all, so I don't want to throw any away

I think your earlier idea of using DrIFT seemed good, though I'm not sure
what you can do with it or whether it needs modifying in any way. (For
generalised tries we're deriving new types and then deriving new class
instances of the newly derived types.)

Since we last discussed this I've been thinking about this and have a
few points I'd like to mention if you'd like to give this a shot ..

Regarding efficiency, I think that in these nested Maps of Maps of Maps..
it's likely that the overwhelming majority of these Maps are singletons
and also that you will get long singleton chains. Also for the Map tuples
used for algebraic data type constructors, it's likely that most of
the Maps in the tuple are empty. So there's probably some optimised
representation you could use to exploit these facts. A good test would be
to see if the derivation produced a Trie implemntation for Lists that was
as efficient as a handmade ListMap using Tries, maybe similar to..

Regarding the role of the Ord class in all this, I think we should still
insist that Key types are instances of Ord (and that the Trie implementation
is consistent with that instance) for several reasons..
 1 - Compatibility with balanced tree implementations.
 2 - To give meaningful ordering for functions like elemsAscending etc..
 3 - To enable a standard solution to the "Set of Sets" problem that
     doesn't require deriving yet another generalised Trie type for the
     original generalised Trie type, if you see what I mean (more about
     this later).
For derived instances of Ord this probably isn't too hard. It seems
somewhat more difficult for hand written instances (e.g. one that used
a different ordering from the default for performance reasons).

Regarding the "Set of Sets" problem, if the element type is an instance
of Ord then we can uniquely represent a Set as a sorted list (obtained
via elemsAscending of similar). So we could just use a general Trie based
ListMap(Set) implementation for these. All that's needed to implement this
an efficient Map implementation for the element type (which has already
been derived).

This all seems doable in practice, though no doubt there's some gotcha
which I've overlooked. I have no plans to do this in the near future but
it would be really good if someone else wanted to give it a go (someone
such as yourself for example :-).

Adrian Hey

More information about the Libraries mailing list