Maps, was Re: GHC source code improvement ideas

Adrian Hey ahey at iee.org
Sun Jan 6 07:37:53 EST 2008


Christian Maeder wrote:
> Simon Marlow wrote:
>> Regarding Data.Map, I'd be interested in trying out AVL trees instead,
>> to see if they have better performance, but then we'd have to import the
>> code of course.
> 
> Surely, we want the best maps possible for ghc and as public library
> (and minimize maintenance).
> 
> The problem is to agree on a suitable interface. I would suggest to take
>  (or only slightly change) Daan's interface (the current Data.Map) and
> improve the underlying implementation possibly using (i.e. Adrian Hey's)
> AVL trees.

The trouble is as far as implementations is concerned the best maps
possible is a continually moving target I suspect, not to mention
being highly dependent on key type. I certainly wouldn't say AVL tree
based Maps are best possible, though they do seem give better
performance (particularly for union, intersection etc..). The clone
also address some defects in the current Data.Map (like lack of
strictness control) and has some other useful stuff.

But if this is going to be used at all, I would say this is only
a stop gap solution, which leads me to your second point about
interfaces. The generalised trie lib I was working on was a serious
attempt to see what a real useable non-toy map API should look
like. It's the GT class here..
http://code.haskell.org/collections/collections-ghc6.8/Data-Trie-General/Data/Trie/General/Types.hs
(Sorry for the long URL).

It's already somewhat huge and still incomplete IMO, but even in it's
current form it gives a uniform API for Ints, arbitrary Ord instances
and Lists. It's a shame it's all completely untested :-(

What really needs to happen on this is..
  1 - "Finish" and stabilise the GT class definition. There's still more
  that's needed but I think the promised type families magic is needed
  first.
  2 - Write a comprehensive test/benchmarking suite for GT instances.
  3 - Provide some way to automatically generate the instances for
  arbitrary user defined types.

Which is all rather a lot of work that nobody seems very interested
in :-(

Regards
--
Adrian Hey



More information about the Glasgow-haskell-users mailing list