AVL trees for Data.Map and Data.Set

Christian Maeder maeder at tzi.de
Wed Mar 1 09:13:25 EST 2006


Jean-Philippe Bernardy wrote:
> On 2/27/06, Ross Paterson <ross at soi.city.ac.uk> wrote:
>>> We are planning to move to AVL
>>> trees for A. Hey, which do not provide a O(1) size function.
>> What will be gain for this move?  And what will we lose, apart from
>> O(1) size?
> 
> The gains:
> 1. Christian Maeder and others report a significant performance improvement.
>     This is the main motivating factor.

Indeed, the gain is significant (at least 5% lookup speedup!). Adrian 
Hey simply proved to me that AVL trees (with his clever constructor 
choice without an explicit height) are a better data structure than 
Adam's "Efficient sets: a balancing act". And it's sort of regrettable 
that the old FiniteMap and the current Data.Map are based on the latter 
(until someone finds a speedup for Adam's sets)!

> 2. The AVL tree library is very comprehensive and can be very useful by itself.

I don't share this point. The trees suffer the severe problem that the 
order argument can easily be mixed up. So I'd prefer not to directly use 
these trees. I also did not like the hscpp stuff and the many (tricky?) 
modules when installing it.

> 3. A separate balanced Tree infrastructure provide many benefits:
>   * Shared code between Map and Set.
>   * Conversion between those should then become more efficient
> (requested by John Meacham)

Without sharing the current Data.Map is faster and even the currently 
fastest "AVL (k, a)" would become faster. However, I would not sacrifice 
sharing for an only marginal speedup (below 1%). (But it would be 
interesting to know the speedup.)

further personal judgements:

1. Data.IntMap and Data.IntSet don't need to be touched since they are fast.

2. (portable) Data.Map and Data.Set modules belong into the base 
package. These data structures would be useful for ghc itself.

3. Jean-Philippe's collection framework may sit on top in a different 
package (supporting more generic applications)

4. A few interface cleanups should be no problem

Cheers Christian


More information about the Libraries mailing list