AVL trees for Data.Map and Data.Set
Christian Maeder
maeder at tzi.de
Wed Mar 1 09:13:25 EST 2006
Jean-Philippe Bernardy wrote:
> On 2/27/06, Ross Paterson <ross at soi.city.ac.uk> wrote:
>>> We are planning to move to AVL
>>> trees for A. Hey, which do not provide a O(1) size function.
>> What will be gain for this move? And what will we lose, apart from
>> O(1) size?
>
> The gains:
> 1. Christian Maeder and others report a significant performance improvement.
> This is the main motivating factor.
Indeed, the gain is significant (at least 5% lookup speedup!). Adrian
Hey simply proved to me that AVL trees (with his clever constructor
choice without an explicit height) are a better data structure than
Adam's "Efficient sets: a balancing act". And it's sort of regrettable
that the old FiniteMap and the current Data.Map are based on the latter
(until someone finds a speedup for Adam's sets)!
> 2. The AVL tree library is very comprehensive and can be very useful by itself.
I don't share this point. The trees suffer the severe problem that the
order argument can easily be mixed up. So I'd prefer not to directly use
these trees. I also did not like the hscpp stuff and the many (tricky?)
modules when installing it.
> 3. A separate balanced Tree infrastructure provide many benefits:
> * Shared code between Map and Set.
> * Conversion between those should then become more efficient
> (requested by John Meacham)
Without sharing the current Data.Map is faster and even the currently
fastest "AVL (k, a)" would become faster. However, I would not sacrifice
sharing for an only marginal speedup (below 1%). (But it would be
interesting to know the speedup.)
further personal judgements:
1. Data.IntMap and Data.IntSet don't need to be touched since they are fast.
2. (portable) Data.Map and Data.Set modules belong into the base
package. These data structures would be useful for ghc itself.
3. Jean-Philippe's collection framework may sit on top in a different
package (supporting more generic applications)
4. A few interface cleanups should be no problem
Cheers Christian
More information about the Libraries
mailing list