[Haskell-cafe] lookup tables & style guidelines

Thu Apr 24 11:33:55 EDT 2008

Ketil Malde wrote:
> Don Stewart <dons at galois.com> writes:
> 
>>>    1) what is the most performant lookup table/hashtable/dictionary solution
>>>    for Haskell?
> 
>> Data.IntMap is awfully good.
> 
> Is it benchmarked anywhere?  Compared to the Judy bindings, or Adrian
> Hey's AVL trees, or Data.Hashtable?  

I must get around to putting the AVL clones of Data.Map/Set in Hackage
sometime. Meanwhile anyone wanting to roll their own maps with an API
of their chosing could do a lot worse than the raw AVL lib..

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/AvlTree-3.1

Also, if you're likely to be using union/intersection a lot you should
know that Data.Map/Set are very slow for this because they use the
not efficient hedge algorithm :-)

There's a really cool demo I found from wikipedia that shows why it is
that AVL trees perform so well in the "pathological" situation of sorted
insertions..

http://www.csi.uottawa.ca/~stan/csi2514/applets/avl/BT.html

If you try it you can see that after 2^n-1 sorted insertions you always
get a perfectly balanced tree. This explains these benchmark results..

http://groups.google.co.uk/group/comp.lang.functional/msg/74a422ea04ff1217

DData is what became Data.Map/Set and it would appear to be the worst
performing of the four tree types tested there :-(

Don is right about Data.IntMap/IntSet. For Ints it will be much faster
than Data.Map/Set or even (polymorphic) AVL tree. But an AVL tree of
unboxed Ints gives faster lookup than IntMap/Set and about the same
insert/delete times..

http://www.haskell.org/pipermail/libraries/2005-October/004518.html

Union and Intersection times for AVL aren't so great, but I think
I know what to do about that (especially intersection).

But the real way ahead has to be Tries for non-trivial keys and (I
suspect) AVL trees of unboxed Ints for simple keys (serialisable
as 1 machine word). This is what that GSoC project is all about.
At the moment we have the exact opposite, Tries for Ints and balanced
trees for non-trivial keys. Oh well..

Regards
--
Adrian Hey