[Haskell] combining IntMaps

Daan Leijen daan at cs.uu.nl
Sat Jul 30 14:07:12 EDT 2005


Adrian Hey wrote:
> Hello,
> 
>>Thanks! It's interesting the way your AVL tree library is set up --
>>there seems to be a much broader degree of functionality than that
>>provided by Data.Set. But I'm trying to see, is there a significant
>>difference in the fundamental data structure.
> 
> Well Data.Set is based on a different balanced tree type (weight
> balanced trees), similar to those used in the Adams paper. I'm
> also quite sceptical about the Hedge algorithm, so the AVL library
> doesn't use it. It uses divide and conquer, but not quite as Adams
> describes it.

Please note that the original mail was about "IntMap" and "IntSet" and these data 
structures use patricia trees and have a much different union algorithm alltogether. 
(a very good one actually -- one of the main reasons to make a special Map and Set 
instance for integers :-) )

> But IMO the biggest problem with Data.Set is the inflexible API.
> For example..
> 
>>From Data.Set:
>  union :: Ord a => Set a -> Set a -> Set a
>  intersect :: Ord a => Set a -> Set a -> Set a
> 
>>From Data.Tree.AVL:
>  genUnion :: (e -> e -> COrdering e) -> AVL e -> AVL e -> AVL e
>  genIntersection :: (a -> b -> COrdering c) -> AVL a -> AVL b -> AVL c

You are right, but there is a reason for this: with the Set api I wanted to expose 
only operations that do not violate the notion of a mathematical set: ie. the 
ordering on the elements must be fixed or otherwise the invariants might not hold any 
longer. One way to insure this is by passing an ordering function when constructing a 
set, but a nicer way is to use overloading (since only a single instance can hold for 
the set).

Of course, maybe it is a good idea to make two modules: a 'collection' data type that 
exposes internal operations, and a 'Set' module on top that implements sets and does 
not expose operations that could violate set invariants.

You are right that these considerations do not make as much sense in the context of 
general AVL trees since no guarantee op "Set" functionality is given there. Same 
holds for the IntMap and Map interfaces. (but not for "Bag").

 > or is the main point that
 > the additional functionality could not have otherwise been provided in
 > an efficient way without going into the guts of Data.Set?

Efficiency wise, I think one should only provide functions that ensure that the 
complexity does not get worse -- if one also considers functionality that improves on 
a constant factor (like traversing twice) there is no end to the number of functions 
that can be provided (as one is basically doing deforestation by hand). But this is 
of course just my personal opinion on how to control the size of the API.

All the best,
-- Daan Leijen.

> Of course there's no reason why similar functions could not be
> provided by Data.Set, but they're not there at present.
> 
> 
>>or is the main point that
>>the additional functionality could not have otherwise been provided in
>>an efficient way without going into the guts of Data.Set?
> 
> 
> Yes. This why producing useable libraries like this is so difficult.
> There's plenty of reasonable things you just can't do efficiently
> with Data.Set. Same is probably true of Data.Tree.AVL of course, but
> I'm trying to make it more complete all the time.
> 
> Anyway, please try out AVL and let me know if there's anything more
> missing.
> 
> Regards
> --
> Adrian Hey
> 
> _______________________________________________
> Haskell mailing list
> Haskell at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell



More information about the Haskell mailing list