Improving containers library

Wed Mar 3 10:28:44 EST 2010

Hi Milan,

On 03.03.2010, at 15:23, Milan Straka wrote:

> Hi all,
>
> I have started an internship in Cambridge and will spend 3 months on
> Haskell. One of the possible projects is improving the containers
> library. Things I am thinking about:
> - measuring the performance of existing implementations (notably Map  
> and
>  Set) and possibly improve them (either without or with API changes),
> - adding Queue and a PriorityQueue,
> - maybe adding a generalized trie,
> - maybe adding a hashtable (more like a trie of the hashes,  
> something in
>  the line of Ideal hash trees; and maybe a 'classic' hash modifiable  
> in
>  the ST monad?)
>
> I would be grateful for any comments, thoughts or wishes.

In the context of program analysis, one tracks a lot of information  
about the state of a program at a given program point. This state is  
propagated around loops and conditionals. The implicit sharing of  
nodes in functional data structures are very well suited for this.

However, whenever two states need to be joined at the loop head or at  
the end of a conditional, the two trees that represent the state need  
to be joined by performing some sort of join operation on each element  
of a tree. For all sensible domains a `join` a = a. Given that a loop  
or branch of a conditional usually touches only few variables, it is  
prudent not to perform the join operation on all elements of the tree.  
Instead, a tree-difference operation is required that traverses the  
two trees to be joined and calculates the difference between them. In  
particular, whenever two references are, in fact, the same pointer,  
the subtree does not need to be considered. This way, a join between  
two trees of size n reduces to joining only m elements where m is the  
maximum number of elements in each tree. This is a tremendous win for  
n >> m.

I've implemented such a difference operation on a home-grown splay  
tree implementation. I would like to see this on standard AVL trees.  
Maybe other people could indicate if such a difference operation on  
trees would be useful in their applications.

The difference operation in my implementation has the following  
signature:

difference :: (Ord k) => (a -> a -> Bool) -> Map k a -> Map k a ->  
([k], [k])
Calculate the difference between the two trees. Returns the elements  
that were in the first but not in the second and the elements that  
were in the second but not in the first. Both lists are unsorted. In  
case two element have the same key the given equality predicate is a  
applied to them. If the predicate returns False the elements are added  
to their respective difference lists. The contents of both trees  
remain unchanged.
differenceElems :: (Ord k) => (a -> a -> Bool) -> Map k a -> Map k a - 
 > ([a], [a])
As difference, but extracts the elements, not their keys.

Cheers,
Axel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/libraries/attachments/20100303/bb722f09/attachment.html