Bug in Data.Map

Milan Straka fox at ucw.cz
Wed Aug 4 05:33:15 EDT 2010


Hi,

As some of you know, I am also working on patches for the containers.
I actually know about two serious mistakes in the Adams' paper,
so the first think I did was set the delta in Map to four.

I am able to proof that for delta >= 5 the construction fails.
I also think I am able to proof that for delta = 4 the construction
works.

This does not immediately disqualify Data.Map, because, as in the
original paper, the rebalancing should be done after the balance is
broken. But see what the Data.Map (and also Data.Set) do:

balance :: k -> a -> Map k a -> Map k a -> Map k a
balance k x l r
  | sizeL + sizeR <= 1    = Bin sizeX k x l r
  | sizeR >= delta*sizeL  = rotateL k x l r                   (4)
  | sizeL >= delta*sizeR  = rotateR k x l r                   (5)
  | otherwise             = Bin sizeX k x l r
  where
    sizeL = size l
    sizeR = size r
    sizeX = sizeL + sizeR + 1

On the lines (4) and (5), you can see the rebalancing is done even in
the case of sizeR >= delta*sizeL.

If the rebalancing is done in the case of broken balance, ie. in
sizeR > delta*sizeL, there is a simple counterexample (you will need
fixed-width font):
   2                                                          (A)
1     5
     4 6
    3   7

and delete 1.

If the rebalancing is done as it is in Data.Map, the subtree in (A)
_should_ never be formed, because it _should_ be balanced away when
created (one subtree is delta*size of the other).

But consider this example (found using the TestMap.hs, I did not know
about it):

       2
   1             8
0           5         9
           4 6      10 11
          3   7

This is balanced. When deleting 0, we get
                8
   2                    9
1     5               10 11
     4 6
    3   7

which contains the subtree (A), so deleting 1 once again breaks the
balance.


If there is interest, I can formulate the proof for ratio = 2 and delta
= 4, which is used in Data.Set (and soon in Data.Map). I thought it
is not enough to publish (just to correct an error in someone else's
paper), so I did not bother doing it yet.

Cheers,
Milan Straka

> Hi all! Taylor Campbell found a bug in Data.Map but wasn't able to post to
> this mailing list so I'm submitting this on his behalf.
> 
> ----------------------------
> 
> Deletion in Data.Map does not always preserve the balance of a tree.
> Deletion in Data.Set appears to preserve the balance in some cases
> where Data.Map does not; I don't know whether Data.Set always leaves
> the tree balanced.  I have attached two files to test the balance of
> trees after a random sequence of insertions and a pathological
> sequence of deletions (deleteMin -- although this is not pathological
> for some applications such as priority queues).  TestMap.hs reliably
> demonstrates for me that deleteMin readily yields an unbalanced tree,
> if I supply a size of 10000 to test_map.  TestSet.hs does not.  The
> salient difference between Data.Set and Data.Map, I believe, is the
> choice of delta (w in Adams' paper), which is 4 in Data.Set and 5 in
> Data.Map.
> 
> Since Adams' paper claims that when alpha (i.e. 1/ratio) is 1/2, w
> must exceed about 4.646, this seems suspect.  But I haven't analyzed
> the reasoning in Adams' paper to find whether his conclusions are
> wrong.  What I do know is that I can break trees when w is 5, but I
> have not been able to break them when w is 4.  I also noticed a
> comment in Set.hs saying that a value of 4 for w improved performance
> on many operations.
> 
> I have also not been able to break trees if I change deletion to use
> join rather than balance.  But that slows everything down a good deal
> more, I believe, than simply changing w from 5 to 4.



> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries



More information about the Libraries mailing list