Bug in the proof of Data.FiniteMap and DData.Map
Daan Leijen
daanleijen at xs4all.nl
Tue Mar 23 12:04:44 EST 2004
On Fri, 19 Mar 2004 09:00:22 +0100, Robert Will <robertw at stud.tu-ilmenau.de> wrote:
> both of the data structures are based on Adams' balancing algorithm
> which contains a bug -- at least in its proof. (Perhaps it is
> correct, but I don't know anyone that knows if.)
I have been redoing proofs of Adam's algorithms too some years ago
when writing DData -- there were some errors but I think that calling
it "bugged" is rather harsh :-) Anyway, I think you are referring to
two separate cases (as far as I can remember though):
1) Given a constant "delta" (the maximal relative size differences between any
two nodes in the tree) and the "ratio" (the ratio between the outer and inner
sibling of the heavier subtree in a node that needs rebalancing). "delta" is
"w" in Adam's paper, and "ratio" the inverse of alpha. Now, we can prove that
delta must be larger than 4.65 with a ratio of 2, and we can prove that we
need at least a ratio of 2 to maintain the balance invariants. In his paper
though Adams uses an incorrect ratio of 1. However, both Data.FiniteMap and DData.Map
use the correct ratio.
2) For "join", and "merge", the proof brakes down too, but I remember that it suffices
to replace Adam's "<" comparisons with "<=" comparisons. This is why the balancing
assertion checks in "Data.FiniteMap" are turned off (as they fail), but this
relatively easy to fix. (DData.Map uses the correct comparison and works with
assertion checking on). Unfortunately, I couldn't find my notes on this issue
anymore, but I think that when you re-run the proofs with "<=", it will work out
correctly.
In general, I would like to stress that one tends to overrate the usefullness
of balancing -- Balancing is automatic for random data and balancing is only
needed to avoid some pathological worst-cases. Almost any choice of balancing
scheme will do. This is why the Data.FiniteMap structure has proven its use in
practice even when sometimes producing slightly unbalanced trees.
All the best,
Daan.
>
> Instead of fixing the bug I have re-derived the algorithm from the
> invariant together with a (now hopefully correct) proof. The new
> algorithm is contained in Dessy:
> http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/
> more specifically in the file:
> http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/WeightedTrees.hs
>
> The derivation, proof (and hint to Adam's bug) are in:
> http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/democratic/sequence.hs
>
> (The so-called hedge-union, -intersection, and -difference is also
> implemented: function 'x_merge_width' does all three:
> http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/Dessy.hs )
>
> A paper which presents the new (much simpler) algorithm and its proof
> putting it in scientific context is in preparation.
>
> Users that prefer proven algorithms over tested ones are free to use
> my implementation instead of the existing ones. (At moment, there is
> no proven and tested implementation. Of course this will change
> soon.)
>
> As for the maintenance of Data.FiniteMap and DData.Map, I don't know
> whether it is worth "fixing" the implementations or one would better
> replace them by a completely new design. Anyway, algorithms have no
> copyright and everyone is free to use the new "weight-balancer
> according to Will" to implement his own data structures. Oh, by the
> way, some of the other balancers in Dessy (or DData) might be more
> efficient than weight-based balancing anyway, I'll make some
> measurements later this year.
>
> good time,
> Bob
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>
More information about the Libraries
mailing list