Bug in the proof of Data.FiniteMap and DData.Map

Robert Will robertw at stud.tu-ilmenau.de
Fri Mar 26 09:46:53 EST 2004


On Tue, 23 Mar 2004, Daan Leijen wrote:
>
> 1) However, both Data.FiniteMap and DData.Map use the correct ratio.

I have seen this stated in your code, but that's not the bug that I have
described.  (Message attached again.)

> 2) replace Adams' "<" comparisons with "<=" comparisons.

That's not "my" bug, either.

> I redid the proofs of Adams' algorithms myself some years ago
> when writing DData -- there were some errors, but I think that
> calling it "bugged" is rather harsh :-)

The precondition problem is not easy to fix.  However, I acknowledge that
your balancing checks are really good evidence for the correctness of the
algorithm.  (I didn't notice them at first, because you check via an
external function, not in a smart constructor, and I had looked only there.)
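
(By an "external function" I mean a whole-tree check of roughly the
following shape, run from a test harness -- a sketch over an assumed
size-annotated type and weight ratio, not the actual DData code:

    data Map k a = Tip | Bin Int k a (Map k a) (Map k a)

    size :: Map k a -> Int
    size Tip             = 0
    size (Bin n _ _ _ _) = n

    -- Whole-tree check of the weight-balance invariant, run only in
    -- tests -- as opposed to an assertion inside a smart constructor,
    -- which would fire on every rebuild.
    balancedTree :: Map k a -> Bool
    balancedTree Tip = True
    balancedTree (Bin _ _ _ l r) =
        (size l + size r <= 1
           || (size l <= delta * size r && size r <= delta * size l))
        && balancedTree l && balancedTree r
      where delta = 4   -- assumed weight ratio

A check like this is easy to miss when one only reads the smart
constructors.)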

Well, having mathematically unproven code is a common thing and should
indeed be no problem.  It's just that I personally prefer algorithms that
I understand, i.e. where I know why they are correct.  For Adams'
algorithm this is not the case, and I think other people would also have
serious trouble explaining "under what (pre)condition does 'balance'
produce a correctly balanced tree?" and "why does 'join' always observe
that precondition?".

> In general, I would like to stress that one tends to overrate the
> usefulness of balancing -- balancing is automatic for random data and
> balancing is only needed to avoid some pathological worst cases.
> Almost any choice of balancing scheme will do.

Right, and I also see worst-case bounds more as a kind of assurance than
as a direct efficiency booster.  However, worst-case bounds also allow
simple operations to be used in unforeseen situations, which makes them
more generally usable.  For example (although ordering is not balancing),
one usually provides a separate "fromOrderedList" operation to create
ordered structures.  Algorithms with good worst-case performance can do
that job without such a special function (which is also easier for the
user), and what is more, they work on "almost ordered" lists, too.
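
A toy illustration of that point (unrelated to DData's actual code):
with a naive, unbalanced BST, ordered input is precisely the
pathological case, which is why a dedicated construction function is
needed there; any insert with a logarithmic worst-case bound handles
ordered and "almost ordered" input without one.

    data BST a = Leaf | Node (BST a) a (BST a)

    insert :: Ord a => a -> BST a -> BST a
    insert x Leaf = Node Leaf x Leaf
    insert x t@(Node l y r)
      | x < y     = Node (insert x l) y r
      | x > y     = Node l y (insert x r)
      | otherwise = t

    depth :: BST a -> Int
    depth Leaf         = 0
    depth (Node l _ r) = 1 + max (depth l) (depth r)

    -- Ordered input degenerates the tree into a linear chain:
    --   depth (foldr insert Leaf [1..1000])  ==  1000
    -- whereas a balanced insert keeps the depth near log2 1000 ~ 10,
    -- with no separate fromOrderedList needed.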

In short, when using unbalanced collections (and, more generally,
structures without good worst-case bounds), one has to be _more careful_.

Robert


On Tue, 23 Mar 2004, Robert Will wrote:
>
> To prove the correctness of the algorithm, by the way, it suffices to
> give a precondition for 'balance' (T' in Adams' paper) and prove that
> 'merge' (Adams' 'concat3') observes that precondition.  That's the bug
> in Adams' proof.  Adams proved 'balance' correct for the precondition
> that the balancing invariant is violated by at most one.  (size left <=
> size right + 1 and v.v.)  This is observed by every other call to
> 'balance', but not (?) by 'merge'.  (See the sketch after this
> message.)
>
> > The current algorithm looks good enough for me,
>
> Yeah, in DData it looks better than in FiniteMap, where they just
> disabled the failing assertions.  Oh, perhaps the algorithm is wrong
> with respect to the simple balancing invariant, but correct with
> respect to some more complicated (amortised?) invariant.  That would
> explain Adams' failure (and my reluctance) to prove it _and_ its good
> behaviour in practice.
>
> > As a side note, I've re-implemented the "SpellCheck"
> > test with DData, and it behaves ok.
>
> As it does with many other applications...
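
For concreteness, the two conditions from the attached message as
executable predicates over subtree sizes -- a sketch, with an assumed
weight ratio of 4 standing in for whatever constant the real code uses:

    -- The balancing invariant between two sibling subtrees, given
    -- their sizes (small trees are exempt, as usual).
    invariant :: Int -> Int -> Bool
    invariant sl sr =
        sl + sr <= 1 || (sl <= delta * sr && sr <= delta * sl)
      where delta = 4   -- assumed weight ratio

    -- The precondition of Adams' proof of 'balance': the invariant
    -- may be violated by at most one, as after a single insertion or
    -- deletion in one subtree.
    nearlyInvariant :: Int -> Int -> Bool
    nearlyInvariant sl sr =
        sl <= delta * sr + 1 && sr <= delta * sl + 1
      where delta = 4

    -- The gap: insert and delete visibly re-establish the
    -- precondition before calling 'balance'; for 'merge' (Adams'
    -- 'concat3'), which joins trees of arbitrarily different sizes,
    -- this is exactly what remains to be shown.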

