Proposal: Further performance improvements of Data.Map

Milan Straka fox at ucw.cz
Thu Sep 23 08:49:44 EDT 2010


Hello,

> Hello Milan,
> 
> Thank you for your good job. I have several suggestions and questions.
> 
> * Suggestion
> 
> --
> That leaves two reasonable variants, delta=3 and delta=4,
> both with ratio=2.
> --
> 
> According my tests, integer solutions is (3,2) and (4,2) ONLY. But
> this comments can be read as if there are other integer solutions with
> ratio other than 2. What about saying that (3,2) and (4,2) are only
> integer solutions? (i.e. Valid integer value of ratio is 2 only)

I did not try to play with other ratios than 2, so I did not know they
did not work.

> 
> * Question
> 
> Did you prove (3,2) and (4,2) are valid mathematically? As you know,
> tests cannot prove absence of bugs.
> 
> Unfortunately, we (I and Mr. Hirai) cannot give mathematically proof
> of Adams's scheme where weight = size. This is mainly because Adams's
> scheme treats small trees are special and it makes mathematical
> analysis difficult.
> 
> As I said to you personally, we (I and Mr. Hirai) are working of the
> proof of Nievergelt and Reingold scheme by Coq. Because of weight =
> size + 1, there is not special condition and mathmatical analysis is
> much easier. In N&R's scheme, there is only one integer solution:
> (3,2).

I can prove that (3,2) and (4,2) are mathematically valid, I will post
the proof soon.

> 
> * Question
> 
> --
> In the benchmarks, delta=3 is faster on insert operations,
> but delta=4 has better overall performance, so we use delta=4.
> --
> 
> Smaller delta makes tree more balanced. So, intuitively delta=3 should
> be faster on lookup operations but slower on insert and delete
> operations. Do you have any idea why your result is against the
> intuition?

Actually, the height of the trees depends much on input data, not only
on delta. The lookups are nearly identical with sorted and random data.

I did more measurements and in the end I think delta = 3 is better. It
is faster on inserts than delta = 4 and a bit slower on deletes,
and the other operations are nearly identical.

> 
> * Question
> 
> --
> Note: in contrast to Adam's paper, we perform the rebalance
> even in the case when (size left == delta * size right), instead
> when (size left > delta * size) as in the paper. Both are correct,
> but the former is slightly faster overall.
> --
> 
> If we perform rebalance for size left == delta * size right,
> intuitively insert and delete operations should be lower but lookup
> operations should be faster. Do you have any idea why your result is
> against the intuition?
> 
> Personally I don't like rebalance for size left == delta * size right
> because it is against the definition of *f-balance*.

I agree, Currently the patch rebalances only if size left > delta * size
right.

> 
> * Suggestion
> 
> Both the original balance function and your balance function have
> unnecessary condition check. Suppose we are inserting an element to a
> right subtree. In this case, only left rotation would happen. But we
> call the balance function and choose left rotation or right rotation.
> 
> If we divide balance into balanceL and balanceR, we can remove this
> unnecessary condition and clarify the code.
> 
> Since the alter function requires the balance function, we cannot
> remove the balance function. So, we should provide balance, balanceL,
> and balanceR.

Thanks, a good idea. I did that and the performance increased.
I mentioned your name in the patch description.

Thanks,
Milan


More information about the Libraries mailing list