Proposal: Further performance improvements of Data.Map

Kazu Yamamoto ( 山本和彦 ) kazu at iij.ad.jp
Tue Sep 14 22:14:25 EDT 2010


Hello Milan,

Thank you for your good job. I have several suggestions and questions.

* Suggestion

--
That leaves two reasonable variants, delta=3 and delta=4,
both with ratio=2.
--

According my tests, integer solutions is (3,2) and (4,2) ONLY. But
this comments can be read as if there are other integer solutions with
ratio other than 2. What about saying that (3,2) and (4,2) are only
integer solutions? (i.e. Valid integer value of ratio is 2 only)

* Question

Did you prove (3,2) and (4,2) are valid mathematically? As you know,
tests cannot prove absence of bugs.

Unfortunately, we (I and Mr. Hirai) cannot give mathematically proof
of Adams's scheme where weight = size. This is mainly because Adams's
scheme treats small trees are special and it makes mathematical
analysis difficult.

As I said to you personally, we (I and Mr. Hirai) are working of the
proof of Nievergelt and Reingold scheme by Coq. Because of weight =
size + 1, there is not special condition and mathmatical analysis is
much easier. In N&R's scheme, there is only one integer solution:
(3,2).

* Question

--
In the benchmarks, delta=3 is faster on insert operations,
but delta=4 has better overall performance, so we use delta=4.
--

Smaller delta makes tree more balanced. So, intuitively delta=3 should
be faster on lookup operations but slower on insert and delete
operations. Do you have any idea why your result is against the
intuition?

* Question

--
Note: in contrast to Adam's paper, we perform the rebalance
even in the case when (size left == delta * size right), instead
when (size left > delta * size) as in the paper. Both are correct,
but the former is slightly faster overall.
--

If we perform rebalance for size left == delta * size right,
intuitively insert and delete operations should be lower but lookup
operations should be faster. Do you have any idea why your result is
against the intuition?

Personally I don't like rebalance for size left == delta * size right
because it is against the definition of *f-balance*.

* Suggestion

Both the original balance function and your balance function have
unnecessary condition check. Suppose we are inserting an element to a
right subtree. In this case, only left rotation would happen. But we
call the balance function and choose left rotation or right rotation.

If we divide balance into balanceL and balanceR, we can remove this
unnecessary condition and clarify the code.

Since the alter function requires the balance function, we cannot
remove the balance function. So, we should provide balance, balanceL,
and balanceR.

Regards,

--Kazu

> see http://hackage.haskell.org/trac/ghc/ticket/4311.
> 
> The following text is the description of ticket 4311:
> 
> This proposal depends on #4277 and #4278.
> 
> This proposal further improves the performance of Data.Map. It consists of four patches:
> 
>    1. improvements to union and difference implementation( by eliminating high-order function)
>    2. improvements to balance function which is used by nearly all methods modifying a map (making one monolithic function which allows perform only one pattern-match on the tree nodes)
>    3. correction of the test (the Arbitrary instance could generate unbalanced trees, which the patch 2. discovered)
>    4. added benchmark of union, difference and intersection methods
> 
> On i386 Intel Core2 and GHC 6.12.1, the improvements are
> 
> alter                       5.61
> delete                      10.80
> difference                  20.33
> insert                      8.09
> intersection                7.69
> union                       21.74
> update                      7.96
> 
> The repository of the containers package with these patches (and also several others) is at  http://fox.auryn.cz/darcs/containers/.
> 
> The patches are also attached.
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries


More information about the Libraries mailing list