Faster IntSet by using BitMaps in the lower branches

Sun Sep 18 22:31:43 CEST 2011

> Am Samstag, den 17.09.2011, 23:37 +0200 schrieb Milan Straka:
> > Great job!
> 
> thanks for the positive feedback.
> 
> > Could you please also generate comparison of and IntSet and DenseIntSet,
> > when the set is sparse (eg [1, 65 .. N])? I would like to see the time
> > complexities then. Also please include toList. If the results are fine,
> > I currently see no reason why not to push the code in.
> 
> I have attached some benchmarking result.

Thank you very much. I am quite surprised by the slowdown in the
intersection 100 case -- I would expect the times to be nearly identical for
the DenseIntSet and IntSet, as seen in the union case. Any ideas?

> While results are good for member, great for insert, intersection and
> union, toList is slower for sparse maps. toList is basically a foldr,
> so I think the culprit is this function:
> 
> foldrBits :: Int -> (Int -> a -> a) -> a -> Word -> a
> foldrBits shift f x = go shift
>   where STRICT_1_OF_2(go)
>         go bi 0 = x
>         go bi n | n `testBit` 0 = f bi (go (succ bi) (n `shiftRL` 1))
>                 | otherwise     =       go (succ bi) (n `shiftRL` 1)
> 
> I’ll try to optimize this function individually now, any suggestions are
> welcome.

We have to be sure there is no unneeded slowdown because of inefficient
Core code -- that all numbers are unpacked, succ is unpacked to primop,
a primop is used for the shift, and so on.

Also, you can use some additional bit tricks -- for example the
following heuristic, which skips last 6 bits if they are all zeroes:

>                  go bi 0 = x
> {- heuristic -}  go bi n | n .&. 0x3f == 0 = go (bi + 6) (n `shiftRL` 6)
>                  go bi n | n `testBit` 0 = f bi (go (succ bi) (n `shiftRL` 1))
>                          | otherwise     =       go (succ bi) (n `shiftRL` 1)

Of course, the 6 used can be changed to some other number - some
measurement should be made. I expected a square root of the number of
bits to behave reasonably well and 6 seemed to be a good compromise for
32 and 64.

Cheers,
Milan