Faster IntSet by using BitMaps in the lower branches

Sun Sep 18 23:06:51 CEST 2011

> Am Sonntag, den 18.09.2011, 22:13 +0200 schrieb Henning Thielemann:
> > On Sun, 18 Sep 2011, Joachim Breitner wrote:
> > > I have attached some benchmarking result. While results are good for
> > > member, great for insert, intersection and union, toList is slower for
> > > sparse maps. toList is basically a foldr, so I think the culprit is this
> > > function:
> > >
> > > foldrBits :: Int -> (Int -> a -> a) -> a -> Word -> a
> > > foldrBits shift f x = go shift
> > >  where STRICT_1_OF_2(go)
> > >        go bi 0 = x
> > >        go bi n | n `testBit` 0 = f bi (go (succ bi) (n `shiftRL` 1))
> > >                | otherwise     =       go (succ bi) (n `shiftRL` 1)
> > >
> > > I’ll try to optimize this function individually now, any suggestions are
> > > welcome.
> > 
> > You can certainly do some binary search by masking and comparing with bit 
> > patterns like
> >     1 `shiftL` 32 - 1 `shiftL` 16
> >     1 `shiftL` 16 - 1 `shiftL` 0
> 
> I’d like to avoid the binary search, as it is more expensive for dense
> sets. Milan’s suggestion of shifts by 6 might be a good compromise.
> Another approach might be to first use lowestBitSet to start with the
> lowest bit. In case of only one bit set, it will not iterate further
> then.

You could use lowestBitSet iteratively, but I am afraid it will be too
slow.

> 
> I tried adding some strictness annotation to go and see if that helps.
> According to the attachement, it does, instead of 4 times slower in the
> worst case (Size 4 million, step 100) it is only ~2,2x slower.
> 
> 
> What does "Str=DmdType U(L)U(L)m" in -fdump-stranal mean?

U(L) is the best -- the outer U means "unboxed". The (L) is kinda wrong
in this unboxed case -- just consider U(L) as U -- that is the way GHC
always dumps unboxed ints.

> Acccording to the core, GHCziPrim.uncheckedShiftRLzh is used, and also
> succ gets properly resolved to GHCziPrim.zpzh.

Shouldn't succ be resolved to +1? If I look correctly, you are
incrementing an Int. Aha -- succ is checking, whether the Int is 2^32-1.
You should probably use (bi+1) instead of (succ bi).

> Intersection is slower because the intersection of a Tip with a Bin
> cannot just be resolved by a single lookup.

You could improve the two Bin vs Tip cases. When you are in that case,
call a function 'lookupTip' which will lookup a corresponding Tip in the
larger tree (or returns Nil) and then do the .&.. Currently you are
doing unnecessary pattern matchings each time you call intersection.
(This is actually how the Data.IntSet.intersection works -- the Bin
vs Tip case calls member, which patter matches only the bigger tree.)

When you mimic the Data.IntSet.intersection more closely, you should get
nearly same complexity.

Cheers,
Milan

> Might this be related to the
> following and if yes, what I can do about it?
> 
> SpecConstr
>     Function `main:Data.DenseIntSet.intersection{v r1dK} [lidx]'
>       has four call patterns, but the limit is 3
>     Use -fspec-constr-count=n to set the bound
>     Use -dppr-debug to see specialisations
> 
> Greetings,
> Joachim
> 
> -- 
> Joachim "nomeata" Breitner
>   mail at joachim-breitner.de  |  nomeata at debian.org  |  GPG: 0x4743206C
>   xmpp: nomeata at joachim-breitner.de | http://www.joachim-breitner.de/
> 

> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries