Faster IntSet by using BitMaps in the lower branches

Joachim Breitner mail at joachim-breitner.de
Mon Sep 19 10:42:27 CEST 2011


Hi Milan,

Am Sonntag, den 18.09.2011, 23:06 +0200 schrieb Milan Straka:
> > Intersection is slower because the intersection of a Tip with a Bin
> > cannot just be resolved by a single lookup.
> 
> You could improve the two Bin vs Tip cases. When you are in that case,
> call a function 'lookupTip' which will lookup a corresponding Tip in the
> larger tree (or returns Nil) and then do the .&.. Currently you are
> doing unnecessary pattern matchings each time you call intersection.
> (This is actually how the Data.IntSet.intersection works -- the Bin
> vs Tip case calls member, which patter matches only the bigger tree.)
> 
> When you mimic the Data.IntSet.intersection more closely, you should get
> nearly same complexity.

Changing the number of specializations did not help, so I implemented
the loookupTip (here called intersectBM, for consistency with similar
helpers). If you compare comparison-3-spec-patterns.pdf (the base case)
with comparison-intersectBM.pdf, it shows that it is even slower than my
code, as follows:

-- The intersection of one tip with a map
intersectBM :: Prefix -> BitMap -> IntSet -> IntSet
intersectBM kx bm (Bin p2 m2 l2 r2) 
  | nomatch kx p2 m2 = Nil
  | zero kx m2       = intersectBM kx bm l2
  | otherwise        = intersectBM kx bm r2
intersectBM kx bm (Tip kx' bm') 
  | kx == kx' = tip kx (bm .&. bm')
  | otherwise  = Nil
intersectBM kx bm Nil = Nil 


Inlining did not help much.

Floating the constant parameters and having an explicit "go" function
die not change anything, I guess I can rely on ghc to do that for me.

Adding strictness helps a bit, so that for up to medium sized maps
performance is equal, even for sparse maps. Not sure what goes wrong for
large sets.

BTW, note the great performance boost for very dense maps when it comes
to intersection. It’s barely readable any more :-)


I have attached the full statistics for the current version
(comparison.pdf). The solid line is IntMap, the long dashed line is
DenseIntMap and the short dashed line (left axis) is the speed-up in
percent, so lower is better.

Greetings,
Joachim

-- 
Joachim "nomeata" Breitner
  mail at joachim-breitner.de  |  nomeata at debian.org  |  GPG: 0x4743206C
  xmpp: nomeata at joachim-breitner.de | http://www.joachim-breitner.de/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: comparison-3-spec-patterns.pdf
Type: application/pdf
Size: 13405 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/libraries/attachments/20110919/6f424727/attachment-0004.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comparison-intersectBM.pdf
Type: application/pdf
Size: 13406 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/libraries/attachments/20110919/6f424727/attachment-0005.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comparison-intersectBM-strict.pdf
Type: application/pdf
Size: 13401 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/libraries/attachments/20110919/6f424727/attachment-0006.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comparison.pdf
Type: application/pdf
Size: 23503 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/libraries/attachments/20110919/6f424727/attachment-0007.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://www.haskell.org/pipermail/libraries/attachments/20110919/6f424727/attachment-0001.pgp>


More information about the Libraries mailing list