Proposal: add laziness to Data.Map / IntMap

Scott Dillard sedillard at ucdavis.edu
Tue Aug 5 14:46:50 EDT 2008


>Adrian Hey wrote:
>
> Using explicit seqs rather than strict data types is actually faster,
> for reasons that are a bit of a mystery to me. I'm not sure what cost
> Bertram is talking about, but AFAIK ghc uses the same info pointer
> mechanism for all heap records, including unevaluated thunks (although
> the info pointers will point to different things of course). But the
> cost of pattern matching on *evaluated* AVL nodes should be independent
> of strictness annotation AFAICS.

Thanks for chiming in Adrian. Just to get started I removed the strictness
annotations from the Data.Map Bin constructor, made no other changes, and
ran a
silly benchmark (at the end of this email). The version without bangs is
actually faster than the version currently shipping. I get about 10.5 sec
for
the lazy version and 11.5 sec for the strict version (2.1Ghz Intel Core)

I'll repeat that in bold for people just skimming this thread:

__Removing Strictness Annotations Makes It Go Faster__

The reason I think is that the helper functions bin, join and balance
already
provide just enough strictness, as they need to inspect the size field. The
strictness analyzer can then do its job. The case for IntMap is tricker,
as there is no implicit strictness in the code so removing the bangs causes
stack overflows. Still working on that one.

Here are the benchmarks. The lazy version also evaluates "keySum dmap"
slightly
faster (repeated inserts) and its a tie for "keySum smap" (sequential
inserts).
I admit this benchmark is goofy, if you have a better one please share.

Scott



import qualified Data.Map as Map
import Data.List as List

n = 1000000
rkeys = [ (i*122789) `mod` 1006471 | i<-[0..] ] :: [Int]
dkeys = map (`div`1000) rkeys :: [Int]
skeys = [0..] :: [Int]

shuffle (a:b:c:d:e:f:g:h:rest) = e:a:h:d:c:b:g:f: shuffle rest

keySum = List.foldl' (+) 0 . Map.keys

rmap = Map.fromList    . take n . shuffle $ rkeys `zip` [0..]
dmap = Map.fromList    . take n . shuffle $ dkeys `zip` [0..]
smap = Map.fromAscList . take n . shuffle $ skeys `zip` [0..]

mix (x:xs) (y:ys) = x : y : mix xs ys
mix _ [] = []
mix [] _ = []

rkeys2 = [ (i*122789) `mod` 1006471 | i<-[0..] ] :: [Int]
rlooks = [ (i*122819) `mod` 1006471 | i<-[0..] ] :: [Int]

rlook =
  List.foldl'
    (\k s -> case Map.lookup k rmap of Nothing -> s; Just x -> s+x)
    0 (take n $ rkeys2 `mix` drop 1000 rlooks)

main = print rlook  -- or print (keySum dmap) or whatever
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/libraries/attachments/20080805/af28bfdc/attachment-0001.htm


More information about the Libraries mailing list