Pickling a finite map (Binary + zlib) [was: [Haskell-cafe]
Data.Binary poor read performance]
bertram.felgenhauer at googlemail.com
Tue Feb 24 12:19:09 EST 2009
Felipe Lessa wrote:
> On Tue, Feb 24, 2009 at 4:59 AM, Don Stewart <dons at galois.com> wrote:
> > Looks like the Map reading/showing via association lists could do with
> > further work.
> > Anyone want to dig around in the Map instance? (There's also some patches for
> > an alternative lazy Map serialisation, if people are keen to load maps -- happstack devs?).
> >>From binary-0.5:
> instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
> put m = put (Map.size m) >> mapM_ put (Map.toAscList m)
> get = liftM Map.fromDistinctAscList get
> instance Binary a => Binary [a] where
> put l = put (length l) >> mapM_ put l
> get = do n <- get :: Get Int
> replicateM n get
> Can't get better, I think.
We can improve it slightly (about 20% runtime in dons example [*]):
instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
get = liftM (Map.fromDistinctAscList . map strictValue) get where
strictValue (k,v) = (v `seq` k, v)
The point is that Data.Map.Map is strict in the keys, but not in the
values of the map. In the case of deserialisation this means the values
will be thunks that hang on to the Daya.Binary buffer.
> Now, from containers-0.2.0.0:
> fromDistinctAscList :: [(k,a)] -> Map k a
> fromDistinctAscList xs
> = build const (length xs) xs
> -- 1) use continutations so that we use heap space instead of stack space.
> -- 2) special case for n==5 to build bushier trees.
> build c 0 xs' = c Tip xs'
> build c 5 xs' = case xs' of
> -> c (bin k4 x4 (bin k2 x2 (singleton k1
> x1) (singleton k3 x3)) (singleton k5 x5)) xx
> _ -> error "fromDistinctAscList build"
> build c n xs' = seq nr $ build (buildR nr c) nl xs'
> nl = n `div` 2
> nr = n - nl - 1
> buildR n c l ((k,x):ys) = build (buildB l k x c) n ys
> buildR _ _ _  = error "fromDistinctAscList buildR "
> buildB l k x c r zs = c (bin k x l r) zs
> The builds seem fine, but we spot a (length xs) on the beginning.
> Maybe this is the culprit? We already know the size of the map (it was
> serialized), so it is just a matter of exporting
> fromDistinctAscSizedList :: Int -> [(k, a)] -> Map k a
Eliminating the 'length' call helps, too, improving runtime by
another about 5%.
The result is still a factor of 1.7 slower than reading the list of
[*] Notes on timings:
1) I used `rnf` for all timings, as in my previous mail.
2) I noticed that in my previous measurements, the GC time for the
Data.Map tests was excessively large (70% and more), so I used
+RTS -H32M this time. This resulted in a significant runtime
improvement of about 30%.
3) Do your own measurements! Some code to play with is available here:
More information about the Haskell-Cafe