Pickling a finite map (Binary + zlib) [was: [Haskell-cafe]
Data.Binary poor read performance]
Don Stewart
dons at galois.com
Tue Feb 24 03:36:47 EST 2009
felipe.lessa:
> On Tue, Feb 24, 2009 at 4:59 AM, Don Stewart <dons at galois.com> wrote:
> > Looks like the Map reading/showing via association lists could do with
> > further work.
> >
> > Anyone want to dig around in the Map instance? (There's also some patches for
> > an alternative lazy Map serialisation, if people are keen to load maps -- happstack devs?).
>
> From binary-0.5:
>
> instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
> put m = put (Map.size m) >> mapM_ put (Map.toAscList m)
> get = liftM Map.fromDistinctAscList get
>
> instance Binary a => Binary [a] where
> put l = put (length l) >> mapM_ put l
> get = do n <- get :: Get Int
> replicateM n get
>
>
>
> Can't get better, I think. Now, from containers-0.2.0.0:
>
> fromDistinctAscList :: [(k,a)] -> Map k a
> fromDistinctAscList xs
> = build const (length xs) xs
> where
> -- 1) use continutations so that we use heap space instead of stack space.
> -- 2) special case for n==5 to build bushier trees.
> build c 0 xs' = c Tip xs'
> build c 5 xs' = case xs' of
> ((k1,x1):(k2,x2):(k3,x3):(k4,x4):(k5,x5):xx)
> -> c (bin k4 x4 (bin k2 x2 (singleton k1
> x1) (singleton k3 x3)) (singleton k5 x5)) xx
> _ -> error "fromDistinctAscList build"
> build c n xs' = seq nr $ build (buildR nr c) nl xs'
> where
> nl = n `div` 2
> nr = n - nl - 1
>
> buildR n c l ((k,x):ys) = build (buildB l k x c) n ys
> buildR _ _ _ [] = error "fromDistinctAscList buildR []"
> buildB l k x c r zs = c (bin k x l r) zs
>
>
> The builds seem fine, but we spot a (length xs) on the beginning.
> Maybe this is the culprit? We already know the size of the map (it was
> serialized), so it is just a matter of exporting
>
> fromDistinctAscSizedList :: Int -> [(k, a)] -> Map k a
>
> Too bad 'Map' is exported as an abstract data type and it's not
> straighforward to test this conjecture. Any ideas?
>
This idea was the motivation for the new Seq instance, which uses
internals to build quickly.
Encoding to disk, the dictionary,
$ time ./binary /usr/share/dict/cracklib-small
"done"
./binary /usr/share/dict/cracklib-small 0.07s user 0.01s system 94% cpu 0.088 total
Decoding,
$ time ./binary dict.gz
52848
"done"
./binary dict.gz 0.07s user 0.01s system 97% cpu 0.079 total
instance (Binary e) => Binary (Seq.Seq e) where
put s = put (Seq.length s) >> Fold.mapM_ put s
get = do n <- get :: Get Int
rep Seq.empty n get
where rep xs 0 _ = return $! xs
rep xs n g = xs `seq` n `seq` do
x <- g
rep (xs Seq.|> x) (n-1) g
Just a lot better. :)
So ... Data.Map, we're looking at you!
-- Don
More information about the Haskell-Cafe
mailing list