Pickling a finite map (Binary + zlib) [was: [Haskell-cafe] Data.Binary poor read performance]

Don Stewart dons at galois.com
Tue Feb 24 03:36:47 EST 2009


felipe.lessa:
> On Tue, Feb 24, 2009 at 4:59 AM, Don Stewart <dons at galois.com> wrote:
> > Looks like the Map reading/showing via association lists could do with
> > further work.
> >
> > Anyone want to dig around in the Map instance? (There's also some patches for
> > an alternative lazy Map serialisation, if people are keen to load maps -- happstack devs?).
> 
> From binary-0.5:
> 
> instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
>     put m = put (Map.size m) >> mapM_ put (Map.toAscList m)
>     get   = liftM Map.fromDistinctAscList get
> 
> instance Binary a => Binary [a] where
>     put l  = put (length l) >> mapM_ put l
>     get    = do n <- get :: Get Int
>                 replicateM n get
> 
> 
> 
> Can't get better, I think. Now, from containers-0.2.0.0:
> 
> fromDistinctAscList :: [(k,a)] -> Map k a
> fromDistinctAscList xs
>   = build const (length xs) xs
>   where
>     -- 1) use continutations so that we use heap space instead of stack space.
>     -- 2) special case for n==5 to build bushier trees.
>     build c 0 xs'  = c Tip xs'
>     build c 5 xs'  = case xs' of
>                        ((k1,x1):(k2,x2):(k3,x3):(k4,x4):(k5,x5):xx)
>                             -> c (bin k4 x4 (bin k2 x2 (singleton k1
> x1) (singleton k3 x3)) (singleton k5 x5)) xx
>                        _ -> error "fromDistinctAscList build"
>     build c n xs'  = seq nr $ build (buildR nr c) nl xs'
>                    where
>                      nl = n `div` 2
>                      nr = n - nl - 1
> 
>     buildR n c l ((k,x):ys) = build (buildB l k x c) n ys
>     buildR _ _ _ []         = error "fromDistinctAscList buildR []"
>     buildB l k x c r zs     = c (bin k x l r) zs
> 
> 
> The builds seem fine, but we spot a (length xs) on the beginning.
> Maybe this is the culprit? We already know the size of the map (it was
> serialized), so it is just a matter of exporting
> 
> fromDistinctAscSizedList :: Int -> [(k, a)] -> Map k a
> 
> Too bad 'Map' is exported as an abstract data type and it's not
> straighforward to test this conjecture. Any ideas?
> 

This idea was the motivation for the new Seq instance, which uses
internals to build quickly.

    Encoding to disk, the dictionary,

        $ time ./binary /usr/share/dict/cracklib-small
        "done"
        ./binary /usr/share/dict/cracklib-small  0.07s user 0.01s system 94% cpu 0.088 total

    Decoding,
        $ time ./binary dict.gz
        52848
        "done"
        ./binary dict.gz  0.07s user 0.01s system 97% cpu 0.079 total

    instance (Binary e) => Binary (Seq.Seq e) where
        put s = put (Seq.length s) >> Fold.mapM_ put s
        get = do n <- get :: Get Int
                 rep Seq.empty n get
          where rep xs 0 _ = return $! xs
                rep xs n g = xs `seq` n `seq` do
                               x <- g
                               rep (xs Seq.|> x) (n-1) g


Just a lot better. :)

So ... Data.Map, we're looking at you!

-- Don


More information about the Haskell-Cafe mailing list