Pickling a finite map (Binary + zlib) [was: [Haskell-cafe] Data.Binary poor read performance]

Paulo Tanimoto tanimoto at arizona.edu
Tue Feb 24 12:05:44 EST 2009


Hello,

On Tue, Feb 24, 2009 at 2:36 AM, Don Stewart <dons at galois.com> wrote:
> This idea was the motivation for the new Seq instance, which uses
> internals to build quickly.
>
>    Encoding to disk, the dictionary,
>
>        $ time ./binary /usr/share/dict/cracklib-small
>        "done"
>        ./binary /usr/share/dict/cracklib-small  0.07s user 0.01s system 94% cpu 0.088 total
>
>    Decoding,
>        $ time ./binary dict.gz
>        52848
>        "done"
>        ./binary dict.gz  0.07s user 0.01s system 97% cpu 0.079 total
>
>    instance (Binary e) => Binary (Seq.Seq e) where
>        put s = put (Seq.length s) >> Fold.mapM_ put s
>        get = do n <- get :: Get Int
>                 rep Seq.empty n get
>          where rep xs 0 _ = return $! xs
>                rep xs n g = xs `seq` n `seq` do
>                               x <- g
>                               rep (xs Seq.|> x) (n-1) g
>
>
> Just a lot better. :)
>
> So ... Data.Map, we're looking at you!

Indeed, that was the motivation for writing the patch for Seq.  [Ross,
thank you again for the help.]  I had performance issues with lists,
but noticed that switching to Sequence wasn't helping at all.  This
new definition takes advantage of the features that Seq has and List
doesn't.

Regarding Map, I like Felipe's idea of having a separate Internal.

I know that Lemmih has a few other implementations of Map (compactMap,
BerkeleyDB).  If I remember correctly, he made BerkeleyDB an instance
of Binary.  That can probably give us some insight too.

Paulo


More information about the Haskell-Cafe mailing list