[Haskell-cafe] ANNOUNCE: binary: high performance, pure binary serialisation

Donald Bruce Stewart dons at cse.unsw.edu.au
Thu Jan 25 21:51:01 EST 2007


        Binary: high performance, pure binary serialisation for Haskell
     ---------------------------------------------------------------------- 

The Binary Strike Team is pleased to announce the release of a new,
pure, efficient binary serialisation library for Haskell, now available
from Hackage:
    
 tarball:    http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2
 darcs:      darcs get http://darcs.haskell.org/binary
 haddocks:   http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html

The 'binary' package provides efficient serialisation of Haskell values
to and from lazy ByteStrings. ByteStrings constructed this way may then
be written to disk, written to the network, or further processed (e.g.
stored in memory directly, or compressed in memory with zlib or bzlib).

Encoding and decoding are achieved by the functions:

    encode :: Binary a => a -> ByteString
    decode :: Binary a => ByteString -> a

which mirror the show/read functions. Convenience functions for serialising to
disk are also provided:

    encodeFile :: Binary a => FilePath -> a -> IO ()
    decodeFile :: Binary a => FilePath -> IO a
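
As a rough illustration of how these four functions fit together, the
sketch below round-trips a small value both in memory and through a file.
The sample data, the helper names (scores, inMemory, viaFile) and the file
name "scores.bin" are made up for this example, and it assumes a version of
binary that provides instances for lists, tuples, String and Int.

```haskell
import Data.Binary (encode, decode, encodeFile, decodeFile)
import qualified Data.ByteString.Lazy as L

-- Sample data; the names and values here are illustrative only.
scores :: [(String, Int)]
scores = [("alice", 3), ("bob", 7)]

-- In-memory round trip. As with 'read', the result type of 'decode'
-- has to be fixed by an annotation or by the surrounding context.
inMemory :: [(String, Int)]
inMemory = decode (encode scores)

-- Round trip through a file at the given (hypothetical) path.
viaFile :: FilePath -> IO [(String, Int)]
viaFile path = do
    encodeFile path scores
    decodeFile path

main :: IO ()
main = do
    print (L.length (encode scores))   -- size of the serialised form in bytes
    print (inMemory == scores)
    onDisk <- viaFile "scores.bin"
    print (onDisk == scores)
```

The type signatures on inMemory and viaFile are what tell decode and
decodeFile which Binary instance to use; without them the result type
would be ambiguous.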

To serialise your Haskell data, all you need do is write an instance of
Binary for your type. For example, suppose in an interpreter we had the
data type:

    import Data.Binary
    import Control.Monad

    data Exp = IntE Int
             | OpE  String Exp Exp

We can serialise this to bytestring form with the following instance:

    instance Binary Exp where
        put (IntE i)          = putWord8 0 >> put i
        put (OpE s e1 e2)     = putWord8 1 >> put s >> put e1 >> put e2
        get = do tag <- getWord8
                 case tag of
                    0 -> liftM  IntE get
                    1 -> liftM3 OpE  get get get
                    _ -> fail "Exp: unknown constructor tag"
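
To see an instance like this in use, here is a minimal, self-contained
sketch that encodes an expression tree and decodes it again. The derived
Eq and Show instances, the catch-all branch in get, and the sample value
are additions made for this example rather than part of the announcement.

```haskell
import Data.Binary
import Control.Monad (liftM, liftM3)

data Exp = IntE Int
         | OpE  String Exp Exp
         deriving (Eq, Show)   -- added so the result can be compared and printed

instance Binary Exp where
    -- Each constructor is written as a one-byte tag followed by its fields.
    put (IntE i)      = putWord8 0 >> put i
    put (OpE s e1 e2) = putWord8 1 >> put s >> put e1 >> put e2
    -- Reading dispatches on the tag and then reads the fields in order.
    get = do
        tag <- getWord8
        case tag of
            0 -> liftM  IntE get
            1 -> liftM3 OpE  get get get
            _ -> fail ("unexpected Exp tag: " ++ show tag)

-- An illustrative expression: 1 + (2 * 3)
sample :: Exp
sample = OpE "+" (IntE 1) (OpE "*" (IntE 2) (IntE 3))

main :: IO ()
main = do
    let restored = decode (encode sample) :: Exp
    print restored
    print (restored == sample)
```

The recursive calls to put and get on the sub-expressions reuse the same
instance, so arbitrarily nested trees need no extra code.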

The binary library has been heavily tuned for performance, particularly for
writing speed. Throughput of up to 160M/s has been achieved in practice, and in
general speed is on par with or better than NewBinary, with the advantage of a pure
interface. Efforts are underway to improve performance still further. Plans are
also taking shape for a parser combinator library on top of binary, for bit
parsing and foreign structure parsing (e.g. network protocols).

Several projects are using binary already for serialisation:

    lambdabot   : state file serialisation
    hmp3        : mp3 file database
    hpaste.org  : pastes are stored in memory as compressed bytestrings, and
                  serialised to disk on MACID checkpoints

Binary was developed by a team of 8 during the Haskell Hackathon, Hac
07, and received 200+ commits over that period. You can see the commit graph
here:

    http://www.cse.unsw.edu.au/~dons/images/commits/community/binary-commits.png

The use of QuickCheck was critical to the rapid, safe development of the
library. The API was developed in conjunction with the QuickCheck properties
that checked the API for sanity. We were thus able to improve performance while
maintaining stability. We feel that QuickCheck should be an integral part of
the development strategy for all new Haskell libraries. Don't write code
without it!
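
As an indication of the kind of property involved, the sketch below checks
that decoding an encoded value gives back the original. This is not taken
from the library's own test suite; the property name and the choice of
types are assumptions made for illustration.

```haskell
import Data.Binary (Binary, encode, decode)
import Test.QuickCheck (quickCheck)

-- Round-trip property: decoding an encoded value yields the original.
prop_roundTrip :: (Binary a, Eq a) => a -> Bool
prop_roundTrip x = decode (encode x) == x

main :: IO ()
main = do
    -- Each line instantiates the property at one concrete type and lets
    -- QuickCheck generate random inputs for it.
    quickCheck (prop_roundTrip :: Int -> Bool)
    quickCheck (prop_roundTrip :: [(Bool, Integer)] -> Bool)
    quickCheck (prop_roundTrip :: Either Int [Bool] -> Bool)
```

A property of this shape stays valid however the encoding is optimised
internally, which is what makes it useful while tuning for speed.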

Binary is portable, using the foreign function interface and cpp, and is
tested with Hugs and GHC.

Happy hacking!

The Binary Strike Team,

    Lennart Kolmodin
    Duncan Coutts
    Don Stewart
    Spencer Janssen
    David Himmelstrup
    Bjorn Bringert
    Ross Paterson
    Einar Karttunen


