[Haskell-cafe] Re: ANNOUNCE: binary: high performance, pure binary serialisation

Donald Bruce Stewart dons at cse.unsw.edu.au
Tue Jan 30 19:13:07 EST 2007


simonmar:
> Donald Bruce Stewart wrote:
> >        Binary: high performance, pure binary serialisation for Haskell
> >     ---------------------------------------------------------------------- 
> >The Binary Strike Team is pleased to announce the release of a new,
> >pure, efficient binary serialisation library for Haskell, now available
> >from Hackage:
> >    
> > tarball:    http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2
> > darcs:      darcs get http://darcs.haskell.org/binary
> > haddocks:   http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
> 
> A little benchmark I had lying around shows that this Binary library beats 
> the one in GHC by a factor of 2 (at least on this example):

Very nice. We've been benchmarking against NewBinary, for various
Word-sized operations, with the following results, on x86:

NewBinary, fairly tuned (lots of fastMutInt#s)

10MB of Word8  in chunks of 1: 10.68MB/s write, 9.16MB/s read
10MB of Word16 in chunks of 16: 7.89MB/s write, 6.65MB/s read
10MB of Word32 in chunks of 16: 7.99MB/s write, 7.29MB/s read
10MB of Word64 in chunks of 16: 5.10MB/s write, 5.75MB/s read

Data.Binary:

10MB of Word8  in chunks of  1 (  Host endian):   11.7 MB/s write, 2.4 MB/s read
10MB of Word16 in chunks of 16 (  Host endian):   89.3 MB/s write, 3.6 MB/s read
10MB of Word16 in chunks of 16 (   Big endian):   83.3 MB/s write, 1.6 MB/s read
10MB of Word32 in chunks of 16 (  Host endian):  178.6 MB/s write, 7.2 MB/s read
10MB of Word32 in chunks of 16 (   Big endian):  156.2 MB/s write, 2.5 MB/s read
10MB of Word64 in chunks of 16 (  Host endian):   78.1 MB/s write, 11.3 MB/s read
10MB of Word64 in chunks of 16 (   Big endian):   44.6 MB/s write, 2.8 MB/s read

Note that we're much faster writing, in general, but read speed lags.
The 'get' monad hasn't received much attention yet, though we know what
needs tuning.
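
For readers who want to try the kind of operation measured above, here is a
minimal sketch of writing and reading Word32 values in big-endian and host
order through the Put and Get monads. It is not the benchmark code itself.
The helper names (putWordsBE, getWordsBE, putWordsHost, getWordsHost,
roundTripBE, roundTripHost) are made up for illustration, and the sketch
assumes a release of the binary package that exports the Data.Binary.Put and
Data.Binary.Get functions used here.

```haskell
import Control.Monad (replicateM)
import Data.Binary.Get (Get, getWord32be, getWord32host, runGet)
import Data.Binary.Put (Put, putWord32be, putWord32host, runPut)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word32)

-- Write each word in big-endian (network) byte order.
putWordsBE :: [Word32] -> Put
putWordsBE = mapM_ putWord32be

-- Read back n big-endian words.
getWordsBE :: Int -> Get [Word32]
getWordsBE n = replicateM n getWord32be

-- Host-endian variants: the byte layout depends on the machine,
-- so the output is not portable between architectures.
putWordsHost :: [Word32] -> Put
putWordsHost = mapM_ putWord32host

getWordsHost :: Int -> Get [Word32]
getWordsHost n = replicateM n getWord32host

-- Serialise to a lazy ByteString and parse it straight back.
roundTripBE :: [Word32] -> [Word32]
roundTripBE ws = runGet (getWordsBE (length ws)) (runPut (putWordsBE ws))

roundTripHost :: [Word32] -> [Word32]
roundTripHost ws = runGet (getWordsHost (length ws)) (runPut (putWordsHost ws))
```

Both round trips should return their input unchanged. Only the big-endian
form produces the same bytes on every platform.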

> GHC's binary library (quite heavily tuned by me):
> 
> Write time:   2.41
> Read time:    1.44
> 1,312,100,072 bytes allocated in the heap
>      96,792 bytes copied during GC (scavenged)
>     744,752 bytes copied during GC (not scavenged)
>  32,492,592 bytes maximum residency (6 sample(s))
> 
>        2384 collections in generation 0 (  0.01s)
>           6 collections in generation 1 (  0.00s)
> 
>          63 Mb total memory in use
> 
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time    3.78s  (  3.84s elapsed)
>   GC    time    0.02s  (  0.02s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time    3.79s  (  3.86s elapsed)
> 
> Data.Binary:
> 
> Write time:   0.99
> Read time:    0.65
> 1,949,205,456 bytes allocated in the heap
> 204,986,944 bytes copied during GC (scavenged)
>   5,154,600 bytes copied during GC (not scavenged)
>  70,247,720 bytes maximum residency (8 sample(s))
> 
>        3676 collections in generation 0 (  0.25s)
>           8 collections in generation 1 (  0.19s)
> 
>         115 Mb total memory in use
> 
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time    1.08s  (  1.13s elapsed)
>   GC    time    0.44s  (  0.52s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time    1.51s  (  1.65s elapsed)
> 
> This example writes a lot of 'Maybe Int' values.  I'm surprised by the 
> extra heap used by Data.Binary: this was on a 64-bit machine, so Ints 
> should have been encoded as 64 bits by both libraries.  Also, the GC seems 
> to be working quite hard with Data.Binary, I'd be interested to know why 
> that is.

Very interesting! Is this benchmark online?

I'm a little surprised by the read times, since reading is still fairly
unoptimised compared to writing.
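
As a rough illustration of the 'Maybe Int' workload mentioned in the quoted
benchmark, the sketch below encodes and decodes such values with the stock
Binary instances. It is not Simon's benchmark, and the helper names
(encodeMaybes, decodeMaybes, encodedSize) are invented here. In recent
releases of the package a Maybe is written as a one-byte tag followed by the
payload, and an Int is written as 64 bits. Whether 0.2 uses exactly that
layout is an assumption.

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Lazy as L

-- Serialise a list of optional Ints using the library's own instances.
encodeMaybes :: [Maybe Int] -> L.ByteString
encodeMaybes = encode

-- Parse the list back; decode calls error on malformed input.
decodeMaybes :: L.ByteString -> [Maybe Int]
decodeMaybes = decode

-- Number of bytes a single value occupies once encoded.
encodedSize :: Maybe Int -> Int
encodedSize = fromIntegral . L.length . encode
```

Comparing encodedSize Nothing with encodedSize (Just 1) is a quick way to
check how many bytes the Int payload takes on a given installation.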

> Anyway, this result is good enough for me, I'd like to use Data.Binary in 
> GHC as soon as we can.  Unfortunately we have to support older compilers, 
> so there will be some build-system issues to surmount.  Also we need a way 
> to pass state around while serialising/deserialising - what's the current 
> plan for this?

The plan was to use StateT Put or StateT Get, I think. But we don't have
a demo for this yet. Duncan, Lennart, any suggestions?
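
One possible shape for the StateT idea, offered only as a sketch and not as
the library's planned design: layer StateT from the transformers package over
PutM and Get, and lift the primitive operations. The state here is a running
sum of the bytes seen, chosen purely for illustration, and every name below
(PutS, GetS, putByteS, getByteS, serialise, deserialise) is hypothetical. It
assumes a release of binary that exports PutM from Data.Binary.Put.

```haskell
import Control.Monad (replicateM)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, modify', runStateT)
import Data.Binary.Get (Get, getWord8, runGet)
import Data.Binary.Put (PutM, putWord8, runPut)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- Hypothetical stacks: an Int of user state threaded through Put and Get.
type PutS = StateT Int PutM
type GetS = StateT Int Get

-- Write one byte and add it to the running sum.
putByteS :: Word8 -> PutS ()
putByteS w = do
  modify' (+ fromIntegral w)
  lift (putWord8 w)

-- Read one byte and add it to the running sum.
getByteS :: GetS Word8
getByteS = do
  w <- lift getWord8
  modify' (+ fromIntegral w)
  return w

-- Run the stateful writer from state 0, discarding the final state.
serialise :: [Word8] -> L.ByteString
serialise ws = runPut (evalStateT (mapM_ putByteS ws) 0)

-- Read n bytes from state 0, returning the bytes and the final state.
deserialise :: Int -> L.ByteString -> ([Word8], Int)
deserialise n = runGet (runStateT (replicateM n getByteS) 0)
```

The same pattern would carry a symbol table or sharing map instead of an Int.
The cost of the extra monad layer is not measured here.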

-- Don

