[Haskell-cafe] Re: Data.Binary poor read performance

Don Stewart dons at galois.com
Mon Feb 23 13:55:49 EST 2009


ndmitchell:
> Hi,
> 
> In an application I'm writing with Data.Binary I'm seeing very fast
> write performance (instant), but much slower read performance. Can you
> advise where I might be going wrong?

Can you try binary 0.5 , just released 20 mins ago?

There were definitely some slowdowns due to inlining that I've mostly
fixed in this release.

  
> The data type I'm serialising is roughly: Map String [Either
> (String,[String]) [(String,Int)]]
> 
> A lot of the Strings are likely to be identical, and the final file
> size is 1Mb. Time taken with ghc -O2 is 0.4 seconds.

Map serialisation was sub-optimal. That's been improved in today's release.
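For reference, a minimal round-trip sketch of serialising that exact type with Data.Binary (the sample values are invented for illustration; binary supplies instances for Map, lists, Either, tuples, Char and Int, so no hand-written instance is needed):

```haskell
import qualified Data.Binary as Bin
import qualified Data.Map as Map

type Payload = Map.Map String [Either (String, [String]) [(String, Int)]]

-- A small invented value with the same shape as the data described above.
sample :: Payload
sample = Map.fromList
    [ ("alpha", [Left ("mod", ["dep1", "dep2"])])
    , ("beta",  [Right [("count", 3), ("size", 42)]])
    ]

-- Encode to a lazy ByteString and decode it back.
roundtrip :: Payload -> Payload
roundtrip = Bin.decode . Bin.encode

main :: IO ()
main = print (roundtrip sample == sample)
```

For files, `Bin.encodeFile` and `Bin.decodeFile` do the same thing against a FilePath.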
  

> Various questions/thoughts I've had:
> 
> 1) Is reading a lot slower than writing by necessity?

Nope. Shouldn't be.
  
> 2) The storage for String seems to be raw strings, which is nice.
> Would I get a substantial speedup by moving to bytestrings instead of
> strings? If I hashed the strings and stored common ones in a hash
> table is it likely to be a big win?

Yep and maybe.
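On the first part, a hedged sketch of what the ByteString variant looks like: binary has a Binary instance for strict ByteStrings, which serialises a length prefix plus raw bytes instead of a list of Chars. The sample value is invented.

```haskell
import qualified Data.Binary as Bin
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as Map

-- The same shape as before, with strict ByteStrings in place of String.
type Payload =
    Map.Map B.ByteString [Either (B.ByteString, [B.ByteString]) [(B.ByteString, Int)]]

sample :: Payload
sample = Map.fromList
    [ (B.pack "alpha", [Left (B.pack "mod", [B.pack "dep1"])])
    , (B.pack "beta",  [Right [(B.pack "count", 3)]])
    ]

main :: IO ()
main = print (Bin.decode (Bin.encode sample) == sample)
```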
  
> 3) How long might you expect 1Mb to take to read?
> 
> Thanks for the library, it's miles faster than the Read/Show I was
> using before - but I'm still hoping that reading 1Mb of data can be
> instant :-)

Tiny fractions of a second.

    $ cat A.hs
    import qualified Data.ByteString as B
    import System.Environment

    main = do
        [f] <- getArgs
        print . B.length =<< B.readFile f

    $ du -hs /usr/share/dict/cracklib-small  
    472K    /usr/share/dict/cracklib-small

    $ time ./A /usr/share/dict/cracklib-small  
    477023
    ./A /usr/share/dict/cracklib-small  0.00s user 0.01s system 122% cpu 0.005 total

If you're not seeing results like that, with binary 0.5, let's look deeper.
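To separate decode cost from disk I/O, here's a small in-memory timing sketch. The Map contents are invented and getCPUTime is coarse, so treat the numbers as rough.

```haskell
import qualified Data.Binary as Bin
import qualified Data.Map as Map
import System.CPUTime (getCPUTime)

main :: IO ()
main = do
    -- Build and encode a map of roughly comparable size, then time the decode.
    let m  = Map.fromList [ (show i, i) | i <- [1 .. 50000 :: Int] ]
        bs = Bin.encode m
    start <- getCPUTime
    let m' = Bin.decode bs :: Map.Map String Int
    print (Map.size m')      -- forcing the size forces the decoded structure
    end <- getCPUTime
    -- getCPUTime is in picoseconds; convert to seconds.
    putStrLn $ "decode: " ++ show (fromIntegral (end - start) / 1e12 :: Double) ++ "s"
```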

-- Don

