Adding binary to the Haskell Platform

Neil Mitchell ndmitchell at gmail.com
Wed Aug 5 09:40:54 EDT 2009


Hi

Since we're pointing out flaws in Data.Binary, I might as well add mine too :-)

I found that encode/decode of a String was massively slower than
serialising a String which I converted to a bytestring on the way in
and on the way out. They are exactly equivalent (at the binary
representation and the interface level), but String is clearly
inefficient. I also noticed that for String one way round was far
slower than the other, I think decoding was much slower, which was
surprising since encoding should have to do more work (length calls
etc)

Unfortunatley I got busy with work and never managed to write all the
details down, but this might be a good place to start benchmarking.

Thanks

Neil

On Wed, Aug 5, 2009 at 2:23 PM, Felipe Lessa<felipe.lessa at gmail.com> wrote:
> On Wed, Aug 05, 2009 at 06:48:02AM -0600, Denis Bueno wrote:
>> [...] This is because of the [] instance.  I, myself, have been
>> bitten by this, but I don't think it's a flaw in the library -- there
>> are plenty of people using the library who haven't complained about
>> the instance.
>>
>> If the instance isn't right for you, write a new one.  That's what I did.
>
> The problem is that people shouldn't be reinventing the wheel
> everytime.  I propose adding something along the lines of this
> snippet to binary (untested):
>
>
> import Control.Applicative
> import Control.Monad (replicateM_)
>
> -- | Provides a new instance of 'Binary' to lists that 'put's and
> --   'get's using chunks instead of forcing the whole spine of
> --   the list before writing the first byte.  This enconding is
> --   less space-efficient in the disk, though, having an overhead
> --   of @1 + (length xs `div` 255)@ bytes instead of only four
> --   bytes (independently of list size), the overhead of the
> --   default instance.
> newtype Chunked a = Chunked {unChunked :: [a]}
>
> instance Binary (Chunked a) where
>    -- This 'put' should be good enough.
>    put = mapM_ putChunk . chunks 255 . unChunked
>        where chunks 255 [] = [(0,[])] -- not []!
>              chunks _   [] = []
>              chunks _   xs = let (i,f) = splitAt 255 xs
>                                  len   = length i
>                              in (len,i) : chunks len f
>              putChunk (len,xs) = putWord8 len >> mapM_ put xs
>
>    -- I don't know if this get works nicely, though.
>    get = getWord8 >>= go []
>        where go acc 0   = return $ concat $ reverse acc
>              go acc len = do xs   <- replicateM_ len get
>                              len' <- if len == 255 then getWord8 else return 0
>                              go (xs:acc) len'
>
> --
> Felipe.
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>


More information about the Libraries mailing list