Adding binary to the Haskell Platform
Neil Mitchell
ndmitchell at gmail.com
Wed Aug 5 09:40:54 EDT 2009
Hi
Since we're pointing out flaws in Data.Binary, I might as well add mine too :-)
I found that encode/decode of a String was massively slower than
serialising a String which I converted to a bytestring on the way in
and on the way out. They are exactly equivalent (at the binary
representation and the interface level), but String is clearly
inefficient. I also noticed that for String one way round was far
slower than the other, I think decoding was much slower, which was
surprising since encoding should have to do more work (length calls
etc)
Unfortunatley I got busy with work and never managed to write all the
details down, but this might be a good place to start benchmarking.
Thanks
Neil
On Wed, Aug 5, 2009 at 2:23 PM, Felipe Lessa<felipe.lessa at gmail.com> wrote:
> On Wed, Aug 05, 2009 at 06:48:02AM -0600, Denis Bueno wrote:
>> [...] This is because of the [] instance. I, myself, have been
>> bitten by this, but I don't think it's a flaw in the library -- there
>> are plenty of people using the library who haven't complained about
>> the instance.
>>
>> If the instance isn't right for you, write a new one. That's what I did.
>
> The problem is that people shouldn't be reinventing the wheel
> everytime. I propose adding something along the lines of this
> snippet to binary (untested):
>
>
> import Control.Applicative
> import Control.Monad (replicateM_)
>
> -- | Provides a new instance of 'Binary' to lists that 'put's and
> -- 'get's using chunks instead of forcing the whole spine of
> -- the list before writing the first byte. This enconding is
> -- less space-efficient in the disk, though, having an overhead
> -- of @1 + (length xs `div` 255)@ bytes instead of only four
> -- bytes (independently of list size), the overhead of the
> -- default instance.
> newtype Chunked a = Chunked {unChunked :: [a]}
>
> instance Binary (Chunked a) where
> -- This 'put' should be good enough.
> put = mapM_ putChunk . chunks 255 . unChunked
> where chunks 255 [] = [(0,[])] -- not []!
> chunks _ [] = []
> chunks _ xs = let (i,f) = splitAt 255 xs
> len = length i
> in (len,i) : chunks len f
> putChunk (len,xs) = putWord8 len >> mapM_ put xs
>
> -- I don't know if this get works nicely, though.
> get = getWord8 >>= go []
> where go acc 0 = return $ concat $ reverse acc
> go acc len = do xs <- replicateM_ len get
> len' <- if len == 255 then getWord8 else return 0
> go (xs:acc) len'
>
> --
> Felipe.
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>
More information about the Libraries
mailing list