[Haskell-cafe] Yet another binary reader (10x faster than Hackage's binary; )

Clinton Mead clintonmead at gmail.com
Tue Mar 19 23:55:40 UTC 2019

Sounds fascinating.

I don't currently have a strong need for a fast binary serialiser but your
explorations inside the GHC optimiser sound very interesting.

If you can get around to doing a blog/presentation on your explorations and
techniques I'll sure be interested and I'm sure would be many in the
Haskell community.

On Wed, Mar 20, 2019 at 8:58 AM Станислав Черничкин <schernichkin at gmail.com>

> I was investigating Haskell’s binary serialization libraries and has found
> that lot of them (binary, cereal and protocol-buffers) has pretty similar
> design. They capable to consume lazy bytestrings and either produce
> result, report error or produce partial result, indicating that more input
> data required. Such design leads to quite complex readers, and often ends
> up with producing closures in runtime. If you take a close look at binary’s
> microbenches (binary in general bit faster than cereal, so I focused on
> binary) you will discover, that it triggers GC even during simple tasks,
> such as reading a bunch of Word64 and summing results. And it relatively
> slow. Binary’s performance limited to hundreds of megabytes per second even
> in simple scenarios, but actual memory bandwidth should allow us process
> gigabytes per second. So, I decided to try a different approach.
> My approach is based on two assumptions so if it does not suit your case
> it obliviously will not work for you.
> First – when dealing with binary data we always know exactly how many
> bytes we want to read. This happens because we either read a fixed size
> primitive type or length-prefixed variable size data (possibly not
> “prefixed”, but you always know expected message length from somewhere).
> Second – chunks of data are continuous. I.e. we working with strict
> bytestrings more often. Even when you processing chunked stream of data
> with unknown number of messages (such as Mesos RecordIO Format) you
> should always be able to allocate continuous memory buffer for a single
> entity you want to deserialize.
> If it’s ok for you lets go on. First thing I’ve created was so called
> static reader. Static – because reader knows how mush bytes it wants to
> consume at compile time. Such readers suitable for reading combinations of
> primitive values. Here the implementation
> https://github.com/PROTEINE-INSAIDERS/lev-tolstoy/blob/master/src/Lev/Reader/Static.hs Static
> reader is somewhat monadic (you can use do-syntax with RebindableSyntax extension
> like here
> https://github.com/PROTEINE-INSAIDERS/lev-tolstoy/blob/master/bench/Bench/Lev/Reader/Static.hs )
> but it actually indexed monad and it unfortunately incompatible with
> traditional monads. Offsets and sizes are calculates at type level during
> the compile time and according to microbences static reader does not
> introduced any overhead compared to hand-written reader using the
> low-level indexInt64OffAddr# function (
> https://github.com/PROTEINE-INSAIDERS/lev-tolstoy/blob/master/bench/Bench/Handwritten.hs
>  )
> Next thing was a dynamic reader. Dynamic readers are composed from static
> readers, they are fully monadic and allow you to read length-prefixed data
> structures (you read length with wrapped static reader and then perform
> normal monadic binding to supply a reader which reads a value). Here the
> implementation
> https://github.com/PROTEINE-INSAIDERS/lev-tolstoy/blob/master/src/Lev/Reader/Dynamic.hs
> I’ve discovered, that typeclasses are surprisingly fast. So, it’s
> completely ok to have something like “(Cursor c) => … Reader” where cursor
> contains functions for overflow checking and memory buffer progression. I
> was not able to achieve same performance when I passed buffer callbacks explicitly,
> possibly because of inlining of specialization magic. So, I've created
> ByteStringCursor (
> https://github.com/PROTEINE-INSAIDERS/lev-tolstoy/blob/master/src/Lev/Reader/ByteString.hs )
> which allows to read strict bytestrings. I suppose that cursors introduce
> flexibility: you may read strict bytestring or you may define custom
> cursor for reading custom memory buffers.
> Finally, I’ve written some microbenches for my cases and also ported one
> microbench from binary. And my implementation was able to run 10x-15x
> faster than binary on strict bytestring. Moreover, it does not trigger GC
> during the run at all! You may find the project and all the benches here
> https://github.com/PROTEINE-INSAIDERS/lev-tolstoy
> So, what's the purpose of my message? It’s quite unlikely that I’ll
> release my code on the Hackage. This is because I’m a hobbyist Haskell
> developer and doing it all in my free time. But before mr. Putin will
> close Internet in Russia I’d like to demonstrate that it’s possible to write
> very fast binary deserializer taking in account several assumptions which
> seems quite reasonably for me. So, feel free to borrow the ideas, or share
> the ideas, if you know how it could be implemented better.
> --
> Sincerely, Stanislav Chernichkin.
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20190320/cb609004/attachment-0001.html>

More information about the Haskell-Cafe mailing list