[Haskell] ANNOUNCE: vector-bytestring-0.0.0.0
Bas van Dijk
v.dijk.bas at gmail.com
Wed Oct 12 16:02:51 CEST 2011
All your ByteString are belong to us...
I'm pleased to announce the beta release of vector-bytestring. This
library provides the type ByteString which is defined as a type
synonym for a storable Vector of Word8s (from the vector package):
type ByteString = Data.Vector.Storable.Vector Word8
It exports the same API as the bytestring package, except that the
module names are prefixed with Data.Vector.Storable.ByteString
instead of Data.ByteString.
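As a rough illustration of what the synonym means in practice, here is a minimal sketch that depends only on the vector package. The local `ByteString` synonym and the `pack`/`unpack` helpers are stand-ins written for this example; they are assumptions about the shape of the API, not the library's actual definitions.

```haskell
import qualified Data.Vector.Storable as VS
import Data.Word (Word8)

-- Local stand-in for the synonym described above.
type ByteString = VS.Vector Word8

-- Hypothetical helpers mirroring the bytestring names; here they are
-- simply the list conversions from the vector package.
pack :: [Word8] -> ByteString
pack = VS.fromList

unpack :: ByteString -> [Word8]
unpack = VS.toList

main :: IO ()
main = do
  let bs = pack [72, 105, 33]
  -- Any storable-vector function works directly on the synonym.
  print (VS.length bs)
  print (unpack (VS.map (+ 1) bs))
```

Because the type is only a synonym, no conversion is needed to pass such a value to other code that expects a storable Vector of Word8.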
The very ambitious goal of this package is that it will eventually
replace our beloved bytestring package. By basing this package on
vector, we can benefit from all the optimizations (like
stream-fusion!) in that library. We will also have just a single
library to test, debug and optimize.
I ported the bytestring test-suite to vector-bytestring. You can run it using:
$ cabal configure --enable-tests; cabal build; cabal test
All 54800 tests pass! Only one property doesn't hold:
prop_show :: ByteString -> Bool
prop_show x = show x == show (unpack x)
This is because I don't provide a custom Show instance for ByteStrings
but use the one from Vector, which shows a vector as "fromList
[1,2,3]" instead of "\SOH\STX\ETX" as bytestring does. Hopefully
this is not a problem in practice.
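The difference between the two renderings can be reproduced with the vector package alone. In this sketch, `showAsVector` and `showAsString` are names made up for the example, and `showAsString` only approximates bytestring's output by showing the bytes as a String of Chars; it does not call bytestring itself.

```haskell
import qualified Data.Vector.Storable as VS
import Data.Char (chr)
import Data.Word (Word8)

-- The Show instance that comes with storable vectors.
showAsVector :: [Word8] -> String
showAsVector = show . VS.fromList

-- Assumption: approximates the bytestring-style rendering by
-- converting each byte to a Char and showing the resulting String.
showAsString :: [Word8] -> String
showAsString = show . map (chr . fromIntegral)

main :: IO ()
main = do
  putStrLn (showAsVector [1, 2, 3])  -- prints: fromList [1,2,3]
  putStrLn (showAsString [1, 2, 3])  -- prints: "\SOH\STX\ETX"
```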
I added a criterion-based benchmark suite to vector-bytestring. It
consists of over 600 benchmarks that cover almost every function in
the library. Also included are benchmarks that measure the fusion
capabilities of the library. Run it using:
$ cabal configure -fbenchmark; cabal build;
$ dist/build/bench/bench --help
Unfortunately, bytestring still outperforms us in lots of benchmarks.
I believe the primary cause of this is that most functions are
implemented using stream-fusion. This is highly efficient if you use a
composition of these functions because they will all fuse into one
single efficient loop. However, if your program uses only a single
function, the stream-based implementation is often less efficient than
an implementation that works directly on a mutable vector (like most
functions in bytestring). So what we want is to use stream-fusion
where possible but use mutable vectors when our program doesn't fuse.
Fortunately, Roman Leshchinskiy (author of vector) has an idea how to
do this: http://trac.haskell.org/vector/ticket/60
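To make the trade-off concrete, here is a small sketch using the vector package directly. The names `pipeline` and `handLoop` are invented for this example. `pipeline` is a composition of the kind that stream fusion is designed to collapse into one loop; whether it actually fuses depends on the vector version and on compiling with optimizations, so treat that as an assumption. `handLoop` computes the same result in a single explicit fold for comparison. No timings are implied.

```haskell
import qualified Data.Vector.Storable as VS
import Data.Word (Word8)

-- A composition of vector functions: keep the even bytes, add one to
-- each, and sum the results as Ints.
pipeline :: VS.Vector Word8 -> Int
pipeline =
  VS.foldl' (\acc w -> acc + fromIntegral w) 0
    . VS.map (+ 1)
    . VS.filter even

-- The same computation written as one explicit strict fold.
handLoop :: VS.Vector Word8 -> Int
handLoop = VS.foldl' step 0
  where
    step acc w
      | even w    = acc + fromIntegral (w + 1)
      | otherwise = acc

main :: IO ()
main = do
  let v = VS.fromList [1 .. 10]
  print (pipeline v)
  print (handLoop v)
```

Both functions return the same value; the point of the comparison is only that the first relies on the library to merge three traversals, while the second is already a single traversal.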
Because we don't beat bytestring in all cases yet, you should consider
this a beta release and not use it in production code.
$ cabal install vector-bytestring
$ git clone https://github.com/basvandijk/vector-bytestring