[Haskell-cafe] ByteString and ByteString.Builder questions
Zoran Bošnjak
zoran.bosnjak at via.si
Wed Nov 29 11:49:06 UTC 2023
Hi all,
if I understand correctly, Data.ByteString.Builder is used to efficiently construct a sequence of bytes from smaller parts. However, inspecting the data (take, head, index, ...) requires a plain ByteString.
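A minimal sketch of that split, using only the bytestring package: the message is assembled with a Builder, and it has to be run and copied into a strict ByteString before BS.head or BS.index can be applied. The names `assemble` and `materialize` and the message layout are hypothetical, chosen only for illustration.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import qualified Data.ByteString.Builder as BB
import Data.Word (Word8)

-- Hypothetical message: a tag byte, a big-endian 16-bit length, the payload.
assemble :: Word8 -> ByteString -> BB.Builder
assemble tag payload =
    BB.word8 tag
    <> BB.word16BE (fromIntegral (BS.length payload))
    <> BB.byteString payload

-- Builder has no head/take/index, so inspection needs a strict ByteString:
-- toLazyByteString runs the builder, toStrict copies the chunks into one buffer.
materialize :: BB.Builder -> ByteString
materialize = BSL.toStrict . BB.toLazyByteString

main :: IO ()
main = do
    let msg = materialize (assemble 0x2A (BS.pack [1, 2, 3]))
    print (BS.head msg, BS.length msg, BS.index msg 2)  -- (42,6,3)
```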
What if the byte sequence manipulation task requires both, for example:
- receive ByteString from the network (e.g. Network.Socket.ByteString.recv :: ... -> IO ByteString)
- inspect and manipulate data (pure function)
- resend to the network (e.g. Network.Socket.ByteString.sendMany :: ... -> [ByteString] -> IO ())
... where a pure data manipulation could be something like:
- extract some segments out of the original sequence
- create and add some segments of bytes
- reorder
- concatenate
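As an illustration of such a pure step, here is a sketch that extracts, creates, reorders and concatenates segments and returns a list of chunks, the argument type of sendMany. The packet layout (a 2-byte header followed by a body) and the names `splitPacket`, `trailer` and `rewrite` are hypothetical. BS.splitAt returns slices that share the original buffer, so extraction does not copy.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import qualified Data.ByteString.Builder as BB

-- Hypothetical layout: 2-byte header, then the body.
-- O(1): both halves are slices of the received buffer.
splitPacket :: ByteString -> (ByteString, ByteString)
splitPacket = BS.splitAt 2

-- A freshly created segment: marker byte 0xFF plus a big-endian 32-bit counter.
trailer :: Int -> ByteString
trailer n = BSL.toStrict (BB.toLazyByteString (BB.word8 0xFF <> BB.int32BE (fromIntegral n)))

-- Extract, create, reorder (body moved before header) and concatenate
-- as a chunk list, ready for Network.Socket.ByteString.sendMany.
rewrite :: Int -> ByteString -> [ByteString]
rewrite n packet =
    let (header, body) = splitPacket packet
    in [body, header, trailer n]

main :: IO ()
main = print (BS.unpack (BS.concat (rewrite 7 (BS.pack [10, 11, 1, 2, 3]))))
-- prints [1,2,3,10,11,255,0,0,0,7]
```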
It is somewhat inconvenient to use two different types for the task, namely ByteString and Builder, when both represent a sequence of bytes.
I have tried to define a Bytes type in which both representations are available:
import Prelude hiding (head, length)
import Data.Word (Word8)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as Bsl
import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as Bld
data Bytes = Bytes
    { toByteString :: ByteString
    , toBuilder :: Builder
    , length :: Int
    }
instance Semigroup Bytes ...
instance Monoid Bytes ...
-- create
fromByteString :: ByteString -> Bytes
fromByteString bs = Bytes
    { toByteString = bs
    , toBuilder = Bld.byteString bs
    , length = BS.length bs
    }
-- inspect function example
head :: Bytes -> Word8
head = BS.head . toByteString
-- prepare to send data over the network
toChunks :: Bytes -> [ByteString]
toChunks = Bsl.toChunks . Bld.toLazyByteString . toBuilder
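The Semigroup and Monoid instances are elided above. For experimenting, a self-contained version might look like this, assuming the instances simply combine the three fields pairwise (an assumption; the gist may do something different). Because the fields are lazy, `head` only forces the ByteString field and `toChunks` only forces the Builder field.

```haskell
import Prelude hiding (head, length)
import Data.Word (Word8)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as Bsl
import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as Bld

data Bytes = Bytes
    { toByteString :: ByteString
    , toBuilder    :: Builder
    , length       :: Int
    }

-- Assumed instance: combine each field on its own. (<>) only allocates
-- thunks; the strict ByteString append, which copies both inputs, runs
-- only if toByteString of the result is demanded.
instance Semigroup Bytes where
    a <> b = Bytes
        { toByteString = toByteString a <> toByteString b
        , toBuilder    = toBuilder a <> toBuilder b
        , length       = length a + length b
        }

instance Monoid Bytes where
    mempty = Bytes BS.empty mempty 0

fromByteString :: ByteString -> Bytes
fromByteString bs = Bytes
    { toByteString = bs
    , toBuilder    = Bld.byteString bs
    , length       = BS.length bs
    }

head :: Bytes -> Word8
head = BS.head . toByteString

toChunks :: Bytes -> [ByteString]
toChunks = Bsl.toChunks . Bld.toLazyByteString . toBuilder

main :: IO ()
main = do
    let xs = mconcat (map (fromByteString . BS.pack) [[1, 2], [3], [4, 5, 6]])
    print (head xs)                          -- forces only the ByteString field
    print (length xs)                        -- 6
    print (concatMap BS.unpack (toChunks xs)) -- forces only the Builder field
```

Under this assumed instance, forcing toByteString of a long mconcat materializes every intermediate strict append, and each of those appends copies its inputs.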
The fields of Bytes are non-strict, so my expectation was that lazy evaluation would suspend unnecessary computations, giving efficient inspection (via the ByteString part) and efficient concatenation (via the Builder part).
I have run some benchmarks (I am not sure they are exactly to the point), and Bytes comes out considerably slower: about 11 times slower than plain ByteString for inspection and about 4 times slower than Builder for construction:
Inspect test
  using ByteString: OK (0.50s)
    5.51 ms ± 342 μs
  using Bytes:      OK (0.17s)
    62.0 ms ± 3.0 ms
Construct test
  using Builder:    OK (0.21s)
    5.29 ms ± 361 μs
  using Bytes:      OK (0.17s)
    21.3 ms ± 1.6 ms
  naive:            OK (0.81s)
    49.6 ms ± 4.1 ms
Here is the full code:
https://gist.github.com/zoranbosnjak/7887d843056f07bac6061d20970e1d6a
My questions are:
0) Where does the big timing difference come from? Thunks?
1) Is there any simple way (INLINE pragmas or some other tricks) to get the performance back with the current implementation?
2) Would DList [ByteString] or any other type be any better than ByteString.Builder in this case?
3) What would be the most efficient (and reasonable) implementation to support this kind of data processing (inspection and concatenation) in a uniform way? In other words: is uniformity worth having here?
4) Is Network.Socket.ByteString.sendMany the way to go when the byte sequence is constructed from segments, or is there a better (faster) way?
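To make question 2 concrete, here is a minimal sketch of a difference list of strict chunks, hand-rolled rather than taken from the dlist package; the names `Chunks`, `chunk`, `toChunkList` and `flatten` are hypothetical. Appending is O(1) and copies no bytes, and the resulting list has the shape sendMany expects, while inspection still needs one BS.concat into contiguous bytes.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- Hypothetical type: a difference list of strict chunks.
newtype Chunks = Chunks ([ByteString] -> [ByteString])

instance Semigroup Chunks where
    Chunks f <> Chunks g = Chunks (f . g)  -- O(1) append, no byte copying

instance Monoid Chunks where
    mempty = Chunks id

chunk :: ByteString -> Chunks
chunk bs
    | BS.null bs = mempty      -- drop empty chunks
    | otherwise  = Chunks (bs :)

-- Suitable as the argument of Network.Socket.ByteString.sendMany.
toChunkList :: Chunks -> [ByteString]
toChunkList (Chunks f) = f []

-- Inspection needs contiguous bytes: BS.concat joins the chunks.
flatten :: Chunks -> ByteString
flatten = BS.concat . toChunkList

main :: IO ()
main = do
    let c = chunk (BS.pack [1, 2]) <> chunk BS.empty <> chunk (BS.pack [3])
    print (length (toChunkList c))  -- 2: the empty chunk was dropped
    print (BS.unpack (flatten c))   -- [1,2,3]
```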
regards,
Zoran