[Haskell-cafe] ByteString and ByteString.Builder questions
Viktor Dukhovni
ietf-dane at dukhovni.org
Wed Nov 29 17:24:17 UTC 2023
On Wed, Nov 29, 2023 at 11:49:06AM +0000, Zoran Bošnjak wrote:
> if I understand correctly, the ByteString.Builder is used to
> efficiently construct a sequence of bytes from smaller parts.
It is best used in continuation-passing style (right-associatively), where all
the subsequent builders are lazily appended as part of constructing the
"head" builder.
builder = chunk1 <> (chunk2 <> (chunk3 <> (... <> chunkN)...))
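A minimal sketch of the right-associated form, assuming the parts arrive as a list of `String`s (`parts`, `rightAssoc` and `render` are hypothetical names, not library API). `foldr (<>) mempty` produces exactly the nesting shown above, and has the same shape as the default definition of `mconcat`:

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as L8

-- Hypothetical input parts; each becomes one Builder chunk.
parts :: [String]
parts = ["chunk1", "chunk2", "chunk3"]

-- foldr nests to the right:  b1 <> (b2 <> (b3 <> mempty)).
-- Roughly speaking, the rest of the builder is only evaluated
-- once the output reaches it.
rightAssoc :: [B.Builder] -> B.Builder
rightAssoc = foldr (<>) mempty

-- Run the builder and view the result as a String (for inspection only).
render :: B.Builder -> String
render = L8.unpack . B.toLazyByteString
```

For example, `render (rightAssoc (map B.string7 parts))` gives "chunk1chunk2chunk3".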
Repeatedly appending chunks at the tail (effectively left-associating) is
noticeably less efficient (similar to lists). A workaround is to
instead append (Builder->Builder) endomorphisms.
b1 = Endo (mappend chunk1)
b2 = b1 <> Endo (mappend chunk2)
b3 = b2 <> Endo (mappend chunk3)
...
bN = ...
and then extract the final builder via: `appEndo bN mempty`.
Endomorphism append will be more efficient once there are many parts to
combine.
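The endomorphism trick can be sketched as follows, with hypothetical helper names (`BuilderAcc`, `emptyAcc`, `snoc`, `finish`); only `Endo`/`appEndo` from Data.Monoid and the Builder functions are real API. Each `snoc` is just a function composition, and `finish` applies the composed function to `mempty`, which yields the right-nested builder:

```haskell
import Data.Monoid (Endo (..))
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as L8

-- Hypothetical accumulator: a function that prepends all chunks
-- collected so far to whatever builder comes after them.
type BuilderAcc = Endo B.Builder

emptyAcc :: BuilderAcc
emptyAcc = mempty

-- Append one chunk at the logical end; same as `b <> Endo (mappend chunk)`.
snoc :: BuilderAcc -> B.Builder -> BuilderAcc
snoc acc chunk = acc <> Endo (chunk <>)

-- Feeding mempty yields chunk1 <> (chunk2 <> (... <> (chunkN <> mempty))).
finish :: BuilderAcc -> B.Builder
finish acc = appEndo acc mempty

-- Build incrementally, left to right, e.g. while walking some input.
example :: B.Builder
example = finish (foldl snoc emptyAcc (map B.intDec [1 .. 5 :: Int]))
```

Here `L8.unpack (B.toLazyByteString example)` is "12345".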
> However, for inspecting data (take, head, index...), a plain
> ByteString is required.
For efficient processing of network streams, you'd typically use a
streaming API that exposes the input as a monadic stream of chunks,
perhaps with a corresponding parser layered on top that consumes
chunks monadically. The `streaming` ecosystem, for example, supports
this model.
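The following is not the `streaming` API, just a hand-rolled sketch of the same idea under stated assumptions: the input is a hypothetical `IO` action that yields one chunk per call (like `recv`, with an empty chunk meaning end of input), and a small parser pulls more chunks only when it needs more bytes. `Source`, `listSource`, `takeExactly` and `readFrame` (with its 1-byte length prefix) are all made up for illustration:

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import qualified Data.ByteString as BS

-- Hypothetical chunk source in the style of recv: each call returns
-- the next chunk; an empty ByteString signals end of input.
type Source = IO BS.ByteString

-- Hypothetical test source that replays a fixed list of chunks.
listSource :: [BS.ByteString] -> IO Source
listSource chunks = do
  ref <- newIORef chunks
  pure $ do
    cs <- readIORef ref
    case cs of
      []         -> pure BS.empty
      (c : rest) -> writeIORef ref rest >> pure c

-- Read exactly n bytes, starting from already-buffered leftover bytes
-- and pulling more chunks only as needed. Returns the n bytes plus the
-- new leftover, or Nothing if the input ends early.
takeExactly :: Source -> Int -> BS.ByteString
            -> IO (Maybe (BS.ByteString, BS.ByteString))
takeExactly src n leftover = go [leftover] (BS.length leftover)
  where
    -- pieces are kept newest-first; have = total bytes in them
    go pieces have
      | have >= n = pure (Just (BS.splitAt n (BS.concat (reverse pieces))))
      | otherwise = do
          chunk <- src
          if BS.null chunk
            then pure Nothing
            else go (chunk : pieces) (have + BS.length chunk)

-- Hypothetical frame: a 1-byte length followed by that many bytes.
readFrame :: Source -> BS.ByteString
          -> IO (Maybe (BS.ByteString, BS.ByteString))
readFrame src buf = do
  hdr <- takeExactly src 1 buf
  case hdr of
    Nothing -> pure Nothing
    Just (lenByte, rest) ->
      takeExactly src (fromIntegral (BS.index lenByte 0)) rest
```

Frames may straddle chunk boundaries; the leftover bytes are threaded from one call to the next instead of being pushed back into the source. A streaming library is meant to take care of this bookkeeping for you.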
> What if the byte sequence manipulation task requires both, for example:
> - receive ByteString from the network (e.g: Network.Socket.ByteString.recv :: ... -> IO ByteString)
> - inspect and manipulate data (pure function)
> - resend to the network (e.g: Network.Socket.ByteString.sendMany :: ... -> [ByteString] -> IO ())
The input packet will be a `ByteString`; the output packet should be a
`Builder` that is converted at the last moment to a (possibly lazy)
`ByteString` for transmission. You shouldn't need to read back your
output, so a single representation (the builder) is sufficient for it.
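A sketch of that split, assuming a made-up protocol in which byte 0 is a message type and type 0x01 ("ping") is answered with type 0x02 echoing the payload. The input is inspected with ordinary `ByteString` functions, the reply is only ever a `Builder`, and it is turned into strict chunks at the very end (`respond` and `toWire` are hypothetical names):

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L

-- Hypothetical pure step: inspect the input ByteString, describe the
-- reply as a Builder. Unknown message types get no reply.
respond :: BS.ByteString -> Maybe B.Builder
respond packet = case BS.uncons packet of
  Just (0x01, payload) -> Just (B.word8 0x02 <> B.byteString payload)
  _                    -> Nothing

-- Run the builder only at the last moment; a list of strict chunks is
-- the shape that sendMany from Network.Socket.ByteString accepts.
toWire :: B.Builder -> [BS.ByteString]
toWire = L.toChunks . B.toLazyByteString
```

In a real server the result of `toWire` would be passed to `sendMany sock`; the socket code is left out so that the sketch runs without the network package.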
> It is somewhat inconvenient to use 2 different types for the task,
> namely the ByteString and the Builder... where both represent a
> sequence of bytes.
A builder is not a sequence of bytes as such; it is a CPS-style
generator for a slice of a future sequence of bytes, which can
incrementally build the entire sequence without reallocation
or copying (at least when the output is a lazy bytestring).
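To see the "generator" behaviour, one can run a large builder and look at the chunks of the resulting lazy ByteString. This is a sketch; the 100000-byte size is arbitrary, and the individual chunk sizes depend on the buffer sizes `toLazyByteString` picks internally, so only the total is predictable (`big` and `chunkSizes` are hypothetical names):

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L

-- A builder for 100000 bytes of output, large enough to span
-- several output buffers.
big :: B.Builder
big = mconcat (replicate 100000 (B.char7 '7'))

-- Running the builder fills one buffer after another; each filled
-- buffer becomes one strict chunk of the lazy ByteString, and chunks
-- already emitted are not touched again.
chunkSizes :: B.Builder -> [Int]
chunkSizes = map BS.length . L.toChunks . B.toLazyByteString
```

Because the lazy ByteString is produced one chunk at a time, a consumer that only needs a prefix (e.g. `L.take 10`) should force only the first buffer.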
> I have tried to define a Bytes type where both representations are available:
>
> import qualified Data.ByteString as BS
> import qualified Data.ByteString.Lazy as Bsl
> import qualified Data.ByteString.Builder as Bld
>
> data Bytes = Bytes
> { toByteString :: ByteString
> , toBuilder :: Builder
> , length :: Int
> }
This is not a productive direction to explore. Instead, your *output*
should be a Builder, constructed either lazily in one go (with the tail
parts already lazily appended) or by concatenating (Builder->Builder)
endomorphisms. The inputs consumed by individual builder chunks can be
bytestring slices mixed with various other data (e.g. builders for
binary length fields that convert ints to big-endian wire form, ...).
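A sketch of such an output builder, assuming a hypothetical wire format (4-byte big-endian sequence number, 2-byte big-endian payload length, then the payload) and a hypothetical rule that re-frames the input in 3-byte slices. The tail `go (n + 1) rest` is appended lazily, in the right-associated style described earlier; `frame` and `reframe` are illustrative names only:

```haskell
import Data.Word (Word32)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L

-- Hypothetical frame: sequence number, payload length, payload bytes.
frame :: Word32 -> BS.ByteString -> B.Builder
frame seqNo payload =
  B.word32BE seqNo
    <> B.word16BE (fromIntegral (BS.length payload))
    <> B.byteString payload

-- Hypothetical re-framing of an input packet in 3-byte slices.
-- BS.splitAt only slices the input, it does not copy it.
reframe :: BS.ByteString -> B.Builder
reframe = go 0
  where
    go n bs
      | BS.null bs = mempty
      | otherwise  =
          let (slice, rest) = BS.splitAt 3 bs
          in frame n slice <> go (n + 1) rest
```

For the input bytes `[10,11,12,13]` this produces two frames, with sequence numbers 0 and 1 and payload lengths 3 and 1.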
--
Viktor.