lazy ByteStrings: toChunks

Tue Jan 23 16:02:31 EST 2007

On Wed, 2007-01-24 at 07:32 +1100, Donald Bruce Stewart wrote:
> > toChunks exposes the implementation, and so shouldn't be in the public
> > interface, should it?  There could be a function from lazy to ordinary
> > ByteStrings (B.concat . toChunks), though.

No, I don't think it exposes the implementation that much. In particular
we could change the internal representation from a list of chunks to a
tree of chunks or an element-strict list of chunks without breaking the
toChunks function.

In the worst case of representation change, toChunks could still return
a single massive chunk.

> That seems reasonable. All uses I've ever had for toChunks involve also
> concat'ing them.

This is indeed the most common use. However, libraries for zlib/bzlib
compression, charset conversion, encryption, (de)serialisation etc. that
need to work on contiguous chunks of memory need to be able to get at
the chunks. The only other thing they can do is import the internal
module and get at the LPS constructor, which is more evil and will break
if we change the underlying representation (and I do intend to
experiment with making the lazy byte string rep use element-strict lists
to remove one indirection).
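
To make that concrete, here is a rough sketch of the kind of chunk-wise
consumer I mean. The names sumChunk and byteSum are made up for
illustration, and the byte sum is only a stand-in for real work such as
handing each chunk's buffer to a C compression routine. It uses nothing
but the public toChunks:

```haskell
import qualified Data.ByteString      as B
import qualified Data.ByteString.Lazy as L
import Data.List (foldl')
import Data.Word (Word32)

-- Hypothetical per-chunk step: fold one contiguous strict chunk into a
-- running sum. A real codec would pass the chunk's buffer to C code here.
sumChunk :: Word32 -> B.ByteString -> Word32
sumChunk = B.foldl' (\acc w -> acc + fromIntegral w)

-- Walk the lazy ByteString one chunk at a time via the public toChunks,
-- without importing the internal module and without concatenating.
byteSum :: L.ByteString -> Word32
byteSum = foldl' sumChunk 0 . L.toChunks
```

Nothing here depends on how the chunks are stored internally, only on
being handed them one at a time.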

> The idea originally was to avoid unnecessary strictness.

The other reason that we decided to include toChunks and decided not to
include a function that converts to a strict byte string is that I
didn't want to hide the expense of the operation from the user. toChunks
is O(1) and should remain O(1) with any reasonable representation change
that I can think of. toStrict, however, is O(n) and has to force the whole
stream into memory and copy it. It's expensive. If the user writes
(B.concat . toChunks) then this expense is explicit since they already
know that B.concat incurs that expense.
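
Spelled out, what the user ends up writing looks roughly like this. The
name flatten is only for illustration and is not something I am
proposing to export:

```haskell
import qualified Data.ByteString      as B
import qualified Data.ByteString.Lazy as L

-- Hypothetical helper: convert a lazy ByteString to a strict one.
-- The B.concat is visible at the call site, so the cost of forcing the
-- whole stream and copying it into one buffer is visible too.
flatten :: L.ByteString -> B.ByteString
flatten = B.concat . L.toChunks
```

Anyone who only needs to look at the data a chunk at a time can stop at
L.toChunks and never pay for the copy.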

So I vote for the status quo.

Duncan