ByteString I/O Performance

Wed Sep 5 21:06:30 EDT 2007

On 06 Sep 2007 02:30:28 +0200, Peter Simons <simons at cryp.to> wrote:
> Duncan Coutts writes:
>
>  > What you want is just fine, but it's a mutable interface not a
>  > pure one. We cannot provide any operations that mutate an
>  > existing ByteString without breaking the semantics of all the
>  > pure operations.
>
> Is that so? How exactly does mutating a ByteString break the
> semantics of the pure function 'take'?
>

Because 'take' does not copy: the bytestring it returns shares its
memory with the original. If you mutate the original bytestring, the
value of the other bytestring (returned from 'take') will change. Not
pure. Bad. Evil. Etc.
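
To make that concrete, here is a rough sketch that deliberately breaks
the rules. It assumes a bytestring version where 'take' returns a slice
sharing the original's buffer, and it writes through the raw pointer
with unsafeUseAsCString. The name aliasingDemo is made up for
illustration; nobody should do this in real code.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Unsafe (unsafeUseAsCString)
import Data.Word (Word8)
import Foreign.Storable (pokeByteOff)

-- Hypothetical demo: returns a copy of the prefix made BEFORE the
-- mutation, and the very same prefix value as seen AFTER it.
aliasingDemo :: B.ByteString -> IO (B.ByteString, B.ByteString)
aliasingDemo input
  | B.null input = return (input, input)  -- nothing to overwrite
  | otherwise = do
      -- B.copy gives us a private heap buffer, so we do not scribble
      -- over the caller's data (or over a read-only string literal).
      original <- evaluate (B.copy input)
      -- 'take' only adjusts the length; it points at the same memory.
      prefix <- evaluate (B.take 5 original)
      -- Independent copy of the prefix as it looks right now.
      snapshot <- evaluate (B.copy prefix)
      -- Overwrite the first byte of 'original' with 'J' (ASCII 74).
      unsafeUseAsCString original $ \p -> pokeByteOff p 0 (74 :: Word8)
      return (snapshot, prefix)
```

Called with "hello world", snapshot is "hello" but prefix now reads
"Jello", even though no code ever "touched" prefix.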

>  > It's very much like the difference between the MArray and
>  > IArray classes, for mutable and immutable arrays. One provides
>  > index in a monad, the other is pure.
>
> Right. Now I wonder: why does ByteString provide an immutable
> interface but not a mutable one? Apparently mutable interfaces
> are useful for some purposes, right? Why else would the Array
> package provide one?

The Array package doesn't provide two different interfaces to the same
data structure; it provides two different data structures. You can't
have a pure interface AND an impure one to the same data, as the
impure one could then mutate values that are used with the pure
interface, which would mean that the pure interface is broken (see
above).
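
For illustration, a minimal sketch with the array package, assuming
unboxed Int arrays. The names setFirst, original and updated are
hypothetical. 'thaw' copies the immutable UArray into a separate
mutable STUArray, so the write never reaches the original value:

```haskell
import Data.Array.ST (runSTUArray, thaw, writeArray)
import Data.Array.Unboxed (UArray, bounds, elems, listArray)

-- Overwrite the first element by going through a mutable COPY.
-- Assumes a non-empty array.
setFirst :: Int -> UArray Int Int -> UArray Int Int
setFirst x arr = runSTUArray $ do
  marr <- thaw arr                        -- copy into an STUArray
  writeArray marr (fst (bounds arr)) x    -- mutate only the copy
  return marr                             -- frozen by runSTUArray

original :: UArray Int Int
original = listArray (0, 2) [1, 2, 3]

updated :: UArray Int Int
updated = setFirst 9 original

-- Both values can be inspected side by side with 'elems'.
both :: ([Int], [Int])
both = (elems original, elems updated)
```

Here 'both' evaluates to ([1,2,3],[9,2,3]): the pure value is
untouched because the mutation happened on a different data structure.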

>  > Bear in mind, that these cache benefits are fairly small in
>  > real benchmarks as opposed to 'cat' on fully cached files.
>
> Do I understand that right? It sounds as if you were saying that
> -- in the general case -- allocating a new buffer for every
> single read() is not significantly slower than re-using the same
> buffer every time. Is that what you meant to say?

I think he said that most of the speed difference is due to better
cache performance when reusing the same buffer. In a real program,
though, you do "other stuff" as well, which won't be as benign for the
cache, so the difference will be smaller (if it is noticeable at all).
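
For reference, the two strategies being compared look roughly like
this. It is only a sketch of a 'cat'-style copy loop, not a benchmark,
and it makes no claim about which one is faster. The names chunkSize,
copyReusing, copyAllocating and copyFileWith are made up for the
example, and the 32k chunk size is an arbitrary assumption.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO (Handle, IOMode (..), hGetBuf, hPutBuf, withBinaryFile)

chunkSize :: Int
chunkSize = 32 * 1024  -- arbitrary choice

-- Strategy 1: one raw buffer, overwritten by every read.
copyReusing :: Handle -> Handle -> IO ()
copyReusing src dst = allocaBytes chunkSize loop
  where
    loop :: Ptr Word8 -> IO ()
    loop buf = do
      n <- hGetBuf src buf chunkSize      -- 0 means end of file
      if n <= 0
        then return ()
        else hPutBuf dst buf n >> loop buf

-- Strategy 2: every read allocates a fresh, immutable ByteString.
copyAllocating :: Handle -> Handle -> IO ()
copyAllocating src dst = do
  chunk <- B.hGet src chunkSize           -- empty at end of file
  if B.null chunk
    then return ()
    else B.hPut dst chunk >> copyAllocating src dst

-- Run either strategy between two files opened in binary mode.
copyFileWith :: (Handle -> Handle -> IO ()) -> FilePath -> FilePath -> IO ()
copyFileWith copy from to =
  withBinaryFile from ReadMode $ \src ->
    withBinaryFile to WriteMode $ \dst ->
      copy src dst
```

Note that the first version never hands its buffer to pure code, which
is exactly why it is free to overwrite it.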

>  > ByteString certainly isn't the right abstraction for that
>  > though.
>
> I am sorry, but that is nonsense. A ByteString is a tuple
> consisting of a pointer into raw memory, an integer signifying
> the size of the front gap, and an integer signifying the length
> of the payload. That data structure is near perfect for
> performing efficient I/O. To say that this abstraction isn't
> right for the task is absurd. What you mean to say is that you
> don't _intend_ it to be used that way, which is an altogether
> different thing.

A ByteString is an immutable data structure representing a string; if
you need a mutable one, then it's not the right abstraction *by
definition*. Yes, a ByteString is not intended to be a mutable buffer,
which is precisely what makes it not the right abstraction if you need
that (not an "altogether different thing", it is THE thing). The fact
that its internal representation looks similar to that of a different
abstraction which does allow mutation doesn't mean that *this*
abstraction is the right choice.
This is analogous to Java and C#: if you need a mutable string
buffer, the "string" class is not the right abstraction; you use the
string builder classes instead.
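
A rough sketch of what such a separate abstraction could look like in
Haskell. The Buffer type and the functions newBuffer, writeByte and
freeze are hypothetical names, not part of the bytestring API. The
important design choice is that freeze copies (via packCStringLen), so
a ByteString that has escaped into pure code can never be changed by
later writes to the buffer:

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (castPtr)
import Foreign.Storable (pokeByteOff)

-- Hypothetical mutable buffer: raw memory plus its size in bytes.
data Buffer = Buffer (ForeignPtr Word8) Int

-- Allocate a zero-filled buffer of n bytes.
newBuffer :: Int -> IO Buffer
newBuffer n = do
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp $ \p -> fillBytes p 0 n
  return (Buffer fp n)

-- Bounds-checked, in-place write. Lives in IO, like an MArray.
writeByte :: Buffer -> Int -> Word8 -> IO ()
writeByte (Buffer fp n) i w
  | i < 0 || i >= n = ioError (userError "writeByte: index out of range")
  | otherwise = withForeignPtr fp $ \p -> pokeByteOff p i w

-- Copy the current contents out into an immutable ByteString.
freeze :: Buffer -> IO B.ByteString
freeze (Buffer fp n) =
  withForeignPtr fp $ \p -> B.packCStringLen (castPtr p, n)
```

The copy in freeze is the price of keeping the two worlds apart, just
as StringBuilder.toString() hands back a fresh String.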

-- 
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862