[Haskell-cafe] The question of ByteString

Fri Nov 2 18:09:33 EDT 2007

Andrew Coppin wrote:

> 1. Why do I have to type "ByteString" in my code? Why isn't the compiler 
> automatically performing this optimisation for me?

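One reason is that ByteString is stricter than String.  Even a lazy 
ByteString is evaluated a whole chunk (64KB) at a time, so swapping it 
in silently would change the meaning of code that relies on per-element 
laziness.  You can see how this might lead to problems with a String 
like this: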

"foo" ++ undefined

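The first three elements of this list are well-defined, but if you touch 
the fourth, you die.  As a String, you can safely consume those three; 
packed into a ByteString, the whole chunk gets forced at once.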
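
To make that concrete, here is a minimal sketch, assuming the bytestring 
package and the strict Data.ByteString.Char8 interface.  The names 
partial and packOutcome are made up for illustration.  It only shows 
the strict case; I would expect a lazy ByteString to behave the same 
way for a string this short, since it would all land in the first chunk.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.ByteString.Char8 as B

-- A list whose first three cells are defined and whose tail is bottom.
partial :: String
partial = "foo" ++ undefined

-- Hypothetical helper: pack a String, take three bytes, and report
-- whether forcing the result raised an exception.
packOutcome :: String -> IO String
packOutcome s = do
  r <- try (evaluate (B.take 3 (B.pack s)))
         :: IO (Either SomeException B.ByteString)
  pure (either (const "pack hit undefined") B.unpack r)

main :: IO ()
main = do
  -- Lists are lazy in their spine: only three cells are ever demanded.
  putStrLn (take 3 partial)
  -- A strict ByteString is one flat buffer, so pack must walk the
  -- whole list before take ever runs, and it runs into the undefined.
  putStrLn =<< packOutcome partial
```

This should print "foo" followed by "pack hit undefined".  A compiler 
that rewrote String to ByteString behind your back would turn the first 
behaviour into the second.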

> 2. ByteString makes text strings faster. But what about other kinds of 
> collections? Can't we do something similar to them that makes them go 
> faster?

Not as easily.  The big wins with ByteString are, as you observe, that 
the data are tiny, uniformly sized, and easily unboxed (though using 
ForeignPtr seems to be a significant win compared to UArray, too).  This 
also applies to other basic types like Int and Double, but leave those 
behind, and you get problems.

If your type is an instance of Storable, it's going to have a uniform 
size, but it might be expensive to flatten and unflatten it, so who 
knows whether or not it's truly beneficial.  If it's not an instance of 
Storable, you have to store an array of boxed values, and we know that 
arrays of boxes have crummy locality of reference.
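
As a rough illustration of what "flatten and unflatten" means here, 
this is a sketch of a Storable instance for a hypothetical Point type 
(not something from any library), using only Foreign.Storable and 
Foreign.Marshal.Array from base.  The layout of two Doubles at offsets 
0 and 8 is an assumption of the example.

```haskell
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Storable (Storable (..))

-- Hypothetical example type: two Doubles, flattened to 16 bytes.
data Point = Point Double Double
  deriving (Eq, Show)

instance Storable Point where
  sizeOf _    = 16
  alignment _ = 8
  -- Unflatten: read both fields back and rebuild the boxed value.
  peek p = Point <$> peekByteOff p 0 <*> peekByteOff p 8
  -- Flatten: write each field at its fixed byte offset.
  poke p (Point x y) = pokeByteOff p 0 x >> pokeByteOff p 8 y

-- Copy a list into a temporary contiguous buffer and read it back.
roundTrip :: [Point] -> IO [Point]
roundTrip ps = withArrayLen ps $ \n ptr -> peekArray n ptr

main :: IO ()
main = do
  ps <- roundTrip [Point 1 2, Point 3 4]
  print ps
```

Every element access goes through peek or poke, so for a type with many 
fields or nested structure that per-element cost is what you would be 
weighing against the better locality.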

Spencer Janssen hacked up the ByteString code to produce StorableVector 
as part of last year's SoC, but it never got finished off:

http://darcs.haskell.org/SoC/fps-soc/Data/StorableVector/

More recently, we've been pinning our hopes on the new list fusion stuff 
to give many of the locality of reference benefits of StorableVector 
with fewer restrictions, and all the heavy work done in a library.
