[Haskell-cafe] The question of ByteString
Bryan O'Sullivan
bos at serpentine.com
Fri Nov 2 18:09:33 EDT 2007
Andrew Coppin wrote:
> 1. Why do I have to type "ByteString" in my code? Why isn't the compiler
> automatically performing this optimisation for me?
One reason is that ByteString is stricter than String. A strict
ByteString is evaluated all at once, and even a lazy ByteString is
evaluated in 64KB chunks rather than one element at a time. You can see
how this might lead to problems with a String like this:
"foo" ++ undefined
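The first three elements of this list are well-defined, but if you touch
the fourth, you die. Pack it into a ByteString and the whole chunk gets
forced at once, so even reading the first character dies.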
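Here is a minimal sketch of that difference, assuming the bytestring
package that ships with GHC. The names partial, firstThree and
packSucceeds are made up for illustration:

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Exception (SomeException, evaluate, try)

-- A list whose first three cells are defined and whose tail is bottom.
partial :: String
partial = "foo" ++ undefined

-- Lazy String: only the first three cons cells are ever forced,
-- so the undefined tail is never touched.
firstThree :: String
firstThree = take 3 partial

-- Strict ByteString: packing has to walk the whole list,
-- so it runs into the undefined tail and throws.
packSucceeds :: IO Bool
packSucceeds = do
  r <- try (evaluate (B.pack partial))
         :: IO (Either SomeException B.ByteString)
  return (either (const False) (const True) r)
```

In GHCi, firstThree should come back as "foo", while packSucceeds
should return False because the pack throws before producing anything.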
> 2. ByteString makes text strings faster. But what about other kinds of
> collections? Can't we do something similar to them that makes them go
> faster?
Not as easily. The big wins with ByteString are, as you observe, that
the data are tiny, uniformly sized, and easily unboxed (though using
ForeignPtr seems to be a significant win compared to UArray, too). This
also applies to other basic types like Int and Double, but leave those
behind, and you get problems.
If your type is an instance of Storable, it's going to have a uniform
size, but it might be expensive to flatten and unflatten it, so who
knows whether or not it's truly beneficial. If it's not an instance of
Storable, you have to store an array of boxed values, and we know that
arrays of boxes have crummy locality of reference.
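To make the flatten/unflatten cost concrete, here is a rough sketch of a
Storable instance for a small record. The Point type and roundTrip
helper are hypothetical, and the layout (two adjacent Doubles) is just
one plausible choice:

```haskell
import Foreign.Storable
import Foreign.Marshal.Array (withArray, peekArray)

-- A hypothetical two-field record, used only for illustration.
data Point = Point Double Double deriving (Eq, Show)

-- Size of one field, used for both the total size and the field offset.
dsz :: Int
dsz = sizeOf (undefined :: Double)

instance Storable Point where
  sizeOf _    = 2 * dsz
  alignment _ = alignment (undefined :: Double)
  -- Unflatten: read both fields and rebuild the boxed constructor.
  peek p = do
    x <- peekByteOff p 0
    y <- peekByteOff p dsz
    return (Point x y)
  -- Flatten: take the constructor apart and write each field.
  poke p (Point x y) = do
    pokeByteOff p 0 x
    pokeByteOff p dsz y

-- Write a list into a temporary flat buffer, then read it back.
roundTrip :: [Point] -> IO [Point]
roundTrip pts = withArray pts (peekArray (length pts))
```

Every element access goes through peek or poke, so the uniform size
comes at the price of rebuilding a Point on each read. Whether that
beats an array of boxes depends on the type and the access pattern.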
Spencer Janssen hacked up the ByteString code to produce StorableVector
as part of last year's SoC, but it never got finished off:
http://darcs.haskell.org/SoC/fps-soc/Data/StorableVector/
More recently, we've been pinning our hopes on the new list fusion stuff
to give many of the locality of reference benefits of StorableVector
with fewer restrictions, and all the heavy work done in a library.
<b