[Haskell-cafe] Fun with ByteStrings [was: A very edgy language]
Tillmann Rendel
rendel at rbg.informatik.tu-darmstadt.de
Sun Jul 8 07:41:58 EDT 2007
Andrew Coppin wrote:
> Now, as I understand it, a ByteString is a kind of unboxed array (= big
> RAM savings + big CPU time savings for not building it + big GC savings
> for not processing millions of list nodes + better cache performance).
> Or at least, a *strict* ByteString is; I'm very very fuzzy on exactly
> how a *lazy* ByteString is any different to a normal list. From my
> reading today, I take it the only real difference is that one is a
> linked list, whereas the other is a (boxed?) array, so it's smaller. (?)
As I understand it (wich may or may not be correct):
A normal Haskell string is basically [Word8]
A strict ByteString ist basically UArray Int Word8
A lazy ByteString is basically [UArray Int Word8]
[Word8] is processed one byte at once
UArray Int Word8 is processed all at once
[UArray Int Word8] is processed a chunk of bytes at once
So loading and processing [Word8] corresponds to
while (!eof(file)) {
Byte current = readByteFromFile(file);
processByte(current);
}
loading and processing a strict ByteString corresponds to
int size = readWholeFileIntoBuffer(file, buffer);
for (int i = 0; i < size; i++)
processByte(buffer[i]);
and loading and processing a lazy ByteString corresponds to
while (!eof(file)) {
int size = readNextNBytesIntoBuffer(file, buffer, buffersize);
for (int i = 0; i < size; i++)
processByte(buffer[i]);
}
in a C-like language. The first is nonsense because of I/O overhead, the
second is fine for small files only and the third combines the best of
both worlds. Unlike the C examples, Haskell lists face not only I/O
overhead, but general overhead of boxed representation and laziness,
too. But even in C, the third solution ist the best, but some programs
don't use it. So Bytestring powered Haskell programs are able to
outperform some C programs.
The most important contributions of the ByteStrings library seem to be
these:
(1) unify lazy and strict Bytestrings interface-wise and
(2) allow idiomatic Haskell programs using lazy ByteStrings
to be compiled to something like the third c program above.
Tillmann
More information about the Haskell-Cafe
mailing list