[Haskell-cafe] Fun with ByteStrings [was: A very edgy language]

Tillmann Rendel rendel at rbg.informatik.tu-darmstadt.de
Sun Jul 8 07:41:58 EDT 2007


Andrew Coppin wrote:

> Now, as I understand it, a ByteString is a kind of unboxed array (= big 
> RAM savings + big CPU time savings for not building it + big GC savings 
> for not processing millions of list nodes + better cache performance). 
> Or at least, a *strict* ByteString is; I'm very very fuzzy on exactly 
> how a *lazy* ByteString is any different to a normal list. From my 
> reading today, I take it the only real difference is that one is a 
> linked list, whereas the other is a (boxed?) array, so it's smaller. (?)

As I understand it (wich may or may not be correct):

A normal Haskell string is basically  [Word8]
A strict ByteString ist basically     UArray Int Word8
A lazy ByteString is basically        [UArray Int Word8]

[Word8] is processed one byte at once
UArray Int Word8 is processed all at once
[UArray Int Word8] is processed a chunk of bytes at once

So loading and processing [Word8] corresponds to

   while (!eof(file)) {
     Byte current = readByteFromFile(file);
     processByte(current);
   }

loading and processing a strict ByteString corresponds to

   int size = readWholeFileIntoBuffer(file, buffer);
   for (int i = 0; i < size; i++)
     processByte(buffer[i]);


and loading and processing a lazy ByteString corresponds to

   while (!eof(file)) {
     int size = readNextNBytesIntoBuffer(file, buffer, buffersize);
     for (int i = 0; i < size; i++)
       processByte(buffer[i]);
   }

in a C-like language. The first is nonsense because of I/O overhead, the 
second is fine for small files only and the third combines the best of 
both worlds. Unlike the C examples, Haskell lists face not only I/O 
overhead, but general overhead of boxed representation and laziness, 
too. But even in C, the third solution ist the best, but some programs 
don't use it. So Bytestring powered Haskell programs are able to 
outperform some C programs.

The most important contributions of the ByteStrings library seem to be 
these:

   (1) unify lazy and strict Bytestrings interface-wise and
   (2) allow idiomatic Haskell programs using lazy ByteStrings
       to be compiled to something like the third c program above.

   Tillmann


More information about the Haskell-Cafe mailing list