[Haskell-cafe] Efficient way to break up a lazy bytestring

Cale Gibbard cgibbard at gmail.com
Sat Dec 30 02:52:29 EST 2006


On 30/12/06, Ranjan Bagchi <ranjan.bagchi at frotz.com> wrote:
> I guess the consumer's really important (Didn't even occur to me, I
> was concentrating on how I was generating the list).
> I was trying to de-lazy the list, I did the following:
>
> bs = [...]
> recs' = (take 1000000) breakUp bs
> recs = foldr seq recs' recs'
> print $ length recs
>
> Would the fold be blowing the stack?
>
> Ranjan

Is there a really good reason to de-lazy the list? That's usually a
bad idea if the list is long. If your application can do any kind of
stream processing at all, then it's a better idea to allow it to work
with the list lazily. If it can't, and you're basically producing some
kind of summary of the input data, you're probably better off using a
strict left fold and consuming the list in a strict fashion, but not
forcing it earlier than actually needed. By strictifying the list,
you're making it impossible to work with the first cons without
computing the last one. That means that the following consumption of
the list by length can't even start until it's finished building the
whole thing in memory (you're wasting both time and space). Holding a
large list like that in memory all at once has few benefits. If you're
currently doing lazy IO and are concerned about the file changing as
you're processing it, you're better off copying the whole file into
memory as one big block and still building the list lazily from that
copy in memory.

If you're coming from any sort of imperative background, you can
basically think of a list as a loop which has not yet happened. If you
can generate the indices for the loop on the fly, you're probably not
going to want to waste time putting them all in memory beforehand,
regardless of what that loop is doing (how the list is being
consumed).

There are a few cases where strictifying production of data is good,
but it's far more common to want strict consumption and lazy
production.

 - Cale


More information about the Haskell-Cafe mailing list