[Haskell-cafe] Efficient way to break up a lazy bytestring

Sat Dec 30 01:36:31 EST 2006

On Dec 29, 2006, at 6:32 PM, Matthew Brecknell wrote:

>> breakUp s
>> 	| L.null s = []
>> 	| otherwise = h:(breakUp r) where
>> 		(h,r) = L.splitAt 72 s
>>
>> Running this on the 2G file blows up the stack pretty quickly, taking
>> the first 1 million records (there are 20M of them) with a big stack
>> parameter gives about 25% productivity, with GC taking the other 75%.
>>
>> My understanding is that while this looks tail recursive, it isn't
>> really because of laziness.  I've tried throwing seq operators
>> around, but they don't seem to do much to help the efficiency.
>
> This function by itself doesn't really have any particular behaviour
> with respect to stack and heap usage, since it is just a linear  
> mapping
> from one lazy sequence to another. To understand the stack blowout,
> you'll need to look at what you are doing with the result of this
> function.
>
> For example, a foldl over the result might be building a big thunk on
> the heap, which could blow the stack when evaluated*. If this is the
> case, you might avoid building the big thunk by using foldl', a  
> version
> of foldl which evaluates as it folds.

I guess the consumer's really important (Didn't even occur to me, I  
was concentrating on how I was generating the list).
I was trying to de-lazy the list, I did the following:

bs = [...]
recs' = (take 1000000) breakUp bs
recs = foldr seq recs' recs'
print $ length recs

Would the fold be blowing the stack?

Ranjan