[Haskell-cafe] Efficient way to break up a lazy bytestring
Ranjan Bagchi
ranjan.bagchi at frotz.com
Sat Dec 30 01:36:31 EST 2006
On Dec 29, 2006, at 6:32 PM, Matthew Brecknell wrote:
>> breakUp s
>> | L.null s = []
>> | otherwise = h:(breakUp r) where
>> (h,r) = L.splitAt 72 s
>>
>> Running this on the 2G file blows up the stack pretty quickly, taking
>> the first 1 million records (there are 20M of them) with a big stack
>> parameter gives about 25% productivity, with GC taking the other 75%.
>>
>> My understanding is that while this looks tail recursive, it isn't
>> really because of laziness. I've tried throwing seq operators
>> around, but they don't seem to do much to help the efficiency.
>
> This function by itself doesn't really have any particular behaviour
> with respect to stack and heap usage, since it is just a linear
> mapping
> from one lazy sequence to another. To understand the stack blowout,
> you'll need to look at what you are doing with the result of this
> function.
>
> For example, a foldl over the result might be building a big thunk on
> the heap, which could blow the stack when evaluated*. If this is the
> case, you might avoid building the big thunk by using foldl', a
> version
> of foldl which evaluates as it folds.
I guess the consumer's really important (Didn't even occur to me, I
was concentrating on how I was generating the list).
I was trying to de-lazy the list, I did the following:
bs = [...]
recs' = (take 1000000) breakUp bs
recs = foldr seq recs' recs'
print $ length recs
Would the fold be blowing the stack?
Ranjan
More information about the Haskell-Cafe
mailing list