[Haskell-cafe] Efficient way to break up a lazy bytestring

Fri Dec 29 20:21:09 EST 2006

On Fri, Dec 29, 2006 at 04:56:34PM -0800, Ranjan Bagchi wrote:
> I'm loading a big file (~2G), again, with 72-byte C structs.  I've  
> done pretty well [thanks to everyone] interpreting the fields, but  
> I'm finding the traversal of the file to be pretty inefficient.
> 
> breakUp s
> 	| L.null s = []
> 	| otherwise = h:(breakUp r) where
> 		(h,r) = L.splitAt 72 s
> 
> My understanding is that while this looks tail recursive, it isn't  
> really because of laziness.  I've tried throwing seq operators  
> around, but they don't seem to do much to help the efficiency.

That looks like the correct lazy way to do it; by itself that breakUp
should be virtually instantaneous and consume neglible stack because
of laziness.  What kind of list consumer are you using?  (ie don't use
foldl, use foldl' or foldr, foldl is very prone to high stack use.)

Also, your breakUp is an unfold:

\begin{code}
import Data.List(unfoldr)

breakUp = unfoldr takeChunk where
    takeChunk s | L.null s  = Nothing
                | otherwise = Just $ L.splitAt 72 s
\end{code}