[Haskell-cafe] Odd lack of laziness

Jefferson Heard jeff at renci.org
Thu Jun 21 15:13:29 EDT 2007


Alright, I've been hacking away at what I posted the other day, and I
have something that works for files that fit entirely into memory.
And then I figured out why I've been restricted to files that fit
entirely into memory...  One of my functions is forcing the whole file
to be read in when, as far as I can tell from reading the program,
only a very small portion of it should be.  Here are the functions
I've used to narrow the problem down...

import Data.Bits
import qualified Data.ByteString.Lazy as BS
import Foreign.C.Types
...

{-# INLINE decodeLengthBits #-}
decodeLengthBits :: BS.ByteString -> CInt
decodeLengthBits doc = (shift (pieces !! 3) 24) .|.
                       (shift (pieces !! 2) 16) .|.
                       (shift (pieces !! 1) 8) .|.
                       (pieces !! 0)
   where pieces :: [CInt]
         pieces = map fromIntegral . BS.unpack . BS.take 4 $ doc
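
-- Quick sanity check of the byte order in GHCi (the header bytes here
-- are just an illustration; little-endian, 0x15 0x01 0x00 0x00 is
-- 0x0115, i.e. 277):
--
--   ghci> decodeLengthBits (BS.pack [0x15, 0x01, 0x00, 0x00])
--   277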

breakIntoDocuments :: RawDocument -> [RawDocument]
breakIntoDocuments f | BS.length f > 0 = if len > 0
                                         then (BS.take bytes f) :
                                              (breakIntoDocuments
                                                 (BS.drop bytes f))
                                         else (breakIntoDocuments
                                                 (BS.drop bytes f))
                     | otherwise      = []
                     where len = decodeLengthBits f
                           bytes = fromIntegral (len * 2 + len * 4 + 4)
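
-- (bytes covers the whole record, including the 4-byte length header
--  that decodeLengthBits reads, so BS.drop bytes should land right on
--  the start of the next document.)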


and a main function of:

main = do
 f <- BS.readFile "Documents.bin"
 print (take 1 (breakIntoDocuments f))


Shouldn't the program only read in enough of the lazy bytestring to
produce the first element of the list that breakIntoDocuments returns?
decodeLengthBits returns only 277 for the first document, so that
first record should only need 277 * 2 + 277 * 4 + 4 = 1666 bytes.  I
watched it, though, and it's reading in my whole 2 GB file...
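
In case it's useful for diagnosing this, here's the variant I plan to
try next, on the theory that BS.length has to walk every chunk of a
lazy ByteString (and so pulls in the whole file just to answer the
"> 0" test), whereas BS.null only needs the first chunk.  This is an
untested sketch, using the same types and helpers as above:

breakIntoDocuments' :: RawDocument -> [RawDocument]
breakIntoDocuments' f
    | BS.null f = []
    | len > 0   = BS.take bytes f : breakIntoDocuments' (BS.drop bytes f)
    | otherwise = breakIntoDocuments' (BS.drop bytes f)
    where len   = decodeLengthBits f
          bytes = fromIntegral (len * 2 + len * 4 + 4)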

-- Jeff



