[Haskell-cafe] Odd lack of laziness

Donald Bruce Stewart dons at cse.unsw.edu.au
Thu Jun 21 20:48:15 EDT 2007


jeff:
> Alright, I've been hacking away at what I posted the other day, and I
> have something that works for files that will fit entirely into memory.
> And then I figured out why I've been restricted to files that fit
> entirely into memory...  One of my functions is causing the entire thing
> to be read in, when, in the way I analyze the program, only a very small
> portion of the file should be read in.  Here are the functions I've used
> to test this problem...
> 
> import Data.Bits
> import qualified Data.ByteString.Lazy as BS
> import Foreign.C.Types
> ...
> 
> {-# INLINE decodeLengthBits #-}
> decodeLengthBits :: BS.ByteString -> CInt
> decodeLengthBits doc = (shift (pieces !! 3) 24) .|.
>                        (shift (pieces !! 2) 16) .|.
>                        (shift (pieces !! 1) 8) .|.
>                        (pieces !! 0)
>    where pieces::[CInt] = map fromIntegral . BS.unpack . BS.take 4 $ doc
> 
> breakIntoDocuments :: RawDocument -> [RawDocument]
> breakIntoDocuments f | BS.length f > 0 = if len > 0

Argh!

    -- | /O(n\/c)/ 'length' returns the length of a ByteString as an 'Int64'
    length :: ByteString -> Int64
    length (LPS ss) = L.foldl' (\n ps -> n + fromIntegral (P.length ps)) 0 ss

that'll force the whole file (it is the sum of the length of each
chunk). Try comparing against the null bytestring,

    -- | /O(1)/ Test whether a ByteString is empty.
    null :: ByteString -> Bool
    null (LPS []) = True
    null (_)      = False

:-)

>                                          then (BS.take bytes f) :
>                                               (breakIntoDocuments
>                                                  (BS.drop bytes f))
>                                          else (breakIntoDocuments
>                                                  (BS.drop bytes f))
>                      | otherwise      = []
>                      where len = decodeLengthBits f
>                            bytes = fromIntegral (len * 2 + len * 4 + 4)
> 
> 
> and a main function of:
> 
> main = do
>  f <- B.readFile "Documents.bin"
>  print (take 1 (breakIntoDocuments f))
> 
> 
> Shouldn't the program only read in enough of the lazy byte-string to
> create the first return value of breakIntoDocuments?  The return value
> of decodeLengthBits is only 277.  I watched it, and it's reading in my
> whole 2gb file...
> 
> -- Jeff

Got to be more lazy :-)

-- Don


More information about the Haskell-Cafe mailing list