[Haskell-beginners] Moment of enlightenment for lazy evaluation

Sat Feb 20 20:57:11 EST 2010

On Sunday 21 February 2010 02:23:48, Tom Tobin wrote:
> On Sat, Feb 20, 2010 at 4:42 PM, Stephen Blackheath [to
> Haskell-Beginners] <mutilating.cauliflowers.stephen at blacksapphire.com>
>
> wrote:
> > Tom,
> >
> > The bad news is that 1. Haskell makes no guarantee about when the
> > files are closed,
>
> Hmm, Data.ByteString.Lazy.readFile's docstring says:
>
> "Read an entire file lazily into a ByteString. The Handle will be held
> open until EOF is encountered."

There is no hard guarantee about when the file will be closed, but looking
at the relevant code,

hGetContentsN :: Int -> Handle -> IO ByteString
hGetContentsN k h = lazyRead -- TODO close on exceptions
  where
    lazyRead = unsafeInterleaveIO loop

    loop = do
        c <- S.hGetNonBlocking h k
        --TODO: I think this should distinguish EOF from no data available
        -- the underlying POSIX call makes this distinction, returning either
        -- 0 or EAGAIN
        if S.null c
          then do eof <- hIsEOF h
                  if eof then hClose h >> return Empty
                         else hWaitForInput h (-1)
                           >> loop
          else do cs <- lazyRead
                  return (Chunk c cs)

I'd say the file is closed as soon as EOF is encountered. If you don't open 
too many files before you've finished reading, it shouldn't be a problem.
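
To check that reading of the code, here is a minimal sketch that asks the
handle itself whether it is closed, once before anything has been demanded
from the lazy ByteString and once after the whole content has been forced.
The name closedBeforeAndAfter is made up for illustration, and the behaviour
depends on the bytestring version in use still closing the handle at EOF as
in the code quoted above.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString.Lazy as L
import System.IO (IOMode (ReadMode), hIsClosed, openBinaryFile)

-- Hypothetical helper: reports whether the handle is closed before and
-- after the lazy ByteString has been consumed up to EOF.
closedBeforeAndAfter :: FilePath -> IO (Bool, Bool)
closedBeforeAndAfter path = do
    h <- openBinaryFile path ReadMode
    contents <- L.hGetContents h        -- nothing is read yet
    before <- hIsClosed h
    _ <- evaluate (L.length contents)   -- walks every chunk, hits EOF
    after <- hIsClosed h
    return (before, after)
```

If the handle is only closed when EOF is reached, the first component
should be False and the second True.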

>
> It certainly seemed to change matters once I switched that $ to $!,
> though; I don't see why that would have helped me unless the handles
> were indeed being closed.
>

Right. The $! forced the file to be read until the end, so it was closed 
before too many others were opened.
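
I don't have your exact code in front of me, so the following is only a
sketch of the pattern, with L.length standing in for the hash and fileSize
and allSizes as made-up names. With a plain $ the action would return an
unevaluated thunk and the file would stay open until somebody demanded the
result; with $! the result is evaluated before the action returns, which
reads the file to EOF, so the handle is closed before mapM moves on to the
next file.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)

-- Hypothetical stand-in for the per-file computation (e.g. a hash).
-- L.length has to walk every chunk, so forcing it reads the file to EOF.
fileSize :: FilePath -> IO Int64
fileSize path = do
    contents <- L.readFile path
    return $! L.length contents   -- with ($) the file would stay open here

-- Each file is fully consumed before the next one is opened.
allSizes :: [FilePath] -> IO [Int64]
allSizes = mapM fileSize
```

Note that $! only evaluates to weak head normal form. That is enough for an
Int64, but a result that is itself a lazy structure would need deeper
forcing.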

> > 2. file handles are a limited resource
>
> Well, yes, that's why I ran into the original problem.
>
> > and 3. lazy I/O
> > doesn't handle errors in a recoverable fashion.
>
> I suppose this will be something I'll run into before too long.
>
> > Unfortunately this
> > means that lazy I/O is fundamentally unsound.
> >
> > The only safe way to do it is to read the file strictly in blocks
> > using Data.ByteString.hGet.
>
> But with the strict version of ByteString, how would I compute the
> SHA1 hash of an 8 GB file on a machine with quite a bit less memory?
> I can't imagine Haskell just has no way to handle a case that other
> languages handle easily.

Incrementally, the same way the SHA1 hash is computed from a lazy
ByteString: read a chunk of the file (a multiple of 512 bits, the SHA1
block size, is a good idea), process it, read the next chunk, and so on
until the end, then close the file.
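
A minimal sketch of that loop with strict ByteStrings, assuming nothing
about which SHA1 library you use: foldFileChunks and byteSum are made-up
names, and byteSum is only a stand-in for a real incremental hash update.
Only one chunk is in memory at a time, and withBinaryFile closes the handle
when the loop finishes or an exception is thrown.

```haskell
import qualified Data.ByteString as S
import Data.Word (Word64)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: fold an accumulator over a file, reading one
-- strict chunk of at most chunkSize bytes at a time.
foldFileChunks :: Int -> (a -> S.ByteString -> a) -> a -> FilePath -> IO a
foldFileChunks chunkSize step z path =
    withBinaryFile path ReadMode (go z)
  where
    go acc h = do
        chunk <- S.hGet h chunkSize
        if S.null chunk                 -- hGet returns empty at EOF
          then return acc
          else let acc' = step acc chunk
               in acc' `seq` go acc' h  -- force before the next read

-- Stand-in for a real hash update: adds up all byte values.
byteSum :: Word64 -> S.ByteString -> Word64
byteSum = S.foldl' (\acc w -> acc + fromIntegral w)
```

Something like foldFileChunks (64 * 1024) byteSum 0 "big.file" then runs in
constant memory; 64 KiB is a multiple of the 64-byte block size.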

The difference is that this way you have exact control over what happens
when; the unsafeInterleaveIO in the lazy ByteString code takes that control
away. However, by forcing the results at the proper places, you regain
enough control to avoid leaking file handles and several other unpleasant
surprises. Normally, at least; there may be cases where you can't.