[Haskell-beginners] Moment of enlightenment for lazy evaluation
daniel.is.fischer at web.de
Sat Feb 20 20:57:11 EST 2010
On Sunday 21 February 2010 02:23:48, Tom Tobin wrote:
> On Sat, Feb 20, 2010 at 4:42 PM, Stephen Blackheath [to
> Haskell-Beginners] <mutilating.cauliflowers.stephen at blacksapphire.com>
> > Tom,
> > The bad news is that 1. Haskell makes no guarantee about when the
> > files are closed,
> Hmm, Data.ByteString.Lazy.readFile's docstring says:
> "Read an entire file lazily into a ByteString. The Handle will be held
> open until EOF is encountered."
There is no hard guarantee about when the file will be closed, but looking
at the source:
hGetContentsN :: Int -> Handle -> IO ByteString
hGetContentsN k h = lazyRead -- TODO close on exceptions
  where
    lazyRead = unsafeInterleaveIO loop

    loop = do
        c <- S.hGetNonBlocking h k
        --TODO: I think this should distinguish EOF from no data available
        -- the underlying POSIX call makes this distinction, returning
        -- 0 or EAGAIN
        if S.null c
          then do eof <- hIsEOF h
                  if eof then hClose h >> return Empty
                         else hWaitForInput h (-1)
                           >> loop
          else do cs <- lazyRead
                  return (Chunk c cs)
I'd say the file is closed as soon as EOF is encountered. If you don't open
too many files before you've finished reading, it shouldn't be a problem.
> It certainly seemed to change matters once I switched that $ to $!,
> though; I don't see why that would have helped me unless the handles
> were indeed being closed.
Right. The $! forced the file to be read until the end, so it was closed
before too many others were opened.
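A minimal sketch of that effect, not Tom's actual code: the helper names
fileSize and totalSize are made up for illustration, and L.length stands in
for whatever computation consumes the whole file. Because L.length has to
walk every chunk, forcing its result with $! drives the lazy read to EOF,
which is the point where the handle gets closed.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)

-- Hypothetical helper: size of a file, read via lazy I/O.
-- L.length walks every chunk, so forcing its result reads the file to EOF
-- before this action returns.
fileSize :: FilePath -> IO Int64
fileSize path = do
  contents <- L.readFile path
  return $! L.length contents

-- Hypothetical helper: each file is fully consumed before the next one is
-- opened, so only one handle is open at a time.
totalSize :: [FilePath] -> IO Int64
totalSize paths = sum <$> mapM fileSize paths
```

With a plain $ in fileSize, mapM would open every file before any length
is demanded, and all the handles would stay open until the sum is forced.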
> > 2. file handles are a limited resource
> Well, yes, that's why I ran into the original problem.
> > and 3. lazy I/O
> > doesn't handle errors in a recoverable fashion.
> I suppose this will be something I'll run into before too long.
> > Unfortunately this
> > means that lazy I/O is fundamentally unsound.
> > The only safe way to do it is to read the file strictly in blocks
> > using Data.ByteString.hGet.
> But with the strict version of ByteString, how would I compute the
> SHA1 hash of an 8 GB file on a machine with quite a bit less memory?
> I can't imagine Haskell just has no way to handle a case that other
> languages handle easily.
Incrementally, just as the SHA1 hash is computed over a lazy ByteString.
Read a chunk of the file (a multiple of 512 bits is a good idea), process
it, read the next chunk, ..., until the end, then close the file.
The difference is that this way you have exact control over what happens
when; the unsafeInterleaveIO in the lazy ByteString code takes that control
away. However, by forcing the results at the proper places, you regain
enough control to avoid leaking file handles and several other unpleasant
surprises. Normally, at least; there may be cases where you can't.
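The chunk-by-chunk approach could be sketched as follows. This is only an
illustration under stated assumptions: foldFile, step, checksumFile and
chunkSize are hypothetical names, and step is a toy byte sum standing in
for a real SHA1 update function, which would come from a hashing library.
Only one chunk is in memory at a time, so the size of the file does not
matter.

```haskell
import qualified Data.ByteString as S
import Data.Word (Word32)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Chunk size in bytes (assumed value): a multiple of 64 bytes, i.e. 512 bits.
chunkSize :: Int
chunkSize = 64 * 1024

-- Hypothetical stand-in for a hash update step: adds every byte of the
-- chunk to a running 32-bit sum (wrapping on overflow).
step :: Word32 -> S.ByteString -> Word32
step = S.foldl' (\a b -> a + fromIntegral b)

-- Read the file strictly, one chunk at a time, folding each chunk into the
-- accumulator. withBinaryFile closes the handle when the loop finishes,
-- and also if an exception is thrown.
foldFile :: (a -> S.ByteString -> a) -> a -> FilePath -> IO a
foldFile f z path = withBinaryFile path ReadMode (loop z)
  where
    loop acc h = do
      chunk <- S.hGet h chunkSize
      if S.null chunk
        then return acc                      -- empty chunk means EOF
        else do
          let acc' = f acc chunk
          acc' `seq` loop acc' h             -- force before the next read

checksumFile :: FilePath -> IO Word32
checksumFile = foldFile step 0
```

Here the open, the reads and the close all happen at points you can name,
which is exactly the control that lazy I/O gives up.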