IO monad and lazy evaluation

Wed, 21 May 2003 09:50:59 +0100

Hal,

I agree with the surface points you make.  It's easy enough to fix the 
problem once you realize it's there.  (In my own "real" program, I moved 
the hClose, which meant I had to pass the handle out of the function which 
opened it.)
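
For the record, that fix has roughly the following shape.  This is only a
sketch against the showFile example from my original post (quoted below);
the helper name openContents is made up for illustration and is not a
library function.

```haskell
import System.IO

-- Hypothetical helper: open the file and return the handle together with
-- the lazily-read contents.  The caller becomes responsible for closing
-- the handle once it has finished with the contents.
openContents :: FilePath -> IO (Handle, String)
openContents path = do
    h  <- openFile path ReadMode
    fc <- hGetContents h
    return (h, fc)

-- Reworked showFile: consume the contents first, close the handle last.
showFile :: FilePath -> IO ()
showFile path = do
    (h, fc) <- openContents path
    putStr fc
    hClose h
```

The cost is that the handle now leaks out of the function that opened it,
and nothing stops a caller from closing it too early.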

The underlying thrust of my post was that I thought pure functional 
languages, like Haskell, were supposed to help one avoid such traps by 
ensuring the messy dependencies on ordering didn't arise in the first 
place.  As things stand, I don't think I could begin to articulate a reliable 
set of rules for avoiding such problems, short of something like "use only 
strict functions in monad-chains".  I'm hoping the Haskell community has 
some experience with this kind of issue to offer some more helpful advice, 
or even tools to detect unsafe combinations.  Maybe a discussion of safe 
programming patterns would be a useful interim step?

(e.g. Ketil Z. Malde's suggestion of renaming the function to 
hUnsafeGetContents may be a small step in the right direction?)

...

Thinking some more... I'm reminded of some discussions I had a few years 
ago about the timing of calls to Java finalizers, and problems this could 
cause for network I/O programs because using finalizers to close network 
sockets would lead to unexpected resource problems.  The only reliable 
solution was to always close the sockets explicitly when done.  With Java, 
coming from C/C++, it was possible to get into a mindset that automatic 
memory management also meant automatic management of all resources, 
including all those that weren't directly visible to the programmer.  Maybe 
there's a similar trap for the unwary in Haskell?

Anyway, my thoughts are leading me to the idea that the problem is a 
disconnect (lack of formal connection or interlock) between the actions of 
opening a file, reading its contents and closing it.  For example, one 
could imagine a structure:

     hSafeGetContents :: Handle -> (String -> a) -> a
     hSafeGetContents handle function =
         function $ hUnsafeGetContents handle

Now the result string can be as lazy as you like, but I think one can 
guarantee that the handle won't be closed until the function has used as 
much of the content as it may need.
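
A minimal sketch of how that interlock might look once the types are made
to fit the IO monad.  Everything here is an assumption for illustration:
the name withFileContents is hypothetical, the consumer is assumed to run
in IO, and bracket and evaluate are taken from Control.Exception.

```haskell
import Control.Exception (bracket, evaluate)
import System.IO

-- Hypothetical helper (not a library function): open the file, hand the
-- lazily-read contents to the consumer, and close the handle only after
-- the consumer's result has been evaluated to weak head normal form.
-- bracket ensures the handle is closed even if the consumer throws.
withFileContents :: FilePath -> (String -> IO a) -> IO a
withFileContents path consumer =
    bracket (openFile path ReadMode) hClose $ \h -> do
        contents <- hGetContents h
        result   <- consumer contents
        evaluate result

-- Example consumer: count the characters while the handle is still open.
countChars :: FilePath -> IO Int
countChars path = withFileContents path (return . length)
```

This only goes part of the way.  evaluate forces the result no further
than its outermost constructor, so a consumer that returns a lazy
structure (say, return . lines) could still let unread content escape
past the close.  A complete guarantee would need the result to be fully
evaluated, or some restriction on what the consumer may return.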

#g
--

At 13:27 20/05/03 -0700, Hal Daume III wrote:
>Yes.  This is because hGetContents (and hence readFile, etc.) use lazy
>IO.  Just as in this case you might want hClose to force the file to be
>read, in a case like:
>
> > do h <- openFile "really_large_file" ReadMode
> >    c <- hGetContents h >>= return . head
> >    hClose h
> >    return c
>
>you probably don't want the close to read the whole file.  I'd argue that
>that problem is not with hClose, but with hGetContents.  Really, a strict
>version should be used in most situations.  Something like:
>
> > hGetContentsStrict h = do
> >    b <- hIsEOF h
> >    if b then return [] else do
> >      c <- hGetChar h
> >      r <- hGetContentsStrict h
> >      return (c:r)
>
>of course, you could be smarter with buffering, etc.  Another way would be
>to do something using seq/deepSeq.
>
>  - Hal
>
>--
>  Hal Daume III                                   | hdaume@isi.edu
>  "Arrest this man, he talks in maths."           | www.isi.edu/~hdaume
>
>On Tue, 20 May 2003, Graham Klyne wrote:
>
> > There seems to be a difficult-to-justify interaction between
> > lazy evaluation and monadic I/O:
> >
> > [[
> > -- file: SpikeIOMonadCloseHandle.hs
> > -- Does hClose force completion of lazy I/O?
> >
> > import IO
> >
> > showFile fnam =
> >      do  { fh <- openFile fnam ReadMode
> >          ; fc <- hGetContents fh
> >          ; hClose fh
> >          ; putStr fc
> >          }
> >
> > test = showFile "SpikeIOMonadCloseHandle.hs"
> > ]]
> >
> > If I load this into Hugs and run it, the output is a single blank line.
> >
> > If I reverse the order of hClose and putStr, the source code is displayed.
> >
> > I think I can understand why this is happening, but it seems to me that
> > there's a violation of referential transparency here:  I can't see any
> > reasonable justification for the value of 'fc' to vary depending on
> > whether it's actually used before or after some other I/O operation.
> >
> > I suppose I was expecting the call of hClose to force complete evaluation
> > of any value that depends on the state prior to hClose.  I've no idea if
> > there's a reasonable way to implement that.
> >
> > My concern is that this weakens the claim for monads that they provide
> > a seamless integration between pure functional and stateful code;  cf.:
> > [[
> > We believe that, on the contrary, there are very significant differences
> > between writing programs in C and writing in Haskell with monadic state
> > transformers and IO:
> > [...]
> > - Usually, most of the program is neither stateful nor directly concerned
> >   with IO.  The monadic approach allows the graceful coexistence of a
> >   small amount of imperative code and the large purely functional part
> >   of the program
> > [...]
> > - The usual coroutining behaviour of lazy evaluation, in which the
> >   consumer of a data structure coroutines with its producer, extends to
> >   stateful computation as well.  As Hughes argues (Hughes 1989), the
> >   ability to separate what is computed from how much of it is computed
> >   is a powerful aid to writing modular programs
> > ]]
> > -- http://research.microsoft.com/Users/simonpj/Papers/state-lasc.ps.gz
> >
> > #g
> >
> >
> > -------------------
> > Graham Klyne
> > <GK@NineByNine.org>
> > PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E