[Haskell] Lazy IO breaks purity

Wed Mar 4 21:22:06 EST 2009

On Wed, 2009-03-04 at 18:12 -0800, oleg at okmij.org wrote:
> 
> We demonstrate how lazy IO breaks referential transparency.  A pure
> function of the type Int->Int->Int gives different integers depending
> on the order of evaluation of its arguments. Our Haskell98 code uses
> nothing but the standard input.  We conclude that extolling the purity
> of Haskell and advertising lazy IO is inconsistent.
> 
> Henning Thielemann wrote on Haskell-Cafe:
> > Say I have a chain of functions: read a file, ... write that to a file,
> > all of these functions are written in a lazy way, which is currently
> > considered good style
> 
> Lazy IO should not be considered good style. One of the common
> definitions of purity is that pure expressions should evaluate to the
> same results regardless of evaluation order, or that equals can be
> substituted for equals. If an expression of the type Int evaluates to
> 1, we should be able to replace every occurrence of the expression with
> 1 without changing the results and other observables. 
> 
> The expression (read s) where s is the result of the (hGetContents h1)
> action has the pure type, e.g., Int. Yet its evaluation does more than
> just producing an integer: it may also close a file associated with
> the handle h1. Closing the file has an observable effect: the file
> descriptor, which is a scarce resource, is freed. Some OS lock the
> open file, or prevent operations such as renaming and deletion on the
> open file. A file system whose file is open cannot be
> unmounted. Suppose I use an Haskell application such as an editor to
> process data from a removable drive. I mount the drive, tell the
> application the file name. The (still running) application finished
> with the file and displayed the result. And yet I can't unplug the
> removable drive, because it turns out that the final result was
> produced without needing to read all the data from the file, and the
> file remains unclosed.
> 
> Some people say: the programmer should have used seq. That is the
> admission of guilt! An expression (read s)::Int that evaluates to 1 is 
> equal to 1. So, one should be able substitute (read s) with 1. If the
> result of evaluating 1 turns out not needed for the final outcome,
> then not evaluating (read s) should not affect the outcome. And yet it
> does. One uses seq to force evaluation of an expression even if the
> result may be unused. Thus the expression of a pure type 
> has *side-effect*. 
> 
> The advice about using seq reminds me of advice given to C programmers
> that they should not free a malloc'ed pointer twice, dereference a
> zero pointer and write past the boundary of an array. If lazy IO is
> considered good style given the proper use of seq, then C should be
> considered safe given the proper use of pointers and arrays.
> 
> Side affects like closing a file are observable in the external
> world. A program may observe these effects, in an IO monad. One can
> argue there are no guarantees at all about what happens in the IO
> monad. Can side-effects of lazy IO be observable in _pure_ Haskell
> code, in functions with pure type? Yes, they can. The examples are
> depressingly easy to write, once one realizes that reading does have
> side effects, POSIX provides for file descriptor aliasing, and sharing
> becomes observable with side effects. Here is a simple Haskell98 code.
> 
> 
> > {- Haskell98! -}
> >
> > module Main where
> >
> > import System.IO
> > import System.Posix.IO (fdToHandle, stdInput)
> >
> > -- f1 and f2 are both pure functions, with the pure type.
> > -- Both compute the result of the subtraction e1 - e2.
> > -- The only difference between them is the sequence of
> > -- evaluating their arguments, e1 `seq` e2 vs. e2 `seq` e1
> > -- For really pure functions, that difference should not be observable
> >
> > f1, f2:: Int -> Int -> Int
> >
> > f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2
> > f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2
> >
> > read_int s = read . head . words $ s
> >
> > main = do
> >        let h1 = stdin
> >        h2 <- fdToHandle stdInput
> >        s1 <- hGetContents h1
> >        s2 <- hGetContents h2
> >        print $ f1 (read_int s1) (read_int s2)
> >        -- print $ f2 (read_int s1) (read_int s2)
> 
> 
> One can compile it with GHC (I used 6.8.2, with and without -O2) and
> run like this
>    ~> /tmp/a.out 
>    1
>    2
>    -1
> That is, we run the program and type 1 and 2 as the inputs. The
> program prints out the result, -1. If we comment out the line
> "print $ f1 (read_int s1) (read_int s2)" and uncomment out the last
> line the transcript looks like this
>    ~> /tmp/a.out 
>    1
>    2
>    1
> 
> Running the code with Hugs exhibits the same behavior. Thus a pure
> function (-) of the pure type gives different results depending on the
> order of evaluating its arguments.
> 
> 	Is 1 = -1?

Lazy IO *as implemented (originally) in GHC* (and copied into the
Haskell standard) breaks referential transparency.[1]  Clean's object IO
system, I believe, would not have this problem.  Nor would a
well-designed Haskell IO system.

jcc