[Haskell] Lazy IO breaks purity

Wed Mar 4 22:04:32 EST 2009

On Wednesday 04 March 2009 9:12:20 pm oleg at okmij.org wrote:
> We demonstrate how lazy IO breaks referential transparency.  A pure
> function of the type Int->Int->Int gives different integers depending
> on the order of evaluation of its arguments. Our Haskell98 code uses
> nothing but the standard input.  We conclude that extolling the purity
> of Haskell and advertising lazy IO is inconsistent.
>
> Henning Thielemann wrote on Haskell-Cafe:
> > Say I have a chain of functions: read a file, ... write that to a file,
> > all of these functions are written in a lazy way, which is currently
> > considered good style
>
> Lazy IO should not be considered good style. One of the common
> definitions of purity is that pure expressions should evaluate to the
> same results regardless of evaluation order, or that equals can be
> substituted for equals. If an expression of the type Int evaluates to
> 1, we should be able to replace every occurrence of the expression with
> 1 without changing the results and other observables.
>
> The expression (read s) where s is the result of the (hGetContents h1)
> action has the pure type, e.g., Int. Yet its evaluation does more than
> just producing an integer: it may also close a file associated with
> the handle h1. Closing the file has an observable effect: the file
> descriptor, which is a scarce resource, is freed. Some OS lock the
> open file, or prevent operations such as renaming and deletion on the
> open file. A file system whose file is open cannot be
> unmounted. Suppose I use an Haskell application such as an editor to
> process data from a removable drive. I mount the drive, tell the
> application the file name. The (still running) application finished
> with the file and displayed the result. And yet I can't unplug the
> removable drive, because it turns out that the final result was
> produced without needing to read all the data from the file, and the
> file remains unclosed.
>
> Some people say: the programmer should have used seq. That is the
> admission of guilt! An expression (read s)::Int that evaluates to 1 is
> equal to 1. So, one should be able substitute (read s) with 1. If the
> result of evaluating 1 turns out not needed for the final outcome,
> then not evaluating (read s) should not affect the outcome. And yet it
> does. One uses seq to force evaluation of an expression even if the
> result may be unused. Thus the expression of a pure type
> has *side-effect*.
>
> The advice about using seq reminds me of advice given to C programmers
> that they should not free a malloc'ed pointer twice, dereference a
> zero pointer and write past the boundary of an array. If lazy IO is
> considered good style given the proper use of seq, then C should be
> considered safe given the proper use of pointers and arrays.
>
> Side affects like closing a file are observable in the external
> world. A program may observe these effects, in an IO monad. One can
> argue there are no guarantees at all about what happens in the IO
> monad. Can side-effects of lazy IO be observable in _pure_ Haskell
> code, in functions with pure type? Yes, they can. The examples are
> depressingly easy to write, once one realizes that reading does have
> side effects, POSIX provides for file descriptor aliasing, and sharing
> becomes observable with side effects. Here is a simple Haskell98 code.
>
> > {- Haskell98! -}
> >
> > module Main where
> >
> > import System.IO
> > import System.Posix.IO (fdToHandle, stdInput)
> >
> > -- f1 and f2 are both pure functions, with the pure type.
> > -- Both compute the result of the subtraction e1 - e2.
> > -- The only difference between them is the sequence of
> > -- evaluating their arguments, e1 `seq` e2 vs. e2 `seq` e1
> > -- For really pure functions, that difference should not be observable
> >
> > f1, f2:: Int -> Int -> Int
> >
> > f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2
> > f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2
> >
> > read_int s = read . head . words $ s
> >
> > main = do
> >        let h1 = stdin
> >        h2 <- fdToHandle stdInput
> >        s1 <- hGetContents h1
> >        s2 <- hGetContents h2
> >        print $ f1 (read_int s1) (read_int s2)
> >        -- print $ f2 (read_int s1) (read_int s2)
>
> One can compile it with GHC (I used 6.8.2, with and without -O2) and
> run like this
>    ~> /tmp/a.out
>    1
>    2
>    -1
> That is, we run the program and type 1 and 2 as the inputs. The
> program prints out the result, -1. If we comment out the line
> "print $ f1 (read_int s1) (read_int s2)" and uncomment out the last
> line the transcript looks like this
>    ~> /tmp/a.out
>    1
>    2
>    1
>
> Running the code with Hugs exhibits the same behavior. Thus a pure
> function (-) of the pure type gives different results depending on the
> order of evaluating its arguments.
>
> 	Is 1 = -1?

This does not conclusively demonstrate a violation of referential 
transparency. One can still consign the impurity to the IO monad. One just 
needs to specify that when hGetContents is called twice (or more times) on the 
standard input in the way you've done, what happens is that the input gets 
arbitrarily split between the two character streams in some way. It just so 
happens that when the program is compiled with f1, the split:

  s1 = "1\n..."
  s2 = "2\n..."

is chosen, while when when compiled with f2,

  s1 = "2\n..."
  s2 = "1\n..."

always happens to be chosen. So the difference can be modeled by IO making 
arbitrary, nondeterministic decisions that incidentally coincide with the 
evaluation order of things in the program. This is perhaps not very 
satisfying, and makes the semantics of IO very strange (IO actions can 
essentially look into the future), but it means referential transparency isn't 
broken.

Note, for instance, that if you change the program to print the results of 
both f1 and f2, they will always agree, even though the order in which you 
print them will change which value they agree upon.

-- Dan