[Haskell] Lazy IO breaks purity

Wed Mar 4 21:12:20 EST 2009

We demonstrate how lazy IO breaks referential transparency.  A pure
function of the type Int->Int->Int gives different integers depending
on the order of evaluation of its arguments. Our Haskell98 code uses
nothing but the standard input.  We conclude that extolling the purity
of Haskell and advertising lazy IO is inconsistent.

Henning Thielemann wrote on Haskell-Cafe:
> Say I have a chain of functions: read a file, ... write that to a file,
> all of these functions are written in a lazy way, which is currently
> considered good style

Lazy IO should not be considered good style. One of the common
definitions of purity is that pure expressions should evaluate to the
same results regardless of evaluation order, or that equals can be
substituted for equals. If an expression of the type Int evaluates to
1, we should be able to replace every occurrence of the expression with
1 without changing the results and other observables. 

The expression (read s) where s is the result of the (hGetContents h1)
action has the pure type, e.g., Int. Yet its evaluation does more than
just producing an integer: it may also close a file associated with
the handle h1. Closing the file has an observable effect: the file
descriptor, which is a scarce resource, is freed. Some OS lock the
open file, or prevent operations such as renaming and deletion on the
open file. A file system whose file is open cannot be
unmounted. Suppose I use an Haskell application such as an editor to
process data from a removable drive. I mount the drive, tell the
application the file name. The (still running) application finished
with the file and displayed the result. And yet I can't unplug the
removable drive, because it turns out that the final result was
produced without needing to read all the data from the file, and the
file remains unclosed.

Some people say: the programmer should have used seq. That is the
admission of guilt! An expression (read s)::Int that evaluates to 1 is 
equal to 1. So, one should be able substitute (read s) with 1. If the
result of evaluating 1 turns out not needed for the final outcome,
then not evaluating (read s) should not affect the outcome. And yet it
does. One uses seq to force evaluation of an expression even if the
result may be unused. Thus the expression of a pure type 
has *side-effect*. 

The advice about using seq reminds me of advice given to C programmers
that they should not free a malloc'ed pointer twice, dereference a
zero pointer and write past the boundary of an array. If lazy IO is
considered good style given the proper use of seq, then C should be
considered safe given the proper use of pointers and arrays.

Side affects like closing a file are observable in the external
world. A program may observe these effects, in an IO monad. One can
argue there are no guarantees at all about what happens in the IO
monad. Can side-effects of lazy IO be observable in _pure_ Haskell
code, in functions with pure type? Yes, they can. The examples are
depressingly easy to write, once one realizes that reading does have
side effects, POSIX provides for file descriptor aliasing, and sharing
becomes observable with side effects. Here is a simple Haskell98 code.

> {- Haskell98! -}
>
> module Main where
>
> import System.IO
> import System.Posix.IO (fdToHandle, stdInput)
>
> -- f1 and f2 are both pure functions, with the pure type.
> -- Both compute the result of the subtraction e1 - e2.
> -- The only difference between them is the sequence of
> -- evaluating their arguments, e1 `seq` e2 vs. e2 `seq` e1
> -- For really pure functions, that difference should not be observable
>
> f1, f2:: Int -> Int -> Int
>
> f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2
> f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2
>
> read_int s = read . head . words $ s
>
> main = do
>        let h1 = stdin
>        h2 <- fdToHandle stdInput
>        s1 <- hGetContents h1
>        s2 <- hGetContents h2
>        print $ f1 (read_int s1) (read_int s2)
>        -- print $ f2 (read_int s1) (read_int s2)

One can compile it with GHC (I used 6.8.2, with and without -O2) and
run like this
   ~> /tmp/a.out 
   1
   2
   -1
That is, we run the program and type 1 and 2 as the inputs. The
program prints out the result, -1. If we comment out the line
"print $ f1 (read_int s1) (read_int s2)" and uncomment out the last
line the transcript looks like this
   ~> /tmp/a.out 
   1
   2
   1

Running the code with Hugs exhibits the same behavior. Thus a pure
function (-) of the pure type gives different results depending on the
order of evaluating its arguments.

	Is 1 = -1?