[Haskell] Definitions of purity and Lazy IO

Jonathan Cast jonathanccast at fastmail.fm
Thu Mar 5 23:48:33 EST 2009

On Thu, 2009-03-05 at 20:11 -0800, oleg at okmij.org wrote:
> As Amr Sabry aptly observed more than a decade ago discussions of
> purity and referential transparency usually lead to confusion and
> disagreement. His JFP paper provided the needed rigor and argued that
> Haskell even _with regular file (stream) IO_ is pure. As was shown
> yesterday, with Lazy IO, Haskell is not pure.
> Before we discuss definitions, let us note the motivations. Suppose I
> have a Haskell program. As every single (compiled) Haskell program it
> has the main function, of the type IO a for some a. It must use IO, at
> least to print the final result, the observation of the
> program. Suppose the program contains the expression "e1 + e2" adding
> two integer expressions. Can I replace this expression with "e2 + e1"
> and be sure the observations of my program are unaffected?
> If one says that ``you should assume that every IO operation returns a
> random result''  then we have no ability to reason about any
> real Haskell.

I think that `real' is a poor term to use in this context.  In any case,
we don't really need to assert that *any* IO action is
non-deterministic; we just have to assert that:

a) The result returned by unsafeInterleaveIO is non-deterministic, and
b) The point in the execution thread when the side effects performed by
unsafeInterleaveIO are inserted is non-deterministic.

As for the first claim: we have *stronger* guarantees for
unsafeInterleaveIO than for getStdRandom.  As for the second claim: we
have *stronger* guarantees for unsafeInterleaveIO than for forkIO.  I
don't think it removes all possibility of reasoning about Haskell
programs to say these things.  You might want to stop hyper-ventilating.
> We can't be sure of soundness of any compiler
> optimization.

Nope.  *You* claim, without proof, that re-writing

  x `seq` y `seq` x - y

into

  y `seq` x `seq` x - y

is an un-sound optimization, for Haskell.  *We* claim it is perfectly
sound.  Who is denying the soundness of Haskell optimization again?
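A quick check of the claim (the names f and g are mine, not from the
thread): for defined arguments the two orderings are indistinguishable,
and if either argument is bottom both expressions diverge, so as partial
functions they agree -- modulo which imprecise exception you happen to
observe:

```haskell
f, g :: Int -> Int -> Int
f x y = x `seq` y `seq` x - y
g x y = y `seq` x `seq` x - y

main :: IO ()
main = do
  print (f 7 3)  -- 4
  print (g 7 3)  -- 4
  -- f 7 undefined and g 7 undefined both diverge; only the order in
  -- which the bottoms would be reached differs, and a pure context
  -- cannot observe that order.
```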

> So, Haskell is far worse than C!? With C, Xavier Leroy
> has managed to create a provably correct optimizing compiler, and
> mechanically _prove_ the soundness of all optimizations.

And plenty of C programs have situations where the relative ordering of
side effects is undefined.  In fact, quite a few syntactically valid,
well-typed, conforming C programs are declared `undefined' (and hence
not *strictly* conforming) by the standard, because their meaning
depends on the order of evaluation beyond the degree to which it is
well-defined.  So C is hardly `far [better]' than Haskell in this
respect.

> A C program,
> just like a Haskell program, must use IO at least to print the
> final result. We can never hope to build a provably correct compiler for
> Haskell? We cannot prove observational equivalences of any real
> Haskell program? Isn't that sad?

> The formal question is thus: are the following two expressions f1 and f2,
> of a *pure type* Int->Int->Int 
> f1, f2:: Int -> Int -> Int
> f1 = \e1 e2 -> case (e1,e2) of
> 		(1,_) -> e1 - e2
> 		(_,_) -> e1 - e2
> f2 = \e1 e2 -> case (e1,e2) of
> 		(_,1) -> e1 - e2
> 		(_,_) -> e1 - e2
> equal? Before one invokes an equational theory or says that both these
> expressions are just integer subtraction, let me clarify the
> question: are f1 and f2 at least weakly observationally equivalent?

How weak?

> That is, for any program context C[] such that C[f1] is well-typed, 
> the program C[f2] must too be well-typed, and if one can observe the
> result of C[f1] and of C[f2], the two observations must be identical.

Every time?  For every context?  What about the context

     do x <- getStdRandom random
        y <- getStdRandom random
        print $ [] x y

?  By the standard above, f1 is not even observationally equivalent to
itself.

> The
> observation of a program may (and often does) involve side-effects and
> IO (more on it below). The message posted yesterday exhibited a context C[] that
> distinguishes f1 from f2. Thus, in presence of Lazy IO, any equational
> theory that equates f1 and f2 cannot be sound. I don't think one can
> design such a context C[] using the regular, eager file IO.
> Amr Sabry in a paper ``What is a Purely Functional Language?''
> J. Functional Programming, 8(1), 1-22, Jan. 1998. 
> http://www.cs.indiana.edu/~sabry/papers/purelyFunctional.ps
> proposed the definition of purity, see p 2 and Definition 4.7. The
> definition essentially states that call-by-name, call-by-value and
> call-by-need evaluation functions must be weakly equivalent.

How weakly?  (False && undefined) gives different answers under
call-by-name and call-by-value, but I don't think it is useful to call
Haskell `impure' because such expressions exist.
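The point is easy to check in GHC: (&&) is non-strict in its second
argument, so under call-by-need the undefined is never touched, whereas
a call-by-value evaluator would have to evaluate it first and diverge:

```haskell
main :: IO ()
main = print (False && undefined)  -- prints False under call-by-need
```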

> These
> evaluation functions map programs to observables. The part of
> evaluation dealing with observing the answers may have side effects!
> Sec 5.1 rigorously shows that lambda-calculus with global state
> and destructive mutation may be called pure functional, if
> effects are regarded as values and the program is written in what we
> now call a monadic style (whose idea was proposed by Reynolds back in
> 1981 when studying Idealized Algol). The end of Sec 5.1 remarks that
> IO can be treated that way -- and in fact, Appendix D of Haskell
> report 1.2 indeed had described such a semantics of IO:
> 	http://haskell.org/definition/haskell-report-1.2.ps.gz
> (the appendix starts on p. 139). Thanks to Paul Hudak for pointing this
> out three years ago.

Ahah!  Haskell 1.2 had lazy IO (readChan), no?

> Thus Haskell even with IO may be called pure functional. With Lazy IO,

For Lazy IO, read unsafeInterleaveIO.

> it can no longer be called pure functional as different orders of
> evaluation of arguments of a function lead to different program
> observations.

Maybe.  Nothing says GHC *won't* re-write f1 into f2, or vice versa; it
just happens that it didn't when you tried it.
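The sort of context in question (a reconstruction of the idea, not the
exact code from yesterday's message) uses unsafeInterleaveIO to attach
an observable effect to each argument, so the order in which f1 or f2
forces them leaks into the output.  Note the order of forcing inside
`e1 - e2` is what GHC does in practice, not something the language
semantics promises:

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)
import Data.IORef

f1, f2 :: Int -> Int -> Int
f1 e1 e2 = case (e1, e2) of { (1, _) -> e1 - e2; (_, _) -> e1 - e2 }
f2 e1 e2 = case (e1, e2) of { (_, 1) -> e1 - e2; (_, _) -> e1 - e2 }

-- Record the order in which the two arguments are forced.
orderUnder :: (Int -> Int -> Int) -> IO [Int]
orderUnder f = do
  ref <- newIORef []
  let tagged n = unsafeInterleaveIO (modifyIORef ref (n :) >> return n)
  e1 <- tagged 1
  e2 <- tagged 2
  _ <- return $! f e1 e2   -- force the result
  reverse <$> readIORef ref

main :: IO ()
main = do
  orderUnder f1 >>= print  -- f1 matches against (1,_), forcing e1 first
  orderUnder f2 >>= print  -- f2 matches against (_,1), forcing e2 first
```

Since the two runs print different orders, any context built this way
distinguishes f1 from f2, which is the whole point of the purity
argument.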

