[Haskell-cafe] Re: [Haskell] Lazy IO breaks purity

Lennart Augustsson lennart at augustsson.net
Thu Mar 5 09:58:03 EST 2009

You're assuming that IO operations have semantics.

>From the Haskell program's point of view, and when reasoning about
Haskell programs (not their interaction with the world) you should
assume that every IO operation returns a random result.
The way Oleg's program behaves does not break RT under the random
result assumption.

What could break RT is if you could do this
   f :: Int -> Int <- someIOoperation
   let x = f 0; y = f 0
and end up with x and y not being equal.
It's (of course) easy to construct someIOoperation that has this
behaviour (using FFI), but I don't think you can construct it just
using the normal IO operations and unsafeInterleaveIO.

But as I said, I think this is a problem anyway, because IO without
semantics is rather useless.

  -- Lennart

On Thu, Mar 5, 2009 at 1:08 PM, Simon Marlow <marlowsd at gmail.com> wrote:
> Lennart Augustsson wrote:
>> I don't see any breaking of referential transparence in your code.
>> Every time you do an IO operation the result is basically
>> non-deterministic since you are talking to the outside world.
>> You're assuming the IO has some kind of semantics that Haskell makes
>> no promises about.
>> I'm not saying that this isn't a problem, because it is.
>> But it doesn't break referential transparency, it just makes the
>> semantics of IO even more complicated.
>> (I don't have a formal proof that unsafeInterleaveIO cannot break RT,
>> but I've not seen an example where it does yet.)
> So the argument is something like: we can think of the result of a call to
> unsafeInterleaveIO as having been chosen at the time we called
> unsafeInterleaveIO, rather than when its result is actually evaluated. This
> is on dodgy ground, IMO: either you admit that the IO monad contains an
> Oracle, or you admit it can time-travel.   I don't believe in either of
> those things :-)
> Cheers,
>        Simon
>>  -- Lennart
>> On Thu, Mar 5, 2009 at 2:12 AM,  <oleg at okmij.org> wrote:
>>> We demonstrate how lazy IO breaks referential transparency.  A pure
>>> function of the type Int->Int->Int gives different integers depending
>>> on the order of evaluation of its arguments. Our Haskell98 code uses
>>> nothing but the standard input.  We conclude that extolling the purity
>>> of Haskell and advertising lazy IO is inconsistent.
>>> Henning Thielemann wrote on Haskell-Cafe:
>>>> Say I have a chain of functions: read a file, ... write that to a file,
>>>> all of these functions are written in a lazy way, which is currently
>>>> considered good style
>>> Lazy IO should not be considered good style. One of the common
>>> definitions of purity is that pure expressions should evaluate to the
>>> same results regardless of evaluation order, or that equals can be
>>> substituted for equals. If an expression of the type Int evaluates to
>>> 1, we should be able to replace every occurrence of the expression with
>>> 1 without changing the results and other observables.
>>> The expression (read s) where s is the result of the (hGetContents h1)
>>> action has the pure type, e.g., Int. Yet its evaluation does more than
>>> just producing an integer: it may also close a file associated with
>>> the handle h1. Closing the file has an observable effect: the file
>>> descriptor, which is a scarce resource, is freed. Some OS lock the
>>> open file, or prevent operations such as renaming and deletion on the
>>> open file. A file system whose file is open cannot be
>>> unmounted. Suppose I use an Haskell application such as an editor to
>>> process data from a removable drive. I mount the drive, tell the
>>> application the file name. The (still running) application finished
>>> with the file and displayed the result. And yet I can't unplug the
>>> removable drive, because it turns out that the final result was
>>> produced without needing to read all the data from the file, and the
>>> file remains unclosed.
>>> Some people say: the programmer should have used seq. That is the
>>> admission of guilt! An expression (read s)::Int that evaluates to 1 is
>>> equal to 1. So, one should be able substitute (read s) with 1. If the
>>> result of evaluating 1 turns out not needed for the final outcome,
>>> then not evaluating (read s) should not affect the outcome. And yet it
>>> does. One uses seq to force evaluation of an expression even if the
>>> result may be unused. Thus the expression of a pure type
>>> has *side-effect*.
>>> The advice about using seq reminds me of advice given to C programmers
>>> that they should not free a malloc'ed pointer twice, dereference a
>>> zero pointer and write past the boundary of an array. If lazy IO is
>>> considered good style given the proper use of seq, then C should be
>>> considered safe given the proper use of pointers and arrays.
>>> Side affects like closing a file are observable in the external
>>> world. A program may observe these effects, in an IO monad. One can
>>> argue there are no guarantees at all about what happens in the IO
>>> monad. Can side-effects of lazy IO be observable in _pure_ Haskell
>>> code, in functions with pure type? Yes, they can. The examples are
>>> depressingly easy to write, once one realizes that reading does have
>>> side effects, POSIX provides for file descriptor aliasing, and sharing
>>> becomes observable with side effects. Here is a simple Haskell98 code.
>>>> {- Haskell98! -}
>>>> module Main where
>>>> import System.IO
>>>> import System.Posix.IO (fdToHandle, stdInput)
>>>> -- f1 and f2 are both pure functions, with the pure type.
>>>> -- Both compute the result of the subtraction e1 - e2.
>>>> -- The only difference between them is the sequence of
>>>> -- evaluating their arguments, e1 `seq` e2 vs. e2 `seq` e1
>>>> -- For really pure functions, that difference should not be observable
>>>> f1, f2:: Int -> Int -> Int
>>>> f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2
>>>> f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2
>>>> read_int s = read . head . words $ s
>>>> main = do
>>>>       let h1 = stdin
>>>>       h2 <- fdToHandle stdInput
>>>>       s1 <- hGetContents h1
>>>>       s2 <- hGetContents h2
>>>>       print $ f1 (read_int s1) (read_int s2)
>>>>       -- print $ f2 (read_int s1) (read_int s2)
>>> One can compile it with GHC (I used 6.8.2, with and without -O2) and
>>> run like this
>>>  ~> /tmp/a.out
>>>  1
>>>  2
>>>  -1
>>> That is, we run the program and type 1 and 2 as the inputs. The
>>> program prints out the result, -1. If we comment out the line
>>> "print $ f1 (read_int s1) (read_int s2)" and uncomment out the last
>>> line the transcript looks like this
>>>  ~> /tmp/a.out
>>>  1
>>>  2
>>>  1
>>> Running the code with Hugs exhibits the same behavior. Thus a pure
>>> function (-) of the pure type gives different results depending on the
>>> order of evaluating its arguments.
>>>       Is 1 = -1?
>>> _______________________________________________
>>> Haskell mailing list
>>> Haskell at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell
>> _______________________________________________
>> Haskell mailing list
>> Haskell at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

More information about the Haskell-Cafe mailing list