[Haskell-cafe] What unsafeInterleaveIO is unsafe

Sun Mar 15 21:04:36 EDT 2009

Yusaku Hashimoto wrote:
> Hello,
> 
> I was studying what unsafeInterleaveIO is. I understood that
> unsafeInterleaveIO takes an IO action and delays it. But I couldn't
> find any reason why unsafeInterleaveIO is unsafe.
> 
> I have already read an example in
> http://www.haskell.org/pipermail/haskell-cafe/2009-March/057101.html
> which says lazy IO may break purity, but I think the real problem in
> that example is the wrong use of seq. Did I misread?

For example: I have some universal state in IO. We'll call it an IORef, 
but it could be anything, like reading lines from a file. And I have 
some method for accessing and updating that state.

 > import Data.IORef
 > import System.IO.Unsafe (unsafeInterleaveIO)
 >
 > next r = do n <- readIORef r
 >             writeIORef r (n+1)
 >             return n

Now, if I use unsafeInterleaveIO:

 > main = do r <- newIORef 0
 >           x <-  do a <- unsafeInterleaveIO (next r)
 >                    b <- unsafeInterleaveIO (next r)
 >                    return (a,b)
 >           ...

The values of a and b in x are entirely arbitrary, and are only 
determined at the point when each is first accessed. It is not merely 
arbitrary which one is 0 and which is 1: they could be *any* pair of 
distinct values, since the reference r is still in scope and other 
code in the ... could affect it before we access a and b, or between the 
two accesses.
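
As a rough sketch of the "any pair of values" point, the program below 
fills in the ... with code that calls next on r between the two accesses. 
The name interleavedWithOtherCode and the count of 40 extra calls are 
made up for illustration, and evaluate from Control.Exception is used 
only to pin down when each component is forced.

```haskell
import Control.Exception (evaluate)
import Control.Monad (replicateM_)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- The counter from the post: return the current value, then increment it.
next :: IORef Int -> IO Int
next r = do
  n <- readIORef r
  writeIORef r (n + 1)
  return n

-- Hypothetical name: defer both reads, then let other code use r
-- between the moment a is forced and the moment b is forced.
interleavedWithOtherCode :: IO (Int, Int)
interleavedWithOtherCode = do
  r <- newIORef 0
  a <- unsafeInterleaveIO (next r)
  b <- unsafeInterleaveIO (next r)
  _ <- evaluate a              -- first deferred next runs now: a = 0
  replicateM_ 40 (next r)      -- unrelated code advances the counter
  _ <- evaluate b              -- second deferred next runs now: b = 41
  return (a, b)

main :: IO ()
main = interleavedWithOtherCode >>= print
```

Run as written, this prints (0,41). Changing the number of intervening 
calls changes the second component accordingly.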

The arbitrariness is not "random" in the statistical sense, but rather 
is an oracle for determining the order in which evaluation has occurred. 
Consider, as an illustration, these two alternatives for the ...:

 >           fst x `seq` snd x `seq` return x

vs

 >           snd x `seq` fst x `seq` return x

In this example, main will return (0,1) or (1,0) depending on which was 
chosen. You are right that seq is what exposes the difference here, but 
seq itself is a red herring. Having made x, we can pass it along to any 
function, ignore the output of that function, and then inspect x to 
learn the order in which that function forced its components.
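
Here is a self-contained sketch of the two alternatives. One deliberate 
substitution: seq only promises that both arguments get evaluated, not 
in which order, so this version uses evaluate from Control.Exception to 
fix the order explicitly. The names lazyPair, fstFirst and sndFirst are 
invented for the example.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- The counter from the post: return the current value, then increment it.
next :: IORef Int -> IO Int
next r = do
  n <- readIORef r
  writeIORef r (n + 1)
  return n

-- Build the pair with both reads deferred until first access.
lazyPair :: IO (Int, Int)
lazyPair = do
  r <- newIORef 0
  a <- unsafeInterleaveIO (next r)
  b <- unsafeInterleaveIO (next r)
  return (a, b)

-- Force the first component, then the second.
fstFirst :: IO (Int, Int)
fstFirst = do
  x <- lazyPair
  _ <- evaluate (fst x)
  _ <- evaluate (snd x)
  return x

-- Force the second component, then the first.
sndFirst :: IO (Int, Int)
sndFirst = do
  x <- lazyPair
  _ <- evaluate (snd x)
  _ <- evaluate (fst x)
  return x

main :: IO ()
main = do
  fstFirst >>= print
  sndFirst >>= print
```

This prints (0,1) and then (1,0): the same construction of x yields 
different values depending solely on which component was demanded first.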

Moreover, let's have two pure implementations, f and g, of the same 
mathematical function. Even if f and g are close enough to correctly 
give the same output for inputs with _|_ in them, we may be able to 
observe the fact that they arrive at those answers differently by 
passing in our x. Given that such observations are possible, it is no 
longer safe to exchange f and g for one another, despite the fact that 
they are pure and give the same output for all (meaningful) inputs.
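
To make the f and g point concrete, here is a sketch with two pure 
functions that both add the components of a pair but force them in 
opposite orders. The names sumLR, sumRL and observe are hypothetical, 
and pseq (from GHC.Conc, so this assumes GHC) is used because, unlike 
seq, it guarantees its first argument is evaluated before the second.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import GHC.Conc (pseq)
import System.IO.Unsafe (unsafeInterleaveIO)

-- The counter from the post: return the current value, then increment it.
next :: IORef Int -> IO Int
next r = do
  n <- readIORef r
  writeIORef r (n + 1)
  return n

-- A pair whose components are filled in on first access.
lazyPair :: IO (Int, Int)
lazyPair = do
  r <- newIORef 0
  a <- unsafeInterleaveIO (next r)
  b <- unsafeInterleaveIO (next r)
  return (a, b)

-- Two pure implementations of the same mathematical function.
sumLR, sumRL :: (Int, Int) -> Int
sumLR (a, b) = a `pseq` b `pseq` (a + b)   -- forces a, then b
sumRL (a, b) = b `pseq` a `pseq` (a + b)   -- forces b, then a

-- Apply a function to a fresh lazy pair; return its result together
-- with the pair, which now records the order in which it was forced.
observe :: ((Int, Int) -> Int) -> IO (Int, (Int, Int))
observe h = do
  x <- lazyPair
  y <- evaluate (h x)
  return (y, x)

main :: IO ()
main = do
  observe sumLR >>= print
  observe sumRL >>= print
```

This prints (1,(0,1)) and then (1,(1,0)). Both functions return 1, yet 
the leftover x tells them apart, which is exactly the observation that 
makes swapping one for the other unsafe.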

This example is somewhat artificial because we set up x to use 
unsafeInterleaveIO in the bad way. For the intended use cases where it 
is indeed (arguably) safe, we would need to be sure to manually thread 
the state through the pure value (e.g. x) such that the final value is 
sane. For instance, in lazy I/O where we're constructing a list of 
lines/bytes/whatever, we need to ensure that any access to the Nth 
element of the list will first force the (N-1)th element, so that we 
ensure that the list comes out in the same order as if we forced all of 
them at construction time.
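
A minimal sketch of that threading, using the same counter in place of 
a file handle: each cons cell is produced by a deferred action, and the 
action for the next cell is only reachable through the tail of the 
current one. The name lazyCounts is made up here, and this is meant as 
an illustration of the idea rather than how any particular library 
implements lazy I/O.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- The counter from the post: return the current value, then increment it.
next :: IORef Int -> IO Int
next r = do
  n <- readIORef r
  writeIORef r (n + 1)
  return n

-- Lazily produce the stream of counter values. Forcing a cell runs
-- next for that cell; the following cell stays deferred and can only
-- be reached by walking through this one, so effects happen in list order.
lazyCounts :: IORef Int -> IO [Int]
lazyCounts r = unsafeInterleaveIO $ do
  n  <- next r
  ns <- lazyCounts r
  return (n : ns)

main :: IO ()
main = do
  r  <- newIORef 0
  xs <- lazyCounts r
  third <- evaluate (xs !! 2)   -- walks cells 0, 1, 2 in order
  first <- evaluate (xs !! 0)   -- already determined by the walk above
  print (third, first)
  print (take 5 xs)
```

This prints (2,0) and then [0,1,2,3,4]: asking for a later element 
first does not disturb the order, because reaching it forces every 
earlier cell. It remains fragile if other code touches r while the list 
is only partially forced.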

For things like arbitrary symbol generation, unsafeInterleaveIO is 
perfectly fine because the order and identity of the symbols generated 
are irrelevant, but more importantly it is safe because the "IO" that's 
going on is not actually I/O. For arbitrary symbol generation, we could 
use unsafeInterleaveST instead, and that would be better because it 
accurately describes the effects. For any IO value which has real I/O 
effects, unsafeInterleaveIO is almost never correct because the ordering 
of effects on the real world (or whether the effects occur at all) 
depends entirely on the evaluation behavior of the program, which can 
vary by compiler, by compiler version, or even between different runs of 
the same compiled binary.

-- 
Live well,
~wren