Unsafe hGetContents

Tue Oct 20 08:58:40 EDT 2009

On 10/10/2009 18:59, Iavor Diatchki wrote:
> Hello,
>
> well, I think that the fact that we seem to have a program context
> that can distinguish "f1" from "f2" is worth discussing because I
> would have thought that in a pure language they are interchangeable.
> The question is, does the context in Oleg's example really distinguish
> between "f1" and "f2"?  You seem to be saying that this is not the
> case:  in both cases you end up with the same non-deterministic
> program that reads two numbers from the standard input and subtracts
> them but you can't assume anything about the order in which the
> numbers are extracted from the input---it is merely an artifact of the
> GHC implementation that with "f1" the subtraction always happens the
> one way, and with "f2" it happens the other way.
>
> I can (sort of) buy this argument, after all, it is quite similar to
> what happens with asynchronous exceptions (f1 (error "1") (error "2")
> vs f2 (error "1") (error "2")).  Still, the whole thing does not
> "smell right":  there is some impurity going on here, and trying to
> offload the problem onto the IO monad only makes reasoning about IO
> computations even harder (and it is pretty hard to start with).  So,
> discussion and alternative solutions should be strongly encouraged, I
> think.
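The distinction Iavor describes can be reproduced in a few lines. What follows is a minimal sketch, not Oleg's original program, which is not quoted here. The definitions of f1 and f2 are hypothetical stand-ins that use GHC.Conc.pseq so each function's forcing order is pinned down. The IORef "stream" and the helpers whichError, popNext and runLazily are also illustrative assumptions. They simulate two lazy readers sharing one file descriptor and are not how hGetContents is implemented. The first context mirrors the exceptions analogy: which error escapes depends on which argument is forced first. The second shows that deferred reads pick up values in forcing order.

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import GHC.Conc (pseq)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical f1 and f2: as functions they are equal (strict in both
-- arguments, result x - y). pseq only fixes which argument is forced first.
f1, f2 :: Int -> Int -> Int
f1 x y = x `pseq` (y `pseq` (x - y))
f2 x y = y `pseq` (x `pseq` (x - y))

-- Context 1 (the exceptions analogy): with two erroring arguments,
-- the error that escapes is the one forced first.
whichError :: Int -> IO String
whichError v = do
  r <- try (evaluate v) :: IO (Either ErrorCall Int)
  return $ case r of
    Left (ErrorCall msg) -> msg
    Right n              -> show n

-- Context 2 (lazy input): a simulated stream in which each call pops the
-- next number. It stands in for one descriptor shared by two lazy readers.
popNext :: IORef [Int] -> IO Int
popNext ref = do
  xs <- readIORef ref
  case xs of
    (v : rest) -> writeIORef ref rest >> return v
    []         -> ioError (userError "input exhausted")

-- Each read is deferred with unsafeInterleaveIO, so it runs only when f
-- forces the corresponding argument. f's forcing order therefore decides
-- which number ends up in which argument.
runLazily :: (Int -> Int -> Int) -> IO Int
runLazily f = do
  ref <- newIORef [10, 3]
  a <- unsafeInterleaveIO (popNext ref)
  b <- unsafeInterleaveIO (popNext ref)
  return $! f a b

main :: IO ()
main = do
  e1 <- whichError (f1 (error "1") (error "2"))
  e2 <- whichError (f2 (error "1") (error "2"))
  print (e1, e2)  -- ("1","2")
  r1 <- runLazily f1
  r2 <- runLazily f2
  print (r1, r2)  -- (7,-7)
```

With plain seq in place of pseq, GHC may force the arguments in either order. That is exactly the compiler latitude this thread is arguing about.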

Duncan has found a definition of hGetContents that explains why it has 
surprising behaviour, and that's very nice because it lets us write the 
compilers that we want to write, and we get to tell the users to stop 
moaning because the strange behaviour they're experiencing is allowed 
according to the spec.  :-)

Of course, the problem is that users don't want the hGetContents that 
has non-deterministic semantics, they want a deterministic one.  And for 
that, they want to fix the evaluation order (or something).  The obvious 
drawback with fixing the evaluation order is that it ties the hands of 
the compiler developers, and makes a fundamental change to the language 
definition.
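For contrast, here is a minimal sketch of the deterministic behaviour users are asking for. It relies on illustrative assumptions, not library code: hypothetical f1 and f2 written with pseq, and a simulated IORef input stream read by a hypothetical helper runStrictly. If both reads are ordinary IO actions sequenced before the pure function runs, the forcing order inside f1 and f2 no longer matters, and no constraint is placed on how the compiler evaluates pure code. For real Handles, base 4.15 and later provides System.IO.hGetContents', which reads the whole input strictly.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import GHC.Conc (pseq)

-- Hypothetical f1 and f2: equal as functions, differing only in which
-- argument pseq forces first.
f1, f2 :: Int -> Int -> Int
f1 x y = x `pseq` (y `pseq` (x - y))
f2 x y = y `pseq` (x `pseq` (x - y))

-- A simulated input stream: each call pops the next number.
popNext :: IORef [Int] -> IO Int
popNext ref = do
  xs <- readIORef ref
  case xs of
    (v : rest) -> writeIORef ref rest >> return v
    []         -> ioError (userError "input exhausted")

-- Deterministic variant: both reads are ordinary IO actions, performed
-- in the order written, before f sees either value.
runStrictly :: (Int -> Int -> Int) -> IO Int
runStrictly f = do
  ref <- newIORef [10, 3]
  a <- popNext ref  -- always receives 10
  b <- popNext ref  -- always receives 3
  return $! f a b

main :: IO ()
main = do
  r1 <- runStrictly f1
  r2 <- runStrictly f2
  print (r1, r2)  -- (7,7)
```

The price is that the input is consumed eagerly, even when the values are never used. Lazy I/O was designed to avoid exactly that cost.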

Things will get a lot worse in the future as we experiment with more 
elaborate compiler optimisations and evaluation strategies.  I predict 
that eventually we'll have to ditch hGetContents, at least in its 
current generality.

Cheers,
	Simon

> -Iavor
>
> On Sat, Oct 10, 2009 at 7:38 AM, Duncan Coutts
> <duncan.coutts at googlemail.com>  wrote:
>> On Sat, 2009-10-10 at 02:51 -0700, oleg at okmij.org wrote:
>>
>>>> The reason it's hard is that to demonstrate a difference you have to get
>>>> the lazy I/O to commute with some other I/O, and GHC will never do that.
>>>
>>> The keyword here is GHC. I may well believe that GHC is able to divine
>>> the programmer's true intent and so it always does the right thing. But
>>> writing in the language standard ``do what the version x.y.z of GHC
>>> does'' does not seem very appropriate, or helpful to other
>>> implementors.
>>
>> With access to unsafeInterleaveIO it's fairly straightforward to show
>> that it is non-deterministic. These programs that bypass the safety
>> mechanisms on hGetContents just get us back to having access to the
>> non-deterministic semantics of unsafeInterleaveIO.
>>
>>>> Haskell's IO library is carefully designed to not run into this
>>>> problem on its own.  It's normally not possible to get two Handles
>>>> with the same FD...
>>
>>> Is this behavior specified somewhere, or is this just an artifact
>>> of a particular GHC implementation?
>>
>> It is in the Haskell 98 report, in the design of the IO library. It does
>> not mention FDs of course. The IO/Handle functions it provides give
>> no (portable) way to obtain two read handles on the same OS file
>> descriptor. The hGetContents behaviour of semi-closing is to stop you
>> from getting two lazy lists of the same read Handle.
>>
>> There's nothing semantically wrong with you bypassing those restrictions
>> (e.g. openFile "/dev/fd/0") it just means you end up with a
>> non-deterministic IO program, which is something we typically try to
>> avoid.
>>
>> I am a bit perplexed by this whole discussion. It seems to come down to
>> saying that unsafeInterleaveIO is non-deterministic and that things
>> implemented on top are also non-deterministic. The standard IO library
>> puts up some barriers to restrict the non-determinism, but if you walk
>> around the barrier then you can still find it. It's not clear to me what
>> is supposed to be surprising or alarming here.
>>
>> Duncan
>>
>> _______________________________________________
>> Haskell-prime mailing list
>> Haskell-prime at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-prime
>>
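Duncan's point about semi-closing can be checked directly. This is a minimal sketch that assumes GHC's implementation of the IO library; secondReadRefused is a hypothetical helper name. After one hGetContents, stdin is semi-closed. A second request for a lazy list on the same Handle is therefore refused, rather than handing out a second reader on the same descriptor. The report only requires that the operation fail. The isIllegalOperation check reflects how GHC reports that failure.

```haskell
import Control.Exception (try)
import System.IO (hGetContents, stdin)
import System.IO.Error (isIllegalOperation)

-- Ask for a second lazy list from a Handle that is already semi-closed.
-- Returns True if the IO library refused with an IllegalOperation error.
secondReadRefused :: IO Bool
secondReadRefused = do
  -- Lazy: nothing is read yet, but stdin is now semi-closed.
  _first <- hGetContents stdin
  r <- try (hGetContents stdin) :: IO (Either IOError String)
  return $ case r of
    Left e  -> isIllegalOperation e
    Right _ -> False

main :: IO ()
main = secondReadRefused >>= print  -- True under GHC
```

Opening the same descriptor a second time, for example via /dev/fd/0 on systems that provide it, sidesteps this check. That is the bypass Duncan describes.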