[Haskell-cafe] On hGetContents semi-closeness

Tue Feb 15 20:06:54 CET 2011

(I'm probably glossing over important stuff and getting some details wrong,
as usual, but I hope it's good enough to give some idea of what's going on.)

On 2/15/11 11:57 , Rafael Cunha de Almeida wrote:
> What state is that? It seems to be something related to Haskell. I
> couldn't find a definition for it in the unix documentation I have
> laying around.

Yes, it's specific to Haskell's runtime: hGetContents puts the handle into
the "semi-closed" state.  If you have a handle being read lazily "in the
background" (see unsafeInterleaveIO), trying to use it "in the foreground"
is problematic.  Specifically, which call(s) should get the data?

> entirely consumed that the descriptor gets closed. Is hGetContents responsible
> for closing the descriptor? Or is it the garbage collector? Who closes the
> descriptor when the contents are read? Looking at hGetContents function

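As I understand it, a semi-closed handle is closed once the lazy read hits
end of file (or an error); if the contents are never fully consumed, it is
the garbage collector that eventually closes it.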

> openFile "foo" ReadMode >>= \handle -> (hGetContents handle >>= (\s -> hClose handle >> putStr s)) [2]

This is a classic example of the dangers of hGetContents (and, more
generally, of unsafeInterleaveIO).  hClose runs before putStr has demanded
any of s, so the handle is closed before anything has actually been read.
In general, you should use lazy I/O only for "quick and dirty" stuff and
avoid it for serious programming.  You can get many of the benefits of lazy
I/O without the nondeterminism by using iteratee-based I/O
(http://hackage.haskell.org/package/iteratee).

The usual way to deal with this is to force the read in some way before
closing the handle, usually by forcing evaluation of the length of the data
(evaluate (length s) -- or something like that).
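A minimal sketch of forcing the read before closing the handle.  The helper
name (readFileStrict) and the scratch file name (hgc-demo.txt) are made up
for illustration.  As far as I know, moving the hClose ahead of the evaluate
makes GHC either hand back truncated data or raise an exception when s is
finally demanded, depending on the version.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: read a whole file, closing the handle only after
-- the lazy read has been driven all the way to end of file.
readFileStrict :: FilePath -> IO String
readFileStrict path = do
  h <- openFile path ReadMode
  s <- hGetContents h        -- h is now semi-closed; nothing is read yet
  _ <- evaluate (length s)   -- walks the whole list, so every character is read
  hClose h                   -- safe: the contents are already in memory
  return s

main :: IO ()
main = do
  writeFile "hgc-demo.txt" "hello\nworld\n"   -- scratch file for the demo
  s <- readFileStrict "hgc-demo.txt"
  putStr s
```

Note that length only forces the spine of the list, but for a String coming
from hGetContents that is enough to pull all of the data off the handle.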

> The question most people doesn't have a good answer is: when does
> Haskell thinks it is necessary to do something?

Haskell is actually what manufacturing folks call "just in time"; things are
evaluated when they are needed.  Usually this means that when you output
something, anything needed to compute that output will be done then.  The
exceptions are things like Control.Exception.evaluate (which you can treat
as doing output but without *actually* outputting anything), mentioned
above, plus you can indicate that some computation must be evaluated before
another by means of Prelude.seq.  You can also declare a constructor field
as being strict by prefixing its type with an exclamation mark (so the field
is always evaluated when the constructor is), and with the BangPatterns
extension you can also declare a pattern match binding as strict the same way.
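A rough sketch of the tools just mentioned (evaluate, seq, a strict field,
and a bang pattern).  All of the names here (Counter, sumStrict,
sumAndCount, throws) are invented for the example.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (SomeException, evaluate, handle)

-- Strict field: evaluating a Counter to WHNF also evaluates its Int.
data Counter = Counter !Int

-- Bang pattern: the accumulator is forced on every step of the loop.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs

-- seq: make sure the total is evaluated before the pair is handed back.
sumAndCount :: [Int] -> (Int, Int)
sumAndCount xs = total `seq` (total, length xs)
  where
    total = sumStrict xs

-- True if running the action raised an exception.
throws :: IO a -> IO Bool
throws act = handle onErr (act >> return False)
  where
    onErr :: SomeException -> IO Bool
    onErr _ = return True

main :: IO ()
main = do
  n <- evaluate (sumStrict [1 .. 100])   -- forced here, not at print time
  print n                                -- 5050
  print (sumAndCount [1, 2, 3])          -- (6,3)
  -- The strict field means the undefined is hit as soon as the
  -- constructor is evaluated, even though nothing ever looks at the Int.
  strict <- throws (evaluate (Counter undefined))
  print strict                           -- True
```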

Be aware that in most cases, evaluating a computation takes it to "weak head
normal form", which means that (as one would expect from a lazy language)
only the minimum amount of evaluation is done.  If nothing else forces
evaluation, this means that the computation is evaluated to the point of its
top level constructor and no further.  You can think of it this way:  all
expressions in Haskell are represented by "thunks" (little chunks of code),
and evaluation replaces the outermost thunk in an expression with the result
of running it.  So if we have an expression

    @(@[@a,@b],@(Foo @(Bar @d)))

(where a @ precedes a sub-expression which is unevaluated/a thunk), WHNF
removes the outermost (leftmost, here) @ by evaluating the tuple constructor
while leaving the elements of the tuple unevaluated.  If you need to
evaluate the whole structure rather than just its outermost constructor,
take a look at Control.DeepSeq (http://hackage.haskell.org/package/deepseq).
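To make the WHNF-versus-full-evaluation distinction concrete, here is a
small sketch that plants undefined inside a nested value and checks which
kind of evaluation trips over it.  It assumes the deepseq package is
installed (it ships with GHC); the names nested and throws are mine.

```haskell
import Control.DeepSeq (force)
import Control.Exception (SomeException, evaluate, handle)

-- A value whose outermost constructor is fine but whose insides are not.
nested :: ([Int], Maybe Int)
nested = ([1, undefined], Just undefined)

-- True if running the action raised an exception.
throws :: IO a -> IO Bool
throws act = handle onErr (act >> return False)
  where
    onErr :: SomeException -> IO Bool
    onErr _ = return True

main :: IO ()
main = do
  -- evaluate stops at the (,) constructor, so the undefineds are never hit.
  whnf <- throws (evaluate nested)
  -- force evaluates every element, so it runs into an undefined.
  full <- throws (evaluate (force nested))
  print (whnf, full)   -- (False,True)
```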

The upshot of the above is that you can determine the order of evaluation by
working backwards from output computations.  It may be a partial ordering,
because when there are multiple independent computations required by another
computation, the order in which they are evaluated is undefined.  In
practice, this is usually unimportant because in pure code there is by
definition no observable difference between evaluation orders in those cases
(this is technically called "referential transparency"); but when
unsafeInterleaveIO is used (as with hGetContents), it allows pure code to
behave nondeterministically (it violates referential transparency).  This is
why it is "unsafe" (and why hGetContents is thereby unsafe), and why
mechanisms like Control.Exception.evaluate and seq are provided.
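A small sketch of how unsafeInterleaveIO lets the value of an ordinary
(pure-looking) Int depend on when it happens to be evaluated.  An IORef
stands in for the file handle; the name observe is made up.

```haskell
import Control.Exception (evaluate)
import Control.Monad (when)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- The read of the IORef is deferred until x is first demanded, so the
-- result depends on whether x is forced before or after the write.
observe :: Bool -> IO Int
observe forceEarly = do
  ref <- newIORef (0 :: Int)
  x <- unsafeInterleaveIO (readIORef ref)   -- nothing is read yet
  when forceEarly $ do
    _ <- evaluate x                         -- read happens now, sees 0
    return ()
  writeIORef ref 1
  evaluate x                                -- if not forced earlier, sees 1

main :: IO ()
main = do
  early <- observe True
  late <- observe False
  print (early, late)   -- (0,1)
```

Same code, same "pure" value x, two different answers; the evaluate call is
what pins down which one you get.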

-- 
brandon s. allbery     [linux,solaris,freebsd,perl]    allbery.b at gmail.com
system administrator  [openafs,heimdal,too many hats]                kf8nh