[Haskell-cafe] On hGetContents semi-closeness
Rafael Cunha de Almeida
rafael-lists at kontesti.me
Tue Feb 15 17:57:24 CET 2011
From http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html:
Computation hGetContents hdl returns the list of characters
corresponding to the unread portion of the channel or file
managed by hdl, which is put into an intermediate state,
semi-closed. In this state, hdl is effectively closed, but
items are read from hdl on demand and accumulated in a special
list returned by hGetContents hdl.
What state is that? It seems to be something specific to Haskell. I
couldn't find a definition for it in the Unix documentation I have
lying around.
Reading further in the System.IO documentation:
Any operation that fails because a handle is closed, also fails
if a handle is semi-closed. The only exception is hClose. A
semi-closed handle becomes closed:
* if hClose is applied to it;
* if an I/O error occurs when reading an item from the
handle;
* or once the entire contents of the handle has been read.
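The rule that other operations fail on a semi-closed handle while hClose still works can be checked directly. This is a minimal sketch, assuming GHC's base library; the function name semiClosedRejectsReads and the scratch file names are hypothetical, chosen only for illustration. It puts a handle into the semi-closed state with hGetContents, tries hGetLine on it, and reports whether that attempt raised an IOException.

```haskell
import Control.Exception (IOException, try)
import Data.Either (isLeft)
import System.IO

-- Hypothetical helper: writes a scratch file at the given path, puts a handle
-- into the semi-closed state with hGetContents, and reports whether a further
-- hGetLine on that handle was rejected with an IOException.
semiClosedRejectsReads :: FilePath -> IO Bool
semiClosedRejectsReads path = do
  writeFile path "line one\nline two\n"
  h <- openFile path ReadMode
  _ <- hGetContents h              -- h is now semi-closed; nothing has been read yet
  r <- try (hGetLine h) :: IO (Either IOException String)
  hClose h                         -- hClose is the one operation still permitted
  return (isLeft r)

main :: IO ()
main = semiClosedRejectsReads "semi-closed-demo.txt" >>= print
```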
So it looks like hGetContents sets some flag in the handle saying it is in
that semi-closed state. No other operations are valid on it, but I think the
file descriptor is actually kept open, and it only gets closed once the
contents are entirely consumed. Is hGetContents responsible for closing the
descriptor? Or is it the garbage collector? Who closes the descriptor when the
contents have been read? Looking at the definition of hGetContents, it uses
lazyRead to read the contents, but it calls wantReadableHandle, which might or
might not close the handle after lazyRead.
Judging by the documentation, the only ways to actively close the descriptor
are reading the whole contents or calling hClose. But one has to be very
careful about when to call hClose: it doesn't matter whether the contents look
consumed, they really have to be consumed.
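The difference between "looks consumed" and "really consumed" can be observed with hIsClosed. This is a sketch assuming GHC's base library, where, as far as I can tell, the lazy read closes the handle itself when it runs into end of file. The helper closedAt and the scratch file names are hypothetical. It records hIsClosed right after hGetContents, after forcing only a five-character prefix, and after forcing the whole string, without ever calling hClose.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: reports hIsClosed at three points, namely right after
-- hGetContents, after forcing only a prefix of the string, and after forcing
-- the whole string. hClose is never called explicitly.
closedAt :: FilePath -> IO (Bool, Bool, Bool)
closedAt path = do
  writeFile path "hello world\n"
  h <- openFile path ReadMode
  s <- hGetContents h
  before <- hIsClosed h                  -- semi-closed counts as "not closed"
  _ <- evaluate (length (take 5 s))      -- part of s is read, but EOF is not reached
  partial <- hIsClosed h
  _ <- evaluate (length s)               -- walks to the end of the list, hitting EOF
  full <- hIsClosed h
  return (before, partial, full)

main :: IO ()
main = closedAt "closed-at-demo.txt" >>= print
```

Under that assumption, only the last flag should be True: the handle is closed by reaching the end of the list, not by anything that merely inspects a prefix of it.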
The following code prints the contents of the file foo to the screen:
openFile "foo" ReadMode >>= \handle -> (hGetContents handle >>= (\s -> putStr s >> hClose handle)) [1]
The following code does not:
openFile "foo" ReadMode >>= \handle -> (hGetContents handle >>= (\s -> hClose handle >> putStr s)) [2]
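Rewriting [1] and [2] in do-notation makes the order of the IO actions explicit. This is a sketch, assuming GHC's base library; the names useThenClose and closeThenUse are hypothetical, and `evaluate (length s)` stands in for putStr because both demand the whole string. In [2], hClose runs before anything has demanded s. Depending on the GHC version, s then either turns out to be empty or forcing it raises an IOException, so the second function returns the outcome instead of printing it.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- [1] in do-notation: the whole string is demanded while the handle is
-- still semi-closed, and only then is the handle closed.
useThenClose :: FilePath -> IO String
useThenClose path = do
  handle <- openFile path ReadMode
  s <- hGetContents handle
  _ <- evaluate (length s)        -- stands in for putStr s
  hClose handle
  return s

-- [2] in do-notation: hClose runs before anything has demanded s, so the
-- deferred read finds a closed handle when s is finally forced.
closeThenUse :: FilePath -> IO (Either IOException String)
closeThenUse path = do
  handle <- openFile path ReadMode
  s <- hGetContents handle
  hClose handle
  try (evaluate (length s) >> return s)

main :: IO ()
main = do
  writeFile "order-demo.txt" "some text\n"   -- hypothetical scratch file
  useThenClose "order-demo.txt" >>= putStr
  closeThenUse "order-demo.txt" >>= print
```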
It is common knowledge that Haskell is very lazy: it only does things
when absolutely necessary; otherwise it prefers to add them to its
TODO list. It does that even if writing to the TODO list takes longer than
the computation would; that's how lazy it is. That's the origin of the
often-used expression "he is quite a haskell".
The question most people don't have a good answer to is: when does
Haskell think it is necessary to do something?
In [2], the lazyRead inside hGetContents (or perhaps hGetContents
altogether) only gets executed after hClose handle. Why is that? How do I
figure out the order of computation?
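The deferral itself can be modelled in isolation. As far as I can tell, GHC's lazyRead is built on unsafeInterleaveIO, which postpones an IO action until its result is demanded. The following is only a toy model under that assumption, not GHC's actual code; orderOfEvents is a hypothetical name. Each step appends a message to a log, so the real order of events can be inspected afterwards.

```haskell
import Control.Exception (evaluate)
import Data.IORef (modifyIORef, newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Toy model of a lazy read: the "read" is wrapped in unsafeInterleaveIO, so
-- it is postponed until something demands its result. Every step appends a
-- message to a log so the actual order of events can be inspected.
orderOfEvents :: IO [String]
orderOfEvents = do
  logRef <- newIORef []
  let note msg = modifyIORef logRef (++ [msg])
  s <- unsafeInterleaveIO (note "deferred read ran" >> return "contents")
  note "lazy read returned"
  note "about to demand the string"
  _ <- evaluate (length s)   -- only now does the deferred action run
  readIORef logRef

main :: IO ()
main = orderOfEvents >>= mapM_ putStrLn
```

The deferred "read" runs only when evaluate demands s, after both later notes. In [2], nothing demands the string until putStr, which comes after hClose.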