I/O buffering (was: Endangered I/O operations)

Simon Marlow simonmar@microsoft.com
Thu, 24 May 2001 10:02:43 +0100


[ moved from glasgow-haskell-users to haskell@haskell.org ]

Carl Witty writes:
> If the report does not allow the implementation to flush buffers at
> any time, I would call that a bug in the report.

Indeed, perhaps the report should be clarified on this issue.
Currently, in section 11.4.2 the report specifies the conditions under
which a buffer is flushed, and conditions under which input is available
from a buffered read handle.

GHC deviates from the report in a few ways:

  - we don't adhere strictly to block buffering for hPutStr: we
    may occasionally flush the buffer early.

  - a line-buffered input handle doesn't wait for the newline
    to be available before releasing the input, unless you use
    hGetLine of course.  However, since the report doesn't specify
    the *size* of a line buffer, we can't really be accused of
    deviating here.

  - a read/write handle (files only) has a single buffer which
    contains either pending read or write items, never both.  A
    read from a read/write handle will cause any pending writes
    to be flushed, and vice versa.
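
As an illustration of the read/write case, here is a minimal sketch (not
GHC's implementation, just user-level code) that writes a line through a
ReadWriteMode handle and reads it back through the same handle.  The
name roundTrip and the temporary file name are made up for the example.
The explicit hFlush is there so the code does not depend on the
implementation flushing pending writes by itself.

```haskell
import System.IO
import System.Directory (removeFile)

-- Hypothetical helper: write one line through a read/write handle,
-- then read it back through the same handle.
-- Assumes the string contains no newline.
roundTrip :: FilePath -> String -> IO String
roundTrip path s = do
  h <- openFile path ReadWriteMode
  hSetBuffering h (BlockBuffering Nothing)
  hPutStrLn h s              -- sits in the handle's buffer as a pending write
  hFlush h                   -- explicit flush: don't rely on implicit flushing
  hSeek h AbsoluteSeek 0     -- go back to the start of the file
  line <- hGetLine h         -- the same handle now serves a read
  hClose h
  return line

main :: IO ()
main = do
  (path, tmp) <- openTempFile "." "rw-demo.txt"
  hClose tmp                 -- we only wanted a fresh file name
  line <- roundTrip path "pending write"
  putStrLn line
  removeFile path
```

Under GHC the hFlush is redundant here, since the handle flushes pending
writes before switching to reading, but keeping it makes the intent
explicit.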

IMHO, the report should state that additional buffer flushes may be
performed at the discretion of the implementation, but an application
should not rely on any additional flushing happening.
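
A sketch of what "not relying on additional flushing" looks like in
application code: a prompt is written and flushed explicitly before the
reply is read, so it appears on time whatever the buffering mode and
whether or not the implementation flushes stdout when stdin is read.
The name promptOn is made up for the example.

```haskell
import System.IO

-- Hypothetical helper: write a prompt to hout, flush it explicitly,
-- then read one line from hin.  Returns Nothing at end of input.
promptOn :: Handle -> Handle -> String -> IO (Maybe String)
promptOn hin hout msg = do
  hPutStr hout msg
  hFlush hout                -- explicit: works under any buffering mode
  eof <- hIsEOF hin
  if eof
    then return Nothing
    else Just <$> hGetLine hin

main :: IO ()
main = do
  -- Block buffering is the worst case for an interactive prompt.
  hSetBuffering stdout (BlockBuffering Nothing)
  reply <- promptOn stdin stdout "Name: "
  case reply of
    Just name -> putStrLn ("Hello, " ++ name)
    Nothing   -> putStrLn "(no input)"
```

Without the hFlush, the prompt could stay in the buffer until the
program exits on an implementation that follows block buffering
strictly.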

The report doesn't state whether, when discarding a read buffer, the
data should be "returned to the file" (i.e. the underlying file pointer
moved backwards) if possible.  GHC does this whenever it can.

One other point that I noticed while re-implementing I/O:  the
description for hLookAhead states that "Computation hLookAhead hdl
returns the next character from handle hdl without removing it from the
input buffer".  What if the handle is unbuffered?  A working hLookAhead
is more or less required in order to implement hIsEOF (at least on a
stream: for a file you can check EOF without attempting to read).  So
for GHC I had to ensure that even an unbuffered handle has a 1-character
buffer; this turned out to be not as painful as I thought.
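
For concreteness, a small sketch of the behaviour in question: on a
handle set to NoBuffering, hLookAhead should still return the next
character without consuming it, and hIsEOF should still work.  The
helper names and the temporary file are made up for the example, and it
uses a file rather than a stream, so it only shows the user-visible
behaviour and not the stream case described above.

```haskell
import System.IO
import System.Directory (removeFile)

-- Hypothetical helper: peek at the next character, then read it.
-- Both results should be the same character.
peekThenRead :: Handle -> IO (Char, Char)
peekThenRead h = do
  peeked <- hLookAhead h     -- must not consume the character
  got    <- hGetChar h       -- consumes it
  return (peeked, got)

-- Write a one-character file, reopen it unbuffered, peek and read the
-- character, then ask whether we are at end of file.
demo :: IO ((Char, Char), Bool)
demo = do
  (path, hw) <- openTempFile "." "lookahead.txt"
  hPutStr hw "x"
  hClose hw
  h <- openFile path ReadMode
  hSetBuffering h NoBuffering
  pair <- peekThenRead h
  eof  <- hIsEOF h
  hClose h
  removeFile path
  return (pair, eof)

main :: IO ()
main = demo >>= print
```

With GHC this prints (('x','x'),True).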

Cheers,
	Simon

>  I would much rather
> use an implementation where stdout and stderr came out in the right
> order, and reading from stdin flushed stdout.  (As another example, an
> implementation might want to flush all buffers before doing a fork(),
> to avoid duplicated output.)
>
> The only caveat is that if such flushing is allowed but not required,
> it might encourage writing sloppy, nonportable code.