lazy file reading in H98

Simon Marlow simonmar@microsoft.com
Tue, 3 Apr 2001 16:29:00 +0100


I admit the existing behaviour is unsatisfactory.  However, I'd like to
point out that a program using the sequence

	s <- readFile f
	...
	writeFile f s'

is arguably wrong, even given semantics (2) for readFile, because it's
non-atomic.  A more correct sequence is

	s <- readFile f
	...
	writeFile f' s'
	renameFile f' f

where f' is a temporary file name.
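
As a minimal sketch of the sequence above (not a definitive
implementation): the helper name updateFile, the ".tmp" suffix and the
file name counter.txt are assumptions made for illustration only.
renameFile is taken from System.Directory (the H98 Directory module
exports the same function).

```haskell
import System.Directory (renameFile)

-- | Hypothetical helper: replace the contents of f by writing the new
-- contents to a temporary file and then renaming it over f.
updateFile :: FilePath -> (String -> String) -> IO ()
updateFile f transform = do
  s <- readFile f
  -- Assumed temporary name; it lives in the same directory as f so
  -- that the rename does not cross file systems.
  let f' = f ++ ".tmp"
  -- Force the whole (lazily read) input before writing anything, so
  -- the read of f has completed by the time f is replaced.
  length s `seq` writeFile f' (transform s)
  renameFile f' f

main :: IO ()
main = do
  let path = "counter.txt"              -- hypothetical file name
  writeFile path (show (0 :: Int))
  -- Read-modify-write: parse the stored Int, add one, store it again.
  updateFile path (show . (+ 1) . (read :: String -> Int))
  readFile path >>= putStrLn
```

The original file f is never opened for writing here, so a lazy read
of f cannot be clobbered half-way; whether the final rename is truly
atomic depends on the operating system and file system.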

Cheers,
	Simon

> -----Original Message-----
> From: Simon Peyton-Jones
> Sent: Tuesday, April 03, 2001 12:01 PM
> To: Libraries for Haskell List
> Subject: FW: lazy file reading in H98
>
>
> Here's a library issue.
>
> The conclusion of this conversation was that H98 already specifies
> option (1) below, and I will clarify that in revising the library
> report.
> Nevertheless, the absence of a simple way to read-modify-write a file
> is a pain in the neck.
>
> Question: should one of our extended-IO libraries support a version of
> openFile that guarantees option (2)?
>
> Simon
>
> -----Original Message-----
> From: Manuel M. T. Chakravarty [mailto:chak@cse.unsw.edu.au]
> Sent: 05 September 2000 02:10
> To: haskell@haskell.org
> Subject: lazy file reading in H98
>
>
> In an assignment, in my class, we came across a lack of specification of
> the behaviour of `Prelude.readFile' and `IO.hGetContents' and IMHO also
> a lack of functionality.  As both operations read a file lazily,
> subsequent writes to the same file are potentially disastrous.  In this
> assignment, the file was used to make a Haskell data structure
> persistent over multiple runs of the program - ie,
>
>   readFile fname >>= return . read
>
> at the start of the program and
>
>   writeFile fname . show
>
> at the end of the program.  For certain inputs, where the
> data structure stored in the file was only partially used,
> the file was overwritten before it was fully read.
>
> H98 doesn't really specify what happens in this situation.
> I think there are two ways to solve that:
>
> (1) At least, the definition should say that the behaviour
>     is undefined if a program ever writes to a file that it
>     has read with `readFile' or `hGetContents' before.
>
> (2) Alternatively, it could demand more sophistication from
>     the implementation and require that upon opening of a
>     file for writing that is currently semi-closed, the
>     implementation has to make sure that the contents of the
>     semi-closed file is not corrupted before it is fully
>     read.[1]
>
> In the case that solution (1) is chosen, I think we should also have
> something like `strictReadFile' (and
> `hStrictGetContents') which reads the whole file before proceeding to
> the next IO action.  Otherwise, in situations like in the mentioned
> assignment, you have to resort to reading the file character by
> character, which seems very awkward.
>
> So, overall, I think solution (2) is more elegant.
>
> Cheers,
> Manuel
>
> [1] On Unix-like (POSIX?) systems, unlinking the file and
>     then opening the writable file would be sufficient.  On
>     certain legacy OSes, the implementation would have to
>     read the rest of the file into memory before creating
>     a new file under the same name.