FW: lazy file reading in H98
Simon Peyton-Jones
simonpj@microsoft.com
Tue, 3 Apr 2001 04:00:55 -0700
Here's a library issue.
The conclusion of this conversation was that H98 already specifies
option (1) below, and I will clarify that in revising the library
report.
Nevertheless, the absence of a simple way to read-modify-write a file
is a pain in the neck.
Question: should one of our extended-IO libraries support a version of
openFile that guarantees option (2)?
Simon
-----Original Message-----
From: Manuel M. T. Chakravarty [mailto:chak@cse.unsw.edu.au]
Sent: 05 September 2000 02:10
To: haskell@haskell.org
Subject: lazy file reading in H98
In an assignment in my class, we ran into a lack of specification of
the behaviour of `Prelude.readFile' and `IO.hGetContents' and, IMHO, also
a lack of functionality. As both operations read a file lazily,
subsequent writes to the same file are potentially disastrous. In this
assignment, the file was used to make a Haskell data structure
persistent over multiple runs of the program, i.e.,
readFile fname >>= return . read
at the start of the program and
writeFile fname . show
at the end of the program. For certain inputs, where the
data structure stored in the file was only partially used,
the file was overwritten before it was fully read.
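A minimal sketch of the load/save pattern described above, assuming the
persisted value has `Read' and `Show' instances. The names `loadState' and
`saveState' are hypothetical. The only change from the assignment's code is
that `loadState' forces the whole file contents before returning, so the lazy
read has reached end of file (and closed its handle) before any later write
to the same path. (With GHC, writing to a file that is still semi-closed
typically fails with a "resource busy (file is locked)" error rather than
silently corrupting it.)

```haskell
import Control.Exception (evaluate)

-- Hypothetical helper: read a value that an earlier run saved with show.
-- Forcing the length of the contents drives the lazy readFile to end of
-- file, which also closes its handle, so a later write to the same path
-- cannot overtake the unread remainder.
loadState :: Read a => FilePath -> IO a
loadState fname = do
  s <- readFile fname
  _ <- evaluate (length s)
  return (read s)

-- Hypothetical helper: save the value for the next run, exactly as in the
-- original assignment.
saveState :: Show a => FilePath -> a -> IO ()
saveState fname = writeFile fname . show
```

The value itself is still parsed lazily by `read'; only the file contents
are forced, which is all that is needed to make the later write safe.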
H98 doesn't really specify what happens in this situation.
I think there are two ways to solve this:
(1) At the very least, the definition should say that the behaviour
is undefined if a program ever writes to a file that it
has previously read with `readFile' or `hGetContents'.
(2) Alternatively, it could demand more sophistication from
the implementation: when a file that is currently
semi-closed is opened for writing, the implementation
must ensure that the contents of the semi-closed file
are not corrupted before they have been fully
read.[1]
If solution (1) is chosen, I think we should also have
something like `strictReadFile' (and
`hStrictGetContents'), which reads the whole file before proceeding to
the next IO action. Otherwise, in situations like the assignment
mentioned above, you have to resort to reading the file character by
character, which seems very awkward.
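A sketch of the two operations proposed above, written in terms of the
existing lazy primitives. `strictReadFile' and `hStrictGetContents' are the
hypothetical names from this message, not functions of the H98 libraries;
the idea is simply to force the lazily read string to its end before
returning. (Much later, base 4.15 added strict variants named readFile' and
hGetContents' to System.IO.)

```haskell
import System.IO

-- Hypothetical strict counterpart of hGetContents: forcing the length of
-- the lazily read string makes it read everything up to end of file
-- before the action returns.
hStrictGetContents :: Handle -> IO String
hStrictGetContents h = do
  s <- hGetContents h
  length s `seq` return s

-- Hypothetical strict counterpart of readFile: the whole file is read and
-- its handle closed (by withFile) before the next IO action runs.
strictReadFile :: FilePath -> IO String
strictReadFile path = withFile path ReadMode hStrictGetContents
```

Because the handle is closed as soon as `strictReadFile' returns, a
following `writeFile' to the same path is safe.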
So, overall, I think solution (2) is more elegant.
Cheers,
Manuel
[1] On Unix-like (POSIX?) systems, unlinking the file and
then creating the new file for writing would be sufficient. On
certain legacy OSes, the implementation would have to
read the rest of the file into memory before creating
a new file under the same name.
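A sketch of the Unix approach from the footnote, done at the library level
rather than inside the implementation. It assumes POSIX semantics (an
unlinked file stays readable through handles that already have it open) and
uses removeFile from the directory package; `replaceFile' is a hypothetical
name. On systems that refuse to delete an open file, removeFile raises an
error and this trick does not apply.

```haskell
import Control.Exception (catch, throwIO)
import System.Directory (removeFile)
import System.IO.Error (isDoesNotExistError)

-- Hypothetical helper following footnote [1], assuming POSIX semantics:
-- unlinking a file does not disturb handles that already have it open,
-- so a semi-closed lazy read keeps seeing the old contents, while
-- writeFile creates a brand-new file under the same name.
replaceFile :: FilePath -> String -> IO ()
replaceFile path s = do
  -- A file that does not exist yet needs no protection; any other
  -- failure (e.g. a system that refuses to delete an open file) is
  -- re-raised.
  removeFile path `catch` \e ->
    if isDoesNotExistError e then return () else throwIO e
  writeFile path s
```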