Pure File Reading (was: Dealing with configuration data)

Koen Claessen koen@cs.chalmers.se
Thu, 26 Sep 2002 09:07:57 +0200 (MET DST)


Dear all,

At the moment, there is a discussion on haskell-cafe about
how to neatly structure a program that depends throughout on
a number of parameters that are read in once, at the
beginning of the program.

The suggestion that many people came up with was using
unsafePerformIO in the beginning to read the file. Here is
my version of that:

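 | import System.IO.Unsafe (unsafePerformIO)
 |
 | data Configuration = ...  -- config data
 |
 | -- NOINLINE so that the file is read at most once
 | {-# NOINLINE getConfig #-}
 | getConfig :: Configuration
 | getConfig = unsafePerformIO $
 |   do ...read configuration from file...
 |      return configuration
 |
 | main =
 |   do doStuff  -- may use getConfig as an ordinary value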
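
For concreteness, here is a minimal runnable sketch of the
pattern above. The file name "app.conf" and the shape of
Configuration (just the raw lines of the file) are
assumptions made up for illustration; a real program would
parse the contents. The file is read strictly, so that the
handle is closed before anything else happens, and an
unreadable file is treated as an empty configuration.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical configuration type: just the raw lines of the file.
newtype Configuration = Configuration { configLines :: [String] }

-- Hypothetical path, chosen only for illustration.
configPath :: FilePath
configPath = "app.conf"

-- Read the whole file before returning, so the handle is closed.
readFileStrict :: FilePath -> IO String
readFileStrict path = do
  s <- readFile path
  _ <- evaluate (length s)
  return s

-- NOINLINE keeps GHC from duplicating the unsafePerformIO call,
-- so the file is read at most once, when getConfig is first forced.
{-# NOINLINE getConfig #-}
getConfig :: Configuration
getConfig = unsafePerformIO $ do
  result <- try (readFileStrict configPath) :: IO (Either IOException String)
  return $ case result of
    Left _         -> Configuration []               -- unreadable: empty config
    Right contents -> Configuration (lines contents)

main :: IO ()
main = print (length (configLines getConfig))
```

Note that the read happens when getConfig is first
demanded, not necessarily at program start.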

It is quite disturbing that there is no easy way to do this
other than unsafePerformIO (except perhaps for implicit
parameters, but there are other reasons for not using
those).

I have been thinking a little bit more about this and here
is what I found.

Remember the Gofer days, when Gofer had a "function":

  openFile :: FilePath -> String

This was of course a cheap and dirty way of implementing
things like the getConfig above, but it is impure: two calls
with the same path can return different contents if the file
changes in between. However, one could imagine a functional
version of this function:

  readFileOnce :: FilePath -> Maybe String

This function reads the contents of the file (and returns
Nothing if something goes wrong), but it is memoized, so
that every later call with the same path gives the same
result as the first one.
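
To make the idea concrete, here is one way readFileOnce
could be emulated in today's GHC. This is only a sketch,
not a proposal for the real implementation: it assumes a
global memo table kept in an IORef (an association list, to
stay within the base library), it is not thread-safe, and
the file name "once-demo.txt" is made up for the demo.

```haskell
import Control.Exception (IOException, evaluate, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Global memo table: one entry per path requested so far.
{-# NOINLINE memoTable #-}
memoTable :: IORef [(FilePath, Maybe String)]
memoTable = unsafePerformIO (newIORef [])

-- Read the whole file before returning, so the handle is closed.
readFileStrict :: FilePath -> IO String
readFileStrict path = do
  s <- readFile path
  _ <- evaluate (length s)
  return s

-- The first call for a path reads the file (Nothing on any IO error);
-- later calls for the same path return the remembered result.
{-# NOINLINE readFileOnce #-}
readFileOnce :: FilePath -> Maybe String
readFileOnce path = unsafePerformIO $ do
  table <- readIORef memoTable
  case lookup path table of
    Just cached -> return cached
    Nothing -> do
      result <- try (readFileStrict path) :: IO (Either IOException String)
      let answer = either (const Nothing) Just result
      modifyIORef' memoTable ((path, answer) :)
      return answer

main :: IO ()
main = do
  writeFile "once-demo.txt" "first"
  print (readFileOnce "once-demo.txt")
  writeFile "once-demo.txt" "second"   -- the change is not observed below
  print (readFileOnce "once-demo.txt")
```

Both calls in main should give the contents seen by the
first one, even though the file is overwritten in between.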

So, it is a pure function. (Admittedly, it is somewhat
unpredictable, but you will always get the same result for
the same arguments.) It is no more strange than GHC's pure
version of the getArgs function (I forgot what it was/is
called).

How about space behavior, you say? Reading a file, and
memoizing the result means storing the whole contents of the
file in memory!

The point is that the use of this function will typically
happen at the beginning of a program, when reading the
configuration file(s). When all this has happened, the
function readFileOnce, and its memo table, will be garbage
collected. (Of course there is no guarantee that all calls
to readFileOnce will be evaluated at the beginning of a
program, and it is not required, but when they are, there
are no space problems.)

There could of course be pure "-Once" versions of other IO
operations. Here is a list of possibilities:

  - reading a file
  - getting arguments
  - getting environment variables
  - downloading a webpage
  - ...
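
For arguments and environment variables no memo table is
needed: a snapshot taken on first use is enough. The sketch
below does this with getArgs and getEnvironment from
System.Environment; the names argsOnce, environmentOnce and
getEnvOnce are made up here and are not an existing API.

```haskell
import System.Environment (getArgs, getEnvironment)
import System.IO.Unsafe (unsafePerformIO)

-- Snapshot of the command-line arguments, taken when first forced.
{-# NOINLINE argsOnce #-}
argsOnce :: [String]
argsOnce = unsafePerformIO getArgs

-- Snapshot of the whole environment, taken when first forced.
{-# NOINLINE environmentOnce #-}
environmentOnce :: [(String, String)]
environmentOnce = unsafePerformIO getEnvironment

-- Pure lookup in the snapshot; later changes to the real
-- environment are not observed.
getEnvOnce :: String -> Maybe String
getEnvOnce name = lookup name environmentOnce

main :: IO ()
main = do
  print argsOnce
  print (getEnvOnce "HOME")
```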

What do you think?

Regards,
/Koen.

--
Koen Claessen
http://www.cs.chalmers.se/~koen
Chalmers University, Gothenburg, Sweden.