[Haskell-cafe] RE: readFile and closing a file

Mon Sep 22 06:48:41 EDT 2008

> On Wed, 17 Sep 2008, Mitchell, Neil wrote:
>
>> I tend to use openFile, hGetContents, hClose - your initial readFile
>> like call should be openFile/hGetContents, which gives you a lazy
>> stream, and on a parse error call hClose.
>
> I could use a function like
>   withReadFile :: FilePath -> (Handle -> IO a) -> IO a
>   withReadFile name action = bracket openFile hClose ...
>
> Then, if 'action' fails, the file can be properly closed. However, there
> is still a problem: Say, 'action' is a parser which produces a data
> structure lazily. Then further processing of that data structure of type
> 'a' may again stop before completing the whole structure, which would also
> leave the file open. We have to force users to do all processing within
> 'action' and to only return strict values. But how to do this?

I used rnf from Control.Parallel.Strategies when dealing with a
similar problem.  Would it work in your case?
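
For concreteness, here is a rough sketch of what I mean.  It assumes a
current library layout where 'rnf' and 'force' live in Control.DeepSeq
(the deepseq package), and 'withReadFileStrict' and 'countLines' are
names I made up for illustration.  The result is forced to normal form
inside the bracket, so nothing lazy can escape after hClose runs.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (bracket, evaluate)
import System.IO (Handle, IOMode (ReadMode), hClose, hGetContents, openFile)

-- Hypothetical helper: run 'action' on the opened file and fully
-- evaluate its result before the handle is closed.
withReadFileStrict :: NFData a => FilePath -> (Handle -> IO a) -> IO a
withReadFileStrict name action =
  bracket (openFile name ReadMode) hClose $ \h ->
    action h >>= evaluate . force

-- Example use: the lazy stream from hGetContents is consumed completely
-- inside the bracket, because the Int result is forced there.
countLines :: FilePath -> IO Int
countLines path =
  withReadFileStrict path $ \h -> length . lines <$> hGetContents h
```

The NFData constraint is the price: users can only return types that
can be fully evaluated, which is roughly the "only return strict
values" restriction asked about above.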

To merge discussion from a related thread:

IMO, the question is how much a language/library should prevent the
user from shooting himself in the foot.  The biggest problem with lazy
IO is that it presents several opportunities to do so.  The three
biggest causes I've dealt with are handle leaks, insufficiently
evaluated data structures, and memory retention that defeats the
garbage collector, as in the naive 'mean xs = sum xs / length xs'
implementation, where the whole list stays live between the two
traversals.

There are some idioms that can help with the first two cases, namely
the 'with*' paradigm and 'rnf', but the third problem requires that
the programmer know how stuff works to avoid poor implementations.
While that's not bad per se, in some cases I think it's far too easy
for the unwitting, or even the slightly distracted, to get caught in
traps.
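
To illustrate the third case, here is a small sketch under my own
naming ('meanNaive', 'mean', and 'Acc' are made up).  The naive version
needs a 'fromIntegral' to typecheck at all, and it walks the list twice,
so the whole list must stay in memory.  The second version uses one
strict left fold, so each element can be collected once it's consumed.

```haskell
import Data.List (foldl')

-- Naive version: 'xs' is shared by 'sum' and 'length', so the whole
-- list is retained until the second traversal finishes.
meanNaive :: [Double] -> Double
meanNaive xs = sum xs / fromIntegral (length xs)

-- Strict accumulator for the running sum and element count.
data Acc = Acc !Double !Int

-- Single pass: both quantities are computed in one strict fold.
mean :: [Double] -> Double
mean xs = total / fromIntegral count
  where
    Acc total count = foldl' step (Acc 0 0) xs
    step (Acc s n) x = Acc (s + x) (n + 1)
```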

I'm facing a design decision ATM related to this.  I can use something
like lazy bytestrings, in which the chunkiness and laziness are reified
into the data structure, or an Iterator-style fold for consuming data.
The advantage of the former approach is that it's well understood by
most users and has proven to perform well; on the downside, I could
see it easily leading to memory exhaustion.  I think the problem
with lazy bytestrings, in particular, is that the foldChunks is so
well hidden from most consumers that it's easy to hold references that
prevent consumed chunks from being reclaimed by the GC.  When dealing
with data in the hundreds of MBs, or GB range, this is a problem.
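
A sketch of the kind of thing I mean, with made-up function names
('twoPass', 'onePass').  It only relies on 'toChunks' from
Data.ByteString.Lazy, not on any particular chunk-fold function.  In the
first version the lazy bytestring is used twice, so every chunk stays
reachable until the second traversal is done.  In the second, the fold
over chunks is explicit and each chunk can be dropped after it is
processed.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.List (foldl')

-- Two traversals of the same lazy bytestring: 'bs' is still referenced
-- by the second component while the first is being computed.
twoPass :: L.ByteString -> (Int, Int)
twoPass bs = (fromIntegral (L.length bs), fromIntegral (L.count 10 bs))

-- Strict pair of running totals: byte count and newline count.
data Stats = Stats !Int !Int

-- One traversal with the chunk fold made explicit.
onePass :: L.ByteString -> (Int, Int)
onePass bs =
  case foldl' step (Stats 0 0) (L.toChunks bs) of
    Stats len newlines -> (len, newlines)
  where
    step (Stats len nl) chunk =
      Stats (len + S.length chunk) (nl + S.count 10 chunk)
```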

An Enumerator, on the other hand, makes the fold explicit, so users
are required to think about the best way to consume data.  It's much
harder to unintentionally hold references.  This is quite appealing.
Based on my own tests so far, performance is certainly competitive.
Assuming a good implementation, handle leaks can also be prevented.
On the downside, it's a very poor model if random access is required,
and users aren't as familiar with it, in addition to some of the
questions Don raises.

Back onto the topic at hand - 'action' could be a parser passed into
an enumerator.  Since it would read strictly, the action could end the
read when it has enough data.  That's another approach that I think
would work with your problem.
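
Roughly, and only as a sketch (this is not the API of any existing
library; 'Step', 'enumFile', and 'firstLine' are names invented here):
the enumerator owns the handle and feeds strict chunks to a step
function, which can return Left to stop early.  The file is closed
whether the consumer finishes, stops early, or throws.

```haskell
import qualified Data.ByteString as S
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- A step consumes one chunk: Left means "done, stop reading",
-- Right means "continue with this new state".
type Step s = s -> S.ByteString -> Either s s

-- Hypothetical enumerator: reads strict chunks of at most 'chunkSize'
-- bytes and folds them with 'step'.  withBinaryFile closes the handle
-- on normal exit, early exit, and exceptions.
enumFile :: Int -> FilePath -> Step s -> s -> IO s
enumFile chunkSize path step seed = withBinaryFile path ReadMode (loop seed)
  where
    loop s h = do
      chunk <- S.hGet h chunkSize
      if S.null chunk
        then return s                      -- end of file
        else case step s chunk of
          Left done -> return done         -- consumer has enough data
          Right s'  -> s' `seq` loop s' h  -- keep the state evaluated

-- Example consumer: collect bytes up to the first newline (byte 10),
-- then stop without reading the rest of the file.
firstLine :: FilePath -> IO S.ByteString
firstLine path = enumFile 4096 path step S.empty
  where
    step acc chunk =
      case S.elemIndex 10 chunk of
        Just i  -> Left (acc `S.append` S.take i chunk)
        Nothing -> Right (acc `S.append` chunk)
```

A parser would take the place of 'step' here; since every chunk is read
strictly, there is no lazy stream left over that could keep the file
open.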

Well, that's my 2 cents.

John Lato