[Haskell-cafe] Lazy IO and closing of file handles
Ketil Malde
Ketil.Malde at bccs.uib.no
Thu Mar 15 03:39:36 EDT 2007
Donald Bruce Stewart wrote:
> pete-expires-20070513:
>
>> When using readFile to process a large number of files, I am exceeding
>> the resource limits for the maximum number of open file descriptors
This is very annoying - I can't see any good reason why file descriptors
should "run out" (before memory is exhausted). I guess the Linux kernel
is intended for imperative use :-/
> Read in data strictly, and there are two obvious ways to do that:
>
> -- Via strings [..]
> -- Via ByteStrings [..]
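For reference, a minimal sketch of what those two strict variants might look like (the actual code was elided in the quote above, so this is a reconstruction, not the code from Don's message):

  import qualified Data.ByteString.Char8 as B

  -- Via Strings: forcing the length of the result reads the file
  -- to EOF, so the handle can be closed before we return.
  readFileStrict :: FilePath -> IO String
  readFileStrict f = do
      s <- readFile f
      length s `seq` return s

  -- Via strict ByteStrings: B.readFile is already strict; it reads
  -- the whole file and closes the handle immediately.
  readFileBS :: FilePath -> IO B.ByteString
  readFileBS = B.readFile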
Perhaps this is an esoteric approach, but I think the nicest solution is to
parse into a strict structure. If you fully evaluate each Email (or
whatever structure you parse into), no unevaluated thunks are left
referring to the file contents, and the handle will be closed.
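Something like this, say (a sketch; Email, parseEmail and the NFData
instance are hypothetical stand-ins):

  import Control.DeepSeq (NFData(..), deepseq)

  -- Hypothetical strict structure for one message.
  data Email = Email
      { from    :: !String
      , subject :: !String
      , body    :: !String
      }

  instance NFData Email where
      rnf (Email f s b) = f `deepseq` s `deepseq` b `deepseq` ()

  -- Hypothetical parser; the details don't matter here.
  parseEmail :: String -> Email
  parseEmail s = case lines s of
      (f:subj:rest) -> Email f subj (unlines rest)
      _             -> Email "" "" s

  -- Fully evaluate the parse result before returning, so no thunk
  -- keeps a reference to the file's contents (and thus its handle).
  readEmail :: FilePath -> IO Email
  readEmail f = do
      s <- readFile f
      let e = parseEmail s
      e `deepseq` return e

(Control.DeepSeq is from the deepseq package; rnf also lives in
Control.Parallel.Strategies, but the idea is the same either way.)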
If the files are small (e.g. a maildir or similar, with one email in
each?), you can use strict ByteStrings, but I generally use lazy
ByteStrings for just about everything. Be aware that extracting a
substring from a ByteString is done by "slicing": the substring keeps a
pointer into the original string (along with an offset and length). For
strict ByteStrings, this keeps the entire original in memory; for lazy
ByteStrings, only the relevant chunks are retained (so the body could be
GC'ed, if you aren't interested in keeping it).
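If you do want to hold on to a small substring of a big strict
ByteString, Data.ByteString.copy breaks the sharing. A sketch (the
header-extraction logic is just a naive stand-in):

  import qualified Data.ByteString.Char8 as B

  -- Extract the subject line of a message. The slices produced by
  -- breakSubstring/drop/takeWhile all share the original buffer;
  -- B.copy makes a compact copy, so the rest of the message can be
  -- GC'ed once we drop our reference to it.
  subjectOf :: B.ByteString -> B.ByteString
  subjectOf msg = B.copy (B.takeWhile (/= '\n') (B.drop 9 rest))
    where
      (_, rest) = B.breakSubstring (B.pack "Subject: ") msg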
(I wonder if the garbage collector could somehow discover strings that
have been sliced down a lot, and copy only the relevant parts?)
-k