[Haskell-cafe] Lazy IO and closing of file handles

Ketil Malde Ketil.Malde at bccs.uib.no
Thu Mar 15 03:39:36 EDT 2007


Donald Bruce Stewart wrote:
> pete-expires-20070513:
>   
>> When using readFile to process a large number of files, I am exceeding
>> the resource limits for the maximum number of open file descriptors
This is very annoying - I can't see any good reason why file descriptors 
should "run out" (before memory is exhausted): with lazy readFile, each 
handle stays open until the file's contents are fully demanded, so 
unconsumed files pile up against the per-process limit.  I guess the 
Linux kernel is intended for imperative use :-/
> Read in data strictly, and there are two obvious ways to do that:
>
>     -- Via strings [..]
>     -- Via ByteStrings [..]
Perhaps this is an esoteric approach, but I think the nicest one is to 
parse into a strict structure.  If you fully evaluate each Email (or 
whatever structure you parse into), there will be no unevaluated thunks 
holding on to the file, and the handle will be closed.
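
For concreteness, here's a minimal sketch of what I mean.  The Email 
type and parser are made up for illustration, and I'm assuming the 
deepseq package for 'force' (forcing each field with plain seq works 
too).  Fully evaluating the parse result drains readFile's lazy stream 
to EOF, at which point GHC closes the handle:

  {-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}

  import Control.DeepSeq (NFData, force)
  import Control.Exception (evaluate)
  import GHC.Generics (Generic)

  -- A toy message type standing in for whatever you parse into.
  data Email = Email
      { subject :: String
      , body    :: String
      } deriving (Show, Generic, NFData)

  -- Toy parser: first line is the subject, the rest is the body.
  parseEmail :: String -> Email
  parseEmail s = case lines s of
      (subj : rest) -> Email subj (unlines rest)
      []            -> Email "" ""

  -- Fully evaluating the Email consumes the lazy stream from
  -- readFile to EOF, so the handle is closed before we return.
  readEmail :: FilePath -> IO Email
  readEmail path = do
      s <- readFile path
      evaluate (force (parseEmail s))

Since each call finishes with its file before returning, something like 
mapM readEmail paths only ever holds one descriptor open at a time.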

If the files are small (e.g. maildir or similar, with one email in 
each?), you can use strict ByteStrings, but I generally use lazy 
ByteStrings for just about everything.  Be aware that extracting a 
substring from a ByteString is done by "slicing": the result keeps a 
pointer into the original string (along with an offset and length). 
With a strict ByteString this keeps the whole file in memory; with a 
lazy ByteString you retain only the chunks the substring touches (so 
the body can be GC'ed, if you aren't interested in keeping it).

(I wonder if the garbage collector could somehow discover strings that 
have been sliced down a lot, and copy only the relevant parts?)
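
(In the meantime you can do that trimming by hand: ByteString's 'copy' 
allocates a fresh buffer holding just the slice.  A small sketch with 
strict ByteStrings - firstLine is a made-up accessor:

  import qualified Data.ByteString.Char8 as B

  -- takeWhile returns a slice sharing the original buffer, so the
  -- whole message stays alive as long as the first line does.
  firstLine :: B.ByteString -> B.ByteString
  firstLine = B.takeWhile (/= '\n')

  -- copy allocates a fresh buffer of exactly the slice's length,
  -- letting the rest of the message be garbage collected.
  firstLineCopied :: B.ByteString -> B.ByteString
  firstLineCopied = B.copy . firstLine

The same trick works chunk-wise for lazy ByteStrings.)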

-k

