[Haskell-cafe] Re: Lazy IO and closing of file handles

Wed Mar 14 19:15:43 EDT 2007

pete-expires-20070513:
> dons at cse.unsw.edu.au (Donald Bruce Stewart) writes:
> 
> > pete-expires-20070513:
> >> When using readFile to process a large number of files, I am exceeding
> >> the resource limits for the maximum number of open file descriptors on
> >> my system.  How can I enhance my program to deal with this situation
> >> without making significant changes?
> >
> > Read in data strictly, and there are two obvious ways to do that:
> >
> >     -- Via strings:
> >
> >     readFileStrict f = do
> >         s <- readFile f
> >         length s `seq` return s
> >
> >     -- Via ByteStrings
> >     readFileStrict  = Data.ByteString.readFile
> >     readFileStrictString f = liftM Data.ByteString.Char8.unpack (Data.ByteString.readFile f)
> >
> > If you're reading more than say, 100k of data, I'd use strict
> > ByteStrings without hesitation. More than 10M, and I'd use lazy
> > bytestrings.
> 
> Correct me if I'm wrong, but isn't this exactly what I wanted to
> avoid?  Reading the entire file into memory?  In my previous email, I
> was trying to state that I wanted to lazily read the file because some
> of the files are quite large and there is no reason to read beyond the
> small set of headers.  If I read the entire file into memory, this
> design goal is no longer met.
> 
> Nevertheless, I was benchmarking with ByteStrings (both lazy and
> strict), and in both cases, the ByteString versions of readFile yield
> the same error regarding max open files.  Incidentally, the lazy
> bytestring version of my program was by far the fastest and used the
> least amount of memory, but it still crapped out regarding max open
> files. 
> 
> So I'm back to square one.  Any other ideas?

Hmm. Ok. So we need to have more hClose's happen somehow. Can you
process files one at a time?
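
Something along these lines might do it. This is only a rough sketch,
not tested against your program: readHeaders and processAll are
made-up names, n is however many header lines you need, and it assumes
a System.IO that provides withFile (otherwise bracket with openFile
and hClose should do the same job). The idea is to read lazily, force
just the headers, and let the handle close before moving on to the
next file, so only one descriptor is open at a time.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: return the first n lines of a file.
-- The handle is closed as soon as the action inside withFile finishes.
readHeaders :: Int -> FilePath -> IO [String]
readHeaders n path =
    withFile path ReadMode $ \h -> do
        s <- hGetContents h              -- still lazy: nothing read yet
        let hdrs = take n (lines s)
        -- Walk the header lines to the end so they are actually read
        -- while the handle is still open.
        _ <- evaluate (length (concat hdrs))
        return hdrs

-- Hypothetical driver: visit the files one at a time.
processAll :: Int -> [FilePath] -> IO [[String]]
processAll n = mapM (readHeaders n)
```

Only the first n lines (plus whatever the handle buffers) get read, so
the large files are never pulled into memory in full.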

-- Don