[Haskell-cafe] Lazy IO and closing of file handles
Donald Bruce Stewart
dons at cse.unsw.edu.au
Thu Mar 15 08:47:41 EDT 2007
claus.reinke:
> >Not necessarily so, since you are making assumptions about the
> >timeliness of garbage collection. I was similarly sceptical of Claus'
> >suggestion:
> >
> >Claus Reinke:
> >>in order to keep the overall structure, one could move readFile backwards
> >>and parseEmail forwards in the pipeline, until the two meet. then make sure
> >>that parseEmail completely constructs the internal representation of each
> >>email, thereby keeping no implicit references to the external
> >>representation.
>
> you are quite right to be skeptical!-) indeed, in the latest Handle
> documentation, we still find the following excuse for GHC:
>
> http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#t%3AHandle
>
> GHC note: a Handle will be automatically closed when the garbage
> collector detects that it has become unreferenced by the program.
> However, relying on this behaviour is not generally recommended: the
> garbage collector is unpredictable. If possible, use an explicit
> hClose to close Handles when they are no longer required. GHC
> does not currently attempt to free up file descriptors when they have
> run out, it is your responsibility to ensure that this doesn't happen.
>
> this issue has been discussed in the past, and i consider it a bug if the
> memory manager tells me to handle memory myself;-) so i do hope that this
> infelicity will be removed in the future (run out of file descriptors ->
> run a garbage collection and try again, before giving up entirely).
>
> in fact, my local version had two variants of processFile - the one i
> posted and one with explicit file handle handling (the code was
> restructured this way exactly to hide this implementation decision in a
> single function). i did test both variants on a directory with lots of
> copies of a few emails (>2000 files), and both worked on my system, so i
> hoped -rather than checked- that the handle collection issue had finally
> been fixed, and made the mistake of removing the more complex variant
> before posting. thanks for pointing out that error - as the documentation
> above demonstrates, it isn't good to rely on assumptions, nor on tests.
>
> so here is the alternate variant of processFile (for which i imported
> System.IO):
>
> >processFile path = do
> > f <- openFile path ReadMode
> > text <- hGetContents f
> > let email = parseEmail text
> > email `seq` hClose f
> > return email
>
> all this hassle to expose a file handle to call hClose on, just so that the
> GC does not have to..
>
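One caveat about the quoted variant: `seq` only evaluates its argument to weak head normal form, so with a lazy parser much of the file could still be unread when hClose runs, and the unread remainder would then be lost. A minimal sketch of forcing the whole result first, assuming a stand-in Email type and parseEmail (both hypothetical; the real ones are not shown in this thread):

```haskell
import Control.Exception (bracket, evaluate)
import System.IO

-- Hypothetical stand-in for the thread's types: an "email" is just the list
-- of lines in the file. The real parseEmail is not shown in the thread.
type Email = [String]

parseEmail :: String -> Email
parseEmail = lines

-- Parse the file and force the complete result before the handle is closed.
-- bracket runs hClose even if reading or parsing throws.
processFile :: FilePath -> IO Email
processFile path =
  bracket (openFile path ReadMode) hClose $ \h -> do
    text <- hGetContents h
    let email = parseEmail text
    -- Summing the line lengths walks every line to its end, which pulls
    -- the whole file through the lazy stream while the handle is still open.
    _ <- evaluate (sum (map length email))
    return email
```

For a richer Email type the forcing step would have to traverse that type instead; the deepseq package's `force` is one option, though it is not used here.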
Are we at the point where we should consider adding some documentation on
how to deal with this issue? And should the recommendation be either to use
strict IO (should we have a package for System.IO.Strict??) or to force
strictness in the consumer of the data?
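For the strict IO option, a rough sketch of what a strict readFile could look like, using only System.IO and Control.Exception from base. The name readFileStrict is made up for illustration and is not taken from any existing package:

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: read the whole file into memory and close the handle
-- before returning, independent of how the caller consumes the String.
readFileStrict :: FilePath -> IO String
readFileStrict path =
  withFile path ReadMode $ \h -> do
    s <- hGetContents h
    -- Forcing the length reads the stream to end-of-file, so nothing is
    -- left pending when withFile closes the handle.
    _ <- evaluate (length s)
    return s
```

With this, a pipeline such as `fmap parseEmail (readFileStrict path)` holds at most one open handle at a time, at the cost of keeping each file's full contents in memory.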
-- Don