[Haskell-cafe] Lazy IO and closing of file handles

Donald Bruce Stewart dons at cse.unsw.edu.au
Thu Mar 15 08:47:41 EDT 2007


claus.reinke:
> >Not necessarily so, since you are making assumptions about the
> >timeliness of garbage collection. I was similarly sceptical of Claus'
> >suggestion:
> >
> >Claus Reinke:
> >>in order to keep the overall structure, one could move readFile backwards
> >>and parseEmail forwards in the pipeline, until the two meet. then make sure
> >>that parseEmail completely constructs the internal representation of each
> >>email, thereby keeping no implicit references to the external representation.
> 
> you are quite right to be skeptical!-) indeed, in the latest Handle 
> documentation, we still find the following excuse for GHC:
> 
> http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#t%3AHandle
> 
>    GHC note: a Handle will be automatically closed when the garbage 
>    collector detects that it has become unreferenced by the program. 
>    However, relying on this behaviour is not generally recommended: the 
>    garbage collector is unpredictable. If possible, use an explicit
>    hClose to close Handles when they are no longer required. GHC
>    does not currently attempt to free up file descriptors when they have
>    run out, it is your responsibility to ensure that this doesn't happen.
>
> this issue has been discussed in the past, and i consider it a bug if the
> memory manager tells me to handle memory myself;-) so i do hope that this
> infelicity will be removed in the future (run out of file descriptors -> run
> a garbage collection and try again, before giving up entirely).
> 
> in fact, my local version had two variants of processFile - the one i posted
> and one with explicit file handle handling (the code was restructured this
> way exactly to hide this implementation decision in a single function). i did
> test both variants on a directory with lots of copies of a few emails (>2000
> files), and both worked on my system, so i hoped -rather than checked- that
> the handle collection issue had finally been fixed, and made the mistake of
> removing the more complex variant before posting. thanks for pointing out
> that error - as the documentation above demonstrates, it isn't good to rely
> on assumptions, nor on tests.
> 
> so here is the alternate variant of processFile (for which i imported 
> System.IO):
> 
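> >import System.IO
> >
> >processFile path = do
> >  f <- openFile path ReadMode
> >  text <- hGetContents f       -- lazy: nothing is read yet
> >  let email = parseEmail text
> >  -- seq forces email only to weak head normal form; this is enough only
> >  -- if parseEmail builds the whole email before returning a constructor
> >  email `seq` hClose f
> >  return email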
> 
> all this hassle to expose a file handle to call hClose on, just so that the
> GC does not have to..
> 

Are we at the point where we should consider adding some documentation on
how to deal with this issue? And should the recommendation be either to use
strict IO (should we have a package for System.IO.Strict?), or to enforce
strictness in the consumer of the data?
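
For concreteness, here is a minimal sketch of the two options, using only
base. The names readFileStrict and consumeFile are made up for illustration
and are not an existing System.IO.Strict API; forcing the input with length
and evaluate is just one possible choice.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Option 1: strict IO (hypothetical helper, not a library function).
-- Forcing 'length' walks the whole String, so the file is read to the end
-- while the handle is still open; withFile then closes it, even if an
-- exception is thrown.
readFileStrict :: FilePath -> IO String
readFileStrict path =
  withFile path ReadMode $ \h -> do
    text <- hGetContents h
    _ <- evaluate (length text)
    return text

-- Option 2: strictness in the consumer (hypothetical helper).
-- The result of 'parse' is forced to weak head normal form before the
-- handle is closed. This is only safe if that much evaluation already
-- consumes all the input the result depends on; anything still unread
-- when withFile closes the handle is silently cut off.
consumeFile :: (String -> a) -> FilePath -> IO a
consumeFile parse path =
  withFile path ReadMode $ \h -> do
    text <- hGetContents h
    evaluate (parse text)
```

With these, processFile could be written as consumeFile parseEmail, provided
parseEmail really does build the whole email before returning its outermost
constructor; if it does not, parseEmail <$> readFileStrict path is the safer
choice, at the cost of holding the whole file in memory.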

-- Don

