[Haskell-cafe] Lazy IO and closing of file handles

Claus Reinke claus.reinke at talk21.com
Thu Mar 15 08:43:11 EDT 2007


> Not necessarily so, since you are making assumptions about the
> timeliness of garbage collection. I was similarly sceptical of Claus'
> suggestion:
> 
> Claus Reinke:
>> in order to keep the overall structure, one could move readFile backwards
>> and parseEmail forwards in the pipeline, until the two meet. then make sure
>> that parseEmail completely constructs the internal representation of each
>> email, thereby keeping no implicit references to the external representation.
 
you are quite right to be skeptical!-) indeed, in the latest Handle documentation, 
we still find the following excuse for GHC:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#t%3AHandle

    GHC note: a Handle will be automatically closed when the garbage collector 
    detects that it has become unreferenced by the program. However, relying on 
    this behaviour is not generally recommended: the garbage collector is unpredictable. 
    If possible, use an explicit hClose to close Handles when they are no longer 
    required. GHC does not currently attempt to free up file descriptors when they have 
    run out, it is your responsibility to ensure that this doesn't happen. 

this issue has been discussed in the past, and i consider it a bug if the memory
manager tells me to handle memory myself;-) so i do hope that this infelicity will
be removed in the future (run out of file descriptors -> run a garbage collection
and try again, before giving up entirely).
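
until then, a user-level approximation of that retry idea might look roughly like
the sketch below. openFileRetry is a made-up name, not a library function. the
sketch assumes that running out of descriptors surfaces as an IOError for which
isFullError holds, and note that performGC does not guarantee that the finalizers
of unreferenced handles have already run by the time it returns, so the second
attempt can still fail.

```haskell
import Control.Exception (catch, throwIO)
import System.IO
import System.IO.Error (isFullError)
import System.Mem (performGC)

-- Hypothetical helper: try openFile once; if the failure looks like
-- resource exhaustion, request a major GC and try exactly once more.
-- Any other IOError is rethrown unchanged.
openFileRetry :: FilePath -> IOMode -> IO Handle
openFileRetry path mode =
  openFile path mode `catch` \e ->
    if isFullError e
      then performGC >> openFile path mode  -- second failure propagates
      else throwIO e
```

this only papers over the problem from the outside; it does nothing for code
that opens files through readFile or other library wrappers.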

in fact, my local version had two variants of processFile - the one i posted and
one with explicit file handle handling (the code was restructured this way exactly
to hide this implementation decision in a single function). i did test both variants
on a directory with lots of copies of a few emails (>2000 files), and both worked
on my system, so i hoped -rather than checked- that the handle collection issue
had finally been fixed, and made the mistake of removing the more complex
variant before posting. thanks for pointing out that error - as the documentation
above demonstrates, it isn't good to rely on assumptions, nor on tests.

so here is the alternate variant of processFile (for which i imported System.IO):

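> import System.IO
>
> processFile path = do
>   f <- openFile path ReadMode
>   text <- hGetContents f      -- lazy: nothing has been read yet
>   let email = parseEmail text
>   -- seq forces email only to weak head normal form, so parseEmail must
>   -- build its whole result strictly before the handle is closed
>   email `seq` hClose f
>   return email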

all this hassle to expose a file handle to call hClose on, just so that the GC 
does not have to..
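
for comparison, here is a minimal sketch of a variant that does not depend on
how strict parseEmail is: it forces the whole file contents into memory before
the handle is closed, and lets withFile do the closing. processFileStrict is a
made-up name, and the Email type and parseEmail shown here are only stand-ins
for the real ones from the earlier code, which are not reproduced in this message.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Stand-in for the real email representation (an assumption).
newtype Email = Email [String]

-- Stand-in parser (an assumption): one entry per input line.
parseEmail :: String -> Email
parseEmail = Email . lines

-- Read the file completely, then parse. withFile closes the handle
-- when the inner action finishes, even if it throws an exception.
processFileStrict :: FilePath -> IO Email
processFileStrict path =
  withFile path ReadMode $ \h -> do
    text <- hGetContents h
    -- length walks the entire string, so all input is read here,
    -- before withFile closes the handle
    _ <- evaluate (length text)
    return (parseEmail text)
```

the price is that the complete text of each file is held in memory until the
parsed email no longer refers to it, so this trades space for not having to
reason about the strictness of the parser.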

thanks,
claus


