[Haskell-cafe] Re: Lazy IO and closing of file handles

Pete Kazmier pete-expires-20070513 at kazmier.com
Sat Mar 17 11:21:46 EDT 2007


"Matthew Brecknell" <haskell at brecknell.org> writes:

> So here's a test. I don't have any big maildirs handy, so this is based
> on the simple exercise of printing the first line of each of a large
> number of files. First, the preamble.
>
>> import Control.Exception (bracket)
>> import System.Environment
>> import System.IO
>
>> main = do
>>   t:n:fs <- getArgs
>>   ([test0,test1,test2,test3] !! read t) (take (read n) $ cycle fs)
> 
> [snip]

Thank you for summarizing the approaches presented by others.  As a
Haskell newbie, I find there are quite a few esoteric concepts to
conquer.  Your concrete examples helped me understand the
ramifications of the various approaches.

After reading the various threads you cited, I decided to avoid lazy
IO altogether.  By using 'readFile' without forcing the strict
evaluation of my parser, I inadvertently relinquished control of the
resource management: closing of the file handles was left to the GC.
And although I could have used 'seq' to address the issue, why bother
fixing a problem that could have been avoided altogether by using
strict IO?
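
For comparison, here is a rough sketch of what forcing the result
might look like if one wanted to keep a lazy read but still control
when the handle is closed.  The name 'firstLineStrict' is made up for
illustration, and it swaps 'readFile' for 'withFile' plus
'hGetContents' so that the handle is closed explicitly rather than
whenever the whole contents happen to be consumed or collected.

```haskell
import Control.Exception (evaluate)
import System.IO

-- | Hypothetical helper (not from the thread): return the first line
-- of a file, fully evaluated before the handle is closed.
firstLineStrict :: FilePath -> IO String
firstLineStrict path =
    withFile path ReadMode $ \h -> do
        contents <- hGetContents h
        let firstLine = takeWhile (/= '\n') contents
        -- 'length' walks the whole first line, so every character is
        -- read while the handle is still open.
        _ <- evaluate (length firstLine)
        return firstLine
```

Without the 'evaluate' step the returned string would be an
unevaluated thunk over a handle that 'withFile' has already closed,
which is the kind of subtlety that makes strict IO attractive here.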

With that said, I added the following function to my program and then
replaced the invocation of 'readFile' with it:

  readEmailHeaders :: FilePath -> IO String
  readEmailHeaders file = 
      bracket (openFile file ReadMode) (hClose) (headers [])
      where
        headers acc h = do
            line <- hGetLine h
            case line of
              -- Stop reading file once we hit the empty separator
              -- line, no need to read the rest of the file (body).
              "" -> return . concat . reverse $ acc
              _  -> headers ("\n":line:acc) h

I'm not sure if this is the best implementation, but the speed is
comparable to the lazy IO version without the annoying defect of
running out of file handles.  I also tried an implementation using
'hGetChar' but that was much slower.
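
One caveat with the function above: 'hGetLine' raises an end-of-file
error if a file contains no empty separator line.  A possible
refinement, offered only as a sketch, checks 'hIsEOF' before each
read.  The name 'readEmailHeadersSafe' is hypothetical, and 'unlines'
replaces the explicit "\n" bookkeeping while producing the same
output.

```haskell
import Control.Exception (bracket)
import System.IO

-- | Hypothetical variant of 'readEmailHeaders' that also stops
-- cleanly when the file ends before an empty line is seen.
readEmailHeadersSafe :: FilePath -> IO String
readEmailHeadersSafe file =
    bracket (openFile file ReadMode) hClose (headers [])
  where
    headers acc h = do
        eof <- hIsEOF h
        if eof
            then finish acc
            else do
                line <- hGetLine h
                case line of
                  -- Empty separator line: the headers are complete.
                  "" -> finish acc
                  _  -> headers (line : acc) h
    -- Lines were accumulated in reverse order; restore the order and
    -- terminate each one with a newline.
    finish acc = return (unlines (reverse acc))
```

The extra 'hIsEOF' call per line has some cost, so whether it is
worth it depends on how well-formed the maildir files are.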

I attempted to read Oleg's fold-stream implementation [1], as it
sounds quite appealing to me, but I was completely overwhelmed,
especially by all of the various type signatures used.  It would be
great if one of the regular Haskell bloggers (Tom Moertel, are you
reading this?) would write a blog entry or two interpreting his
implementation for those of us starting out in Haskell, perhaps
beginning with a non-polymorphic version so as to emphasize the
approach.
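
To make the request concrete, here is a guess at what such a
simplified version might look like.  It is not Oleg's code, only a
sketch in the spirit of a left fold with early termination, fixed to
lines of a file in the IO monad and polymorphic only in the
accumulator.  The names 'foldFileLines' and 'headersViaFold' are
invented for illustration.

```haskell
import Control.Exception (bracket)
import System.IO

-- | Left fold over the lines of a file.  The step function returns
-- 'Left' to stop early or 'Right' to continue with a new accumulator.
-- The handle is closed by 'bracket' in either case.
foldFileLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a
foldFileLines step z file =
    bracket (openFile file ReadMode) hClose (loop z)
  where
    loop acc h = do
        eof <- hIsEOF h
        if eof
            then return acc
            else do
                line <- hGetLine h
                case step acc line of
                  Left done  -> return done
                  Right acc' -> loop acc' h

-- | Collect header lines up to the first empty line, written as a
-- client of the fold above.
headersViaFold :: FilePath -> IO String
headersViaFold file = fmap (unlines . reverse) (foldFileLines step [] file)
  where
    step acc ""   = Left acc
    step acc line = Right (line : acc)
```

The appeal of this shape is that the function doing the reading owns
the handle, while the caller only supplies the step function and
decides when to stop.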

Thanks,
Pete

[1] http://okmij.org/ftp/Haskell/fold-stream.lhs


