[Haskell-cafe] Lazy IO and closing of file handles
Matthew Brecknell
haskell at brecknell.org
Sun Mar 18 22:02:53 EDT 2007
Pete Kazmier:
> I attempted to read Oleg's fold-stream implementation [1] as this
> sounds quite appealing to me, but I was completely overwhelmed,
> especially with all of the various type signatures used. It would be
> great if one of the regular Haskell bloggers (Tom Moertel are you
> reading this?) might write a blog entry or two interpreting his
> implementation for those of us starting out in Haskell perhaps by
> starting out with a non-polymorphic version so as to emphasize the
> approach.
>
> [1] http://okmij.org/ftp/Haskell/fold-stream.lhs
The basic idea of the paper is the use of a left-fold operator as the
primary interface for enumarating collections. The recursive version
(less general than the non-recursive version) of a left-fold operator
for enumerating the lines of a text file might look something like this:
> import Control.Monad.Fix
> import Control.Exception
> import Data.List
> import qualified Data.Set as S
> import qualified Data.Map as M
> import System.IO
>
> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a
> enumLines iter accum filename = do
> h <- openFile filename ReadMode
> flip fix accum $
> \iterate accum -> do
> try_line <- try (hGetLine h)
> case try_line of
> Left e -> hClose h >> return accum
> Right line -> do
> case iter accum line of
> Left accum -> hClose h >> return accum
> Right accum -> iterate accum
It needs better exception handling, including a mechanism to pass a
non-EOF exception back to the caller without losing the value
accumulated up to the point of the exception. Perhaps it might also be
useful to generalise the first parameter to an IO action.
To use this, you provide an "iteratee", a function which takes an
accumulator and a line from the file, and returns a new accumulator
embedded in an Either. Using the Left branch causes immediate
termination of the enumeration. For example, to search for the first
occurrence of each of a set of email headers:
> getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String String)
> getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where
> findHdrs accum@(wanted,found) line =
> if null line
> then Left accum
> else
> case headerLine line of
> Nothing -> Right accum
> Just hdr ->
> case findDelete hdr wanted of
> Nothing -> Right accum
> Just wanted ->
> let accum = (wanted, M.insert hdr line found) in
> if S.null wanted
> then Left accum
> else Right accum
>
> headerLine :: String -> Maybe String
> headerLine (':':xs) = Just []
> headerLine (x:xs) = fmap (x:) (headerLine xs)
> headerLine [] = Nothing
>
> findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a)
> findDelete e s = if S.member e s
> then Just (S.delete e s)
> else Nothing
It's a bit of a case-analysis nightmare, but when comparing this to
previous approaches, note that file traversal and processing are cleanly
separated, file handle closure is guaranteed to be timely, file
traversal stops as soon as all the required headers have been found,
memory usage is minimised.
I hope that helps.
More information about the Haskell-Cafe
mailing list