[Haskell-cafe] Lazy IO and closing of file handles
Claus Reinke
claus.reinke at talk21.com
Tue Mar 20 07:44:56 EDT 2007
>[left-fold operator for enumerating the lines of a text file]
..
>> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a
>> enumLines iter accum filename = do
>> h <- openFile filename ReadMode
>> flip fix accum $
>> \iterate accum -> do
>> try_line <- try (hGetLine h)
>> case try_line of
>> Left e -> hClose h >> return accum
>> Right line -> do
>> case iter accum line of
>> Left accum -> hClose h >> return accum
>> Right accum -> iterate accum
..
>> getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String String)
>> getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where
we can keep the Left/Right implicit, using either:
getHeaders1 hdrs file = enumLines findHdrs (hdrs,M.empty) file
where
enumLines iter accum filename = do
h <- openFile filename ReadMode
flip fix accum $ \iterate accum ->
join $ (fmap (either
(const $ hClose h >> return accum)
(either ((hClose h >>) . return) iterate . iter accum)))
(try (hGetLine h))
or extract the reusable loop-with-exit-by-either functionality:
loopME m stop continue acc = m >>= either (stop acc) (continue (loopME m stop continue) acc)
getHeaders2 hdrs file = enumLines findHdrs (hdrs,M.empty) file
where
enumLines iter accum f = do
h <- openFile f ReadMode
loopME (try (hGetLine h))
(\acc left->hClose h >> return acc)
(\loop acc right->either ((hClose h >>) . return) loop (iter acc right))
accum
or sneak some lazy i/o back in, using a fold-with-exit-by-either, similar to loopME:
withFile path m = bracket (openFile path ReadMode) hClose m
withContentsOf path f = withFile path ((((return $!) . f ) =<<) . hGetContents)
withLinesOf path f = withContentsOf path (f . lines)
foldE f a [] = a
foldE f a (x:xs) = either id (\a'->foldE f a' xs) (f a x)
getHeaders3 hdrs file = withLinesOf file (foldE findHdrs (hdrs,M.empty))
> To use this, you provide an "iteratee", a function which takes an
> accumulator and a line from the file, and returns a new accumulator
> embedded in an Either. Using the Left branch causes immediate
> termination of the enumeration. For example, to search for the first
> occurrence of each of a set of email headers:
>> findHdrs accum@(wanted,found) line =
>> if null line
>> then Left accum
>> else
>> case headerLine line of
>> Nothing -> Right accum
>> Just hdr ->
>> case findDelete hdr wanted of
>> Nothing -> Right accum
>> Just wanted ->
>> let accum = (wanted, M.insert hdr line found) in
>> if S.null wanted
>> then Left accum
>> else Right accum
>>
>> headerLine :: String -> Maybe String
>> headerLine (':':xs) = Just []
>> headerLine (x:xs) = fmap (x:) (headerLine xs)
>> headerLine [] = Nothing
>>
>> findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a)
>> findDelete e s = if S.member e s
>> then Just (S.delete e s)
>> else Nothing
>
> It's a bit of a case-analysis nightmare
indeed, this part can be cleaned up considerably, using Monad Maybe:
findHdrs accum@(wanted,found) line =
if null line || S.null wanted
then Left accum
else maybe (Right accum) id $ do
(field,value) <- headerLine line
wanted' <- findDelete field wanted
let found' = M.insert field value found
return $! (Right $! ((,) $! wanted') $! found')
headerLine :: String -> Maybe (String,String)
headerLine xs = do (field,':':value) <- return (span (/=':') xs)
let value' = dropWhile isSpace value
return $! ((,) $! field) $! strictly value'
findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a)
findDelete e s = guard (S.member e s) >> return (S.delete e s)
strictly l = length l `seq` l
running the three variants over a moderately sized directory (>3k emails, one
including a hugs-tarball;-), 1/2 are roughly equivalent, but Hugs claims that 3
allocates less and needs fewer garbage collections than 1/2, while GHC claims
that it is the other way round..
claus
More information about the Haskell-Cafe
mailing list