[Haskell-cafe] Re: Lazy IO and closing of file handles
Pete Kazmier
pete-expires-20070513 at kazmier.com
Mon Mar 19 12:13:27 EDT 2007
"Matthew Brecknell" <haskell at brecknell.org> writes:
> Pete Kazmier:
>> I attempted to read Oleg's fold-stream implementation [1] as this
>> sounds quite appealing to me, but I was completely overwhelmed,
>> especially with all of the various type signatures used. It would be
>> great if one of the regular Haskell bloggers (Tom Moertel are you
>> reading this?) might write a blog entry or two interpreting his
>> implementation for those of us starting out in Haskell perhaps by
>> starting out with a non-polymorphic version so as to emphasize the
>> approach.
>>
>> [1] http://okmij.org/ftp/Haskell/fold-stream.lhs
>
> The basic idea of the paper is the use of a left-fold operator as the
> primary interface for enumarating collections. The recursive version
> (less general than the non-recursive version) of a left-fold operator
> for enumerating the lines of a text file might look something like this:
>
>> import Control.Monad.Fix
>> import Control.Exception
>> import Data.List
>> import qualified Data.Set as S
>> import qualified Data.Map as M
>> import System.IO
>>
>> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a
>> enumLines iter accum filename = do
>> h <- openFile filename ReadMode
>> flip fix accum $
>> \iterate accum -> do
>> try_line <- try (hGetLine h)
>> case try_line of
>> Left e -> hClose h >> return accum
>> Right line -> do
>> case iter accum line of
>> Left accum -> hClose h >> return accum
>> Right accum -> iterate accum
I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.
> To use this, you provide an "iteratee", a function which takes an
> accumulator and a line from the file, and returns a new accumulator
> embedded in an Either. Using the Left branch causes immediate
> termination of the enumeration. For example, to search for the first
> occurrence of each of a set of email headers:
>
>> getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String String)
>> getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where
>> findHdrs accum@(wanted,found) line =
>> if null line
>> then Left accum
>> else
>> case headerLine line of
>> Nothing -> Right accum
>> Just hdr ->
>> case findDelete hdr wanted of
>> Nothing -> Right accum
>> Just wanted ->
>> let accum = (wanted, M.insert hdr line found) in
>> if S.null wanted
>> then Left accum
>> else Right accum
>>
>> headerLine :: String -> Maybe String
>> headerLine (':':xs) = Just []
>> headerLine (x:xs) = fmap (x:) (headerLine xs)
>> headerLine [] = Nothing
>>
>> findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a)
>> findDelete e s = if S.member e s
>> then Just (S.delete e s)
>> else Nothing
>
> It's a bit of a case-analysis nightmare, but when comparing this to
> previous approaches, note that file traversal and processing are cleanly
> separated, file handle closure is guaranteed to be timely, file
> traversal stops as soon as all the required headers have been found,
> memory usage is minimised.
Very nice. I like the clean separation, but as you say, its one ugly
bit of code compared to my original code, although much more elegant
no doubt.
> I hope that helps.
Very much so. Thank you for you help.
More information about the Haskell-Cafe
mailing list