[Haskell-beginners] too lazy parsing?
Daniel Fischer
daniel.is.fischer at googlemail.com
Mon Feb 4 19:50:20 CET 2013
On Monday 04 February 2013, 11:50:24, Kees Bleijenberg wrote:
> module Main where
>
> import Text.ParserCombinators.Parsec (many,many1,string, Parser, parse)
> import System.IO (IOMode(..),hClose,openFile,hGetContents,hPutStrLn)
>
> parseFile hOut fn = do
>
> handle <- openFile fn ReadMode
> cont <- hGetContents handle
hGetContents does read the file only on demand
> print cont
Here the contents is demanded, so the file is read.
> let res = parse (many (string "blah")) "" cont
The binding of res is lazy, so it's not demanded yet, without the printing,
the file would still not be read.
> hClose handle
Now the file handle is closed, so nothing more can be read from the file. If
nothing has been demanded so far, cont will be an empty string.
1. You should rather use readFile, unless you need to read a lot of files, in
which case opening too many at once may exhaust the available file handles; the
you need strict IO with exact control over when a file is opened, read, and
closed.
readFile only semi-closes the file handle, reading from the file still works
until the contents goes out of scope or the end of the file is reached. Also
readFile leads to simpler code,
cont <- readFile fn
let res = parse (many (string "blah")) "" cont
case res of
...
2. if you absolutely want to use the more cumbersome hOpen - hGetContents -
hClose sequence, you need to force the file contents to be read before closing
the file. Instead of printing the contents,
let res = ...
res `seq` hClose handle
would work here, generally, to ensure the entire file was read, a common way is
length cont `seq` hClose handle
But in those cases, it would probably make more sense to use strict IO anyway,
rather than lazy IO.
3. The only reason to use hOpen - hGetContents - hClose instead of readFile is
more exception-safety, in which case you should use bracket (from
Control.Exception), or the wrapper around bracket
withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
> case res of
>
> (Left err) -> hPutStrLn hOut $ "Error: " ++
> (show err)
> (Right goodRes) -> mapM_ (hPutStrLn hOut)
> goodRes
>
> main = do
>
> hOut <- openFile "outp.txt" WriteMode
> mapM (parseFile hOut) ["inp.txt"]
> hClose hOut
>
> I’am writing a program that parses a lot of files. Above is the simplest
> program I can think of that demonstrates my problem. The program above
> parses inp.txt. Inp.txt has only the word blah in it. The output is saved
> in outp.txt. This file contains the word blah after running the program. if
> I comment out the line ‘print cont’ nothing is saved in outp.txt. If I
> comment out ‘print cont’ and replace many with many1 in the following line,
> it works again?
Hmm, I get (as expected)
Error: (line 1, column 1):
unexpected end of input
expecting "blah"
with many1.
When you parse an empty string with `many parser`, the parse succeeds and
returns an empty list. Successively printing an empty list of strings means
printing nothing at all.
Using `many1` instead of `many` makes the parse fail, and then you print the
error, thus `many1 (string "blah") produces some output from an empty file [and
without forcing the file contents in some way before closing, the file is
effectively empty].
> Can someone explain to me what is going on?
>
> Kees
More information about the Beginners
mailing list