[Haskell-beginners] too lazy parsing?

Daniel Fischer daniel.is.fischer at googlemail.com
Mon Feb 4 19:50:20 CET 2013


On Monday 04 February 2013, 11:50:24, Kees Bleijenberg wrote:

> module Main where
> 
> import Text.ParserCombinators.Parsec (many,many1,string, Parser, parse)
> import System.IO (IOMode(..),hClose,openFile,hGetContents,hPutStrLn)
> 
> parseFile hOut fn = do
> 
>                         handle <- openFile fn ReadMode
>                         cont <- hGetContents handle

hGetContents reads the file only on demand.

>                         print cont

Here the contents is demanded, so the file is read. 

>                         let res = parse (many (string "blah")) "" cont

The binding of res is lazy, so nothing is demanded by it yet. Without the 
printing, the file would still not have been read at this point.

>                         hClose handle

Now the file handle is closed, so nothing more can be read from the file. If 
nothing has been demanded so far, cont will be an empty string.

1. You should rather use readFile, unless you need to read a lot of files, in 
which case opening too many at once may exhaust the available file handles; 
then you need strict IO with exact control over when a file is opened, read, 
and closed.

readFile only semi-closes the file handle; reading from the file still works 
until the contents goes out of scope or the end of the file is reached. Also, 
readFile leads to simpler code,

    cont <- readFile fn
    let res = parse (many (string "blah")) "" cont
    case res of
      ...

2. If you absolutely want to use the more cumbersome openFile - hGetContents -
hClose sequence, you need to force the file contents to be read before closing 
the file. Instead of printing the contents,

    let res = ...
    res `seq` hClose handle

would work here. More generally, a common way to ensure that the entire file 
has been read is

    length cont `seq` hClose handle

But in those cases, it would probably make more sense to use strict IO anyway, 
rather than lazy IO.

3. The only reason to use openFile - hGetContents - hClose instead of readFile 
is better exception safety, in which case you should use bracket (from 
Control.Exception), or the wrapper around bracket (from System.IO)

    withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
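
As a minimal sketch of how points 2 and 3 can be combined: withFile opens and 
closes the handle, and the contents are forced before the handler returns. The 
name readWholeFile is made up for this illustration, and evaluate (from 
Control.Exception) is used here in place of the `seq` shown above; treat this 
as one possible way to do it, not the only one.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Hypothetical helper (not part of any library): read a whole file and
-- force its contents while the handle is still open.
readWholeFile :: FilePath -> IO String
readWholeFile fn =
  withFile fn ReadMode $ \handle -> do
    cont <- hGetContents handle
    -- Computing the length walks the entire string, so the file has been
    -- read completely before withFile closes the handle.
    _ <- evaluate (length cont)
    return cont
```

With such a helper, the parsing step could be written as 
`parse (many (string "blah")) "" <$> readWholeFile fn`, and the result no 
longer depends on when it is demanded.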

>                         case res of
>                         
>                             (Left err) -> hPutStrLn hOut $ "Error: " ++
>                             (show err)
>                             (Right goodRes) -> mapM_ (hPutStrLn hOut)
>                             goodRes
> 
> main = do
> 
>             hOut <- openFile "outp.txt" WriteMode
>             mapM (parseFile hOut) ["inp.txt"]
>             hClose hOut
> 
> I’am writing a program that parses a lot of files. Above is the simplest
> program I can think of that demonstrates my problem. The program above
> parses inp.txt.  Inp.txt has only the word blah in it.  The output is saved
> in outp.txt. This file contains the word blah after running the program. if
> I comment out the line ‘print cont’ nothing is saved in outp.txt. If I
> comment out ‘print cont’ and replace many with many1 in the following line,
> it works again?

Hmm, I get (as expected)

Error: (line 1, column 1):
unexpected end of input
expecting "blah"

with many1.

When you parse an empty string with `many parser`, the parse succeeds and 
returns an empty list. Successively printing an empty list of strings means 
printing nothing at all.

Using `many1` instead of `many` makes the parse fail, and then you print the 
error; thus `many1 (string "blah")` produces some output from an empty file 
[and without forcing the file contents in some way before closing, the file is 
effectively empty].
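
The difference can be checked without any file IO at all. The following is a 
small sketch; parseMany and parseMany1 are names invented for this 
illustration, applying the two parsers from the discussion directly to a 
String.

```haskell
import Text.ParserCombinators.Parsec (ParseError, many, many1, parse, string)

-- Hypothetical helper: zero or more occurrences of "blah".
-- On empty input this succeeds with an empty list.
parseMany :: String -> Either ParseError [String]
parseMany = parse (many (string "blah")) ""

-- Hypothetical helper: one or more occurrences of "blah".
-- On empty input this fails with a parse error.
parseMany1 :: String -> Either ParseError [String]
parseMany1 = parse (many1 (string "blah")) ""
```

In GHCi, `parseMany ""` gives a Right with an empty list, so mapM_ over it 
prints nothing, while `parseMany1 ""` gives a Left carrying the error message 
shown above.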

> Can someone explain to me what is going  on?
> 
> Kees




