hGetContents and laziness in file io

Hal Daume hdaume@ISI.EDU
Mon, 23 Jul 2001 21:00:28 -0700 (PDT)


I have a few laziness/performance questions regarding file IO in Haskell
(particularly Hugs, right now).

I'm writing a program that basically converts file formats.  The files
are parse trees for natural language.  So I read in one of the parse
trees in the original format and write it out again in my own format. 
I'm doing this in Haskell because a) I like Haskell, b) writing parsers
in Haskell is really easy.

I'm currently using hGetContents to read the original file in, though
I've also tried using readFile and the like.

My basic problem is that the program takes a ridiculous amount of time
to run and that it often runs out of heap.

The file that I read in is basically just a sequence of parse trees, one
after the other.  I would *like* to basically read one in, parse it,
convert it, write that to the output file.  Read the next in, parse it,
convert it, write it.  Etc.  The problem is that my program seems to be
loading the entire file into memory just to read one tree.

For instance, the file that I'm working with is ~20 MB of trees.  When I
run my program on this, it is unable to reclaim space (unless I set the
heap really high).  However, if I simply extract the first, say, 4 trees
and place these in a new file, the program runs to completion (albeit
grindingly slowly).  In my experience with other languages, this kind of
slowness on file IO is usually due to reading one character at a time
instead of a block...I can't figure out how to get this to behave
properly though.
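
For reference, this is roughly what asking for block reads explicitly
looks like.  It is only a sketch: withBlockBufferedFile is a made-up
helper name, the module name System.IO is assumed (the Haskell 98 name is
IO), and whether this changes anything under Hugs is untested here.

```haskell
import System.IO

-- Hypothetical helper: open a file, request block buffering, and hand the
-- lazily read contents to an action.  The action must consume everything
-- it needs before it returns, because the handle is closed afterwards.
withBlockBufferedFile :: FilePath -> (String -> IO a) -> IO a
withBlockBufferedFile path action = do
    h <- openFile path ReadMode
    -- Nothing lets the implementation choose the block size.
    hSetBuffering h (BlockBuffering Nothing)
    s <- hGetContents h
    r <- action s
    hClose h
    return r
```

For example, withBlockBufferedFile "trees.txt" (\s -> return $! length s)
counts the characters in a (hypothetical) trees.txt.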

I can post some of the code if people would like to read it, but
it basically looks like this:

convertFile inF outF = do inH <- openFile inF ReadMode
                          ulf <- hGetContents inH
                          outH <- openFile outF WriteMode
                          parseAll outH ulf
                          hClose inH
                          hClose outH

-- parse one tree, write it out, then carry on with the rest of the input
parseAll outH ulf =
    case parse ulf of
        Good (tree, rest) -> do case convert tree of
                                    Good s'   -> hPutStrLn outH s'
                                    Error err -> putStrLn err
                                parseAll outH rest
        Error _           -> return ()

where parse takes a string and parses it and convert takes the tree and
converts it back to a string (in the other format)...
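
To make the shape of the loop concrete, here is a self-contained sketch
that can actually be run.  The Result type and the toy parse and convert
below (one tree per line, round brackets rewritten as square ones) are
hypothetical stand-ins, not the real parser or converter, and the name
convertFile and the module name System.IO are assumptions as well.

```haskell
import System.IO

-- Hypothetical stand-in for the result type used by the parser.
data Result a = Good a | Error String

-- Toy parser (assumption): each line is one tree.  Returns the tree
-- text together with the remaining, still unparsed input.
parse :: String -> Result (String, String)
parse "" = Error "end of input"
parse s  = let (tree, rest) = break (== '\n') s
           in Good (tree, drop 1 rest)

-- Toy converter (assumption): swaps round brackets for square ones and
-- rejects trees whose bracket counts do not match.
convert :: String -> Result String
convert tree
    | count '(' == count ')' = Good (map swap tree)
    | otherwise              = Error ("unbalanced tree: " ++ tree)
  where
    count c  = length (filter (== c) tree)
    swap '(' = '['
    swap ')' = ']'
    swap c   = c

-- Parse one tree, write its conversion, then continue with the rest.
parseAll :: Handle -> String -> IO ()
parseAll outH ulf =
    case parse ulf of
        Good (tree, rest) -> do
            case convert tree of
                Good s'   -> hPutStrLn outH s'
                Error err -> putStrLn err
            parseAll outH rest
        Error _ -> return ()

-- Read the input lazily and write converted trees as they are produced.
convertFile :: FilePath -> FilePath -> IO ()
convertFile inF outF = do
    inH  <- openFile inF ReadMode
    ulf  <- hGetContents inH
    outH <- openFile outF WriteMode
    parseAll outH ulf
    hClose inH
    hClose outH
```

Running convertFile "in.txt" "out.txt" on a file of bracketed lines writes
one converted line per well-formed tree and prints a message for each
rejected one.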

PLEASE help!


Hal Daume III

 "Computer science is no more about computers    | hdaume@isi.edu
  than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume