hGetContents and laziness in file io

Thomas Hallgren hallgren@cse.ogi.edu
Mon, 23 Jul 2001 23:39:17 -0700


My guess is that there is a space leak in your program. Both functions, 
convert and parseAll, hold a reference (the variable ulf) to the 
contents of the input file, and it will probably not be released until 
the functions return (unless you use a compiler that is clever enough to 
delete references after their last use...). There might also be sources 
of space leaks in the function parse, which is called from parseAll.
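
To illustrate the general mechanism (this is not your code, just a 
minimal sketch with made-up names twoPass and onePass): when a lazily 
read string is still referenced by a later computation, nothing that has 
already been consumed can be garbage collected.

```haskell
import Data.List (foldl')

-- Hypothetical example, not from the original program.
-- Two traversals over the same string: while the first component is
-- being computed, the thunk for the second still refers to `s`, so the
-- whole string stays in memory.
twoPass :: String -> (Int, Int)
twoPass s = (length s, length (filter (== '\n') s))

-- One traversal with a strict accumulator: nothing else refers to the
-- input, so each character can be collected once it has been consumed.
onePass :: String -> (Int, Int)
onePass = foldl' step (0, 0)
  where
    step (c, n) ch =
      let c' = c + 1
          n' = if ch == '\n' then n + 1 else n
      in c' `seq` n' `seq` (c', n')
```

Both functions return the same pair (character count, newline count); 
only the amount of input kept alive at any one time differs.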

If your program only processes one file each time you run it, you could 
structure it like this:

    main = interact parseAll'
    parseAll' :: String -> String
    parseAll' = unlines . map convert' . parse'

    parse' :: String -> [Tree]
    convert' :: Tree -> String

    parse' s =
      case parseOneTree s of
        Good (tree,rest) -> tree:parse' rest
        Error err -> error err

    convert' tree = ...

Note that parse' is lazy: it returns the first tree before it tries to 
parse the rest of the input.
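
For something you can load and try, here is a self-contained sketch of 
the same structure. The Result and Tree types, the one-tree-per-line 
parseOneTree and the body of convert' are stand-ins I made up (your real 
ones will differ), and the base case for empty input is my assumption 
about how parsing should stop.

```haskell
-- Hypothetical stand-ins: the real program has its own definitions.
data Result a = Good a | Error String

newtype Tree = Leaf String   -- toy "tree": one line of text

-- Parse one tree off the front of the input (here: just one line).
parseOneTree :: String -> Result (Tree, String)
parseOneTree s =
  case break (== '\n') s of
    ("", _)   -> Error "empty tree"
    (l, rest) -> Good (Leaf l, drop 1 rest)

-- Lazy: the first tree is returned before the rest is parsed.
parse' :: String -> [Tree]
parse' "" = []               -- assumed stopping condition
parse' s =
  case parseOneTree s of
    Good (tree, rest) -> tree : parse' rest
    Error err         -> error err

-- Placeholder conversion: reverse the text of the tree.
convert' :: Tree -> String
convert' (Leaf w) = reverse w

parseAll' :: String -> String
parseAll' = unlines . map convert' . parse'
```

With main = interact parseAll' as above, each converted tree can be 
written out before the next one is parsed. You can see the laziness 
directly: take 1 (parse' ("abc\n" ++ undefined)) yields the first tree 
without ever touching the undefined tail.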

Anyway, space leaks can be hard to find and eliminate, but there are 
tools that can help. The Haskell compiler Nhc98 
(http://www.cs.york.ac.uk/fp/nhc98/) tries to generate space-efficient 
code to begin with, but also provides heap profiling to help you find 
out what kind of data is occupying all the space (constructor profile), 
which functions produced the data (producer profile), which functions 
have references to the data (retainer profile), ...

Hope this helps!

Thomas Hallgren

Hal Daume wrote:

>... the file that I'm working with is ~20mb of trees.  When I
>run my program on this, it is unable to reclaim space (unless i set the
>heap really high). ...
>convert inF outF = do inH <- openFile inF ReadMode
>                      ulf <- hGetContents inH
>                      outH <- openFile outF WriteMode
>                      parseAll outH ulf
>                      hClose inH
>                      hClose outH
>parseAll outH ulf =
>    case parse s of
>        Good (tree, rest) -> case convert tree of
>                                 Good s'   -> do hPutStrLn outFile s'
>                                 Error err -> do putStrLn err
>        Error err         -> do return ()
>PLEASE help!