Too Strict?

C.Reinke C.Reinke@ukc.ac.uk
Mon, 15 Jan 2001 14:56:46 +0000


> Can someone help? The program below works fine with small files but when I
> try to use it on the one I need to (about 3 million lines of data) it
> produces no output. The hard disk is hammered - I assume this is the run time
> system paging. My suspicion is that the program is trying to read in the
> whole file before processing it. Is this correct? If so, how do I make the
> program lazy so that it processes a line at time?

I was about to apply GHood to your program to see whether such a tool could 
help to find the problem, so I started to cut down your code. However, 
in the simplified version of the program, one can see what is going on 
without any graphical tool.. (nevertheless, observation of the simplified
code with GHood confirms your suspicion immediately, and it points out
the spine of the list as the problem, too, so the tool is useful!-) (*)

In effect, your anonymize comes down to a "mapM" over a list of actions
applied to input lines, and all of the resulting IO-actions are placed
before the single "hPutStr". So, even if the results of the individual
actions in the list may not be needed until later, the whole spine of 
the list of lines has to be traversed before "hPutStr" can be executed, 
meaning that all input is read before any output is produced (and thus 
before any computation results are requested, blowing up memory usage).

For the problem at hand, you could simply output each line as it is 
processed instead of just returning it into a list for later use (see
variant _1 below). If you would want to keep both the modular program 
structure and the explicit line-by-line IO-style, you would need to 
interleave the input and output commands somehow (perhaps similar to
variant _2 below?).

Hth,
Claus

(*) Please note that our web-server is being upgraded today..
    (web-pages and GHood download will not be available until
     tomorrow, hence no URL here :-(

PS The simplified code (+ variations) with observations:

module Main(main) where

import Observe
import IO(openFile,
          hPutStr,
          IOMode(ReadMode,WriteMode,AppendMode))

filename = "ldif1.txt"
fileout  = "ldif.out"

readAndWriteAttrVals =
   do h <- openFile fileout WriteMode
      s <- readFile filename
      let action l = return (':':l)
      a <- mapM action (observe "input" (lines s))
      hPutStr h (unlines (observe "output" a)) 

main = runO readAndWriteAttrVals


readAndWriteAttrVals_1 =
   do h <- openFile fileout WriteMode
      s <- readFile filename
      let action_and_output l = hPutStr h (':':l)
      mapM_ (observe "output" action_and_output) (observe "input" (lines s))

main_1 = runO readAndWriteAttrVals_1


readAndWriteAttrVals_2 =
   do h <- openFile fileout WriteMode
      s <- readFile filename
      let { action l = return (':':l)
          ; as = map action (observe "input" (lines s))
          ; os = repeat (hPutStr h) 
          }
      mapM id (observe "output" (zipWith (>>=) as os))

main_2 = runO readAndWriteAttrVals_2