Too Strict?
C.Reinke
C.Reinke@ukc.ac.uk
Mon, 15 Jan 2001 14:56:46 +0000
> Can someone help? The program below works fine with small files but when I
> try to use it on the one I need to (about 3 million lines of data) it
> produces no output. The hard disk is hammered - I assume this is the run time
> system paging. My suspicion is that the program is trying to read in the
> whole file before processing it. Is this correct? If so, how do I make the
> program lazy so that it processes a line at time?
I was about to apply GHood to your program to see whether such a tool could
help to find the problem, so I started to cut down your code. However,
in the simplified version of the program, one can see what is going on
without any graphical tool.. (nevertheless, observation of the simplified
code with GHood confirms your suspicion immediately, and it points out
the spine of the list as the problem, too, so the tool is useful!-) (*)
In effect, your anonymize comes down to a "mapM" over a list of actions
applied to input lines, and all of the resulting IO-actions are placed
before the single "hPutStr". So, even if the results of the individual
actions in the list may not be needed until later, the whole spine of
the list of lines has to be traversed before "hPutStr" can be executed,
meaning that all input is read before any output is produced (and thus
before any computation results are requested, blowing up memory usage).
For the problem at hand, you could simply output each line as it is
processed instead of just returning it into a list for later use (see
variant _1 below). If you would want to keep both the modular program
structure and the explicit line-by-line IO-style, you would need to
interleave the input and output commands somehow (perhaps similar to
variant _2 below?).
Hth,
Claus
(*) Please note that our web-server is being upgraded today..
(web-pages and GHood download will not be available until
tomorrow, hence no URL here :-(
PS The simplified code (+ variations) with observations:
module Main(main) where
import Observe
import IO(openFile,
hPutStr,
IOMode(ReadMode,WriteMode,AppendMode))
filename = "ldif1.txt"
fileout = "ldif.out"
readAndWriteAttrVals =
do h <- openFile fileout WriteMode
s <- readFile filename
let action l = return (':':l)
a <- mapM action (observe "input" (lines s))
hPutStr h (unlines (observe "output" a))
main = runO readAndWriteAttrVals
readAndWriteAttrVals_1 =
do h <- openFile fileout WriteMode
s <- readFile filename
let action_and_output l = hPutStr h (':':l)
mapM_ (observe "output" action_and_output) (observe "input" (lines s))
main_1 = runO readAndWriteAttrVals_1
readAndWriteAttrVals_2 =
do h <- openFile fileout WriteMode
s <- readFile filename
let { action l = return (':':l)
; as = map action (observe "input" (lines s))
; os = repeat (hPutStr h)
}
mapM id (observe "output" (zipWith (>>=) as os))
main_2 = runO readAndWriteAttrVals_2