[Haskell-beginners] Re: Excess mem consumption in file IO task

Ertugrul Soeylemez es at ertes.de
Wed Jan 7 03:54:00 EST 2009


"Moritz Tacke" <take at informatik.uni-freiburg.de> wrote:

> I have some resource problems when extracting data from a file. The
> task is as follows: I have a huge (500MB) binary file, containing some
> interesting parts and lots of rubbish. Furthermore, there is a
> directory that tells me the parts of the file (first- and last byte
> index) that contain the substrings I need. My approach to do this is
> to open the file and to pass the list of addresses along with the
> handle to a function that processes the list step-by-step and calls a
> subfunction which uses the handle to seek the start position of the
> interesting block, reads the block into a bytestring (lazy or not,
> didn't make any difference here) and calls the function that scans
> this byte string for the interesting part. Using this approach - which
> results in a data structure with an approximate size of 10 MB - the
> program uses hundreds of megabytes of RAM, which forces my computer to
> swap (with the obvious results...).

You may want to post the relevant parts of your source code on
hpaste.org for reference.


> I have right now two main suspects: The recursive function is
> tail-recursive, but I don't know whether the usual way to write these
> functions (with an accumulator etc) works in monadic code (the stage
> is, of course, the IO monad, and I am using the do-notation as I don't
> like the only other way I know, writing lambdas and lambdas and
> lambdas into the function body). The other problem I can imagine is
> the passing-around of the file handle, and the subsequent reading of
> byte strings: Are those strings somehow attached to the handle, and
> does the handle work in a different way than I expected, i.e. is the
> handle copied when it is used as an argument for another function, and
> does something like a register of handles exist that keeps the
> connection alive and, therefore, excludes the (handle, string)-chunk
> from garbage collection?

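Usually no.  Passing a handle to another function does not copy it, and
a strictly read byte string keeps no connection to the handle.  That
only changes if you read the file with a lazy read function like
hGetContents.  As for the notation: the do-notation and the explicit
notation with (>>=) and lambdas are equivalent.  When compiling, the
do-notation is simply translated to the explicit notation, so an
accumulator behaves the same way in both.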
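
To make this concrete, here is a minimal sketch, not your actual code.
It assumes the directory is a list of (first, last) byte indices, both
inclusive; the names readBlock, totalBytes and totalBytes' are made up
for illustration.  It reads each block strictly with hSeek and hGet,
and shows the same accumulator loop once in do-notation and once
written out with (>>=):

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO

-- Read the bytes from index 'first' to index 'lastIx' (both inclusive)
-- into a strict ByteString.  Nothing lazy is left hanging on the handle.
readBlock :: Handle -> (Integer, Integer) -> IO B.ByteString
readBlock h (first, lastIx) = do
  hSeek h AbsoluteSeek first
  B.hGet h (fromIntegral (lastIx - first + 1))

-- Tail-recursive loop in do-notation.  The accumulator is forced with
-- 'seq' at every step, so no chain of unevaluated additions builds up.
totalBytes :: Handle -> [(Integer, Integer)] -> IO Int
totalBytes h = go 0
  where
    go acc []       = return acc
    go acc (r : rs) = do
      block <- readBlock h r
      let acc' = acc + B.length block
      acc' `seq` go acc' rs

-- The same loop with (>>=) written out.  This is roughly what the
-- compiler turns the do-block above into.
totalBytes' :: Handle -> [(Integer, Integer)] -> IO Int
totalBytes' h = go 0
  where
    go acc []       = return acc
    go acc (r : rs) =
      readBlock h r >>= \block ->
        let acc' = acc + B.length block
        in acc' `seq` go acc' rs
```

Load it into GHCi and call it inside withBinaryFile, for example
withBinaryFile "data.bin" ReadMode (\h -> totalBytes h [(0, 9)]),
where "data.bin" stands for your own file.  Summing the lengths is
only a stand-in for whatever your scanning function extracts.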


> I have, of course, been experimenting with the "seq" - function, but,
> honestly, I am not sure whether I got it right. Does a call to
> "identity $! (function arguments ...)" force the full evaluation of
> the function?

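No,

  a `seq` b

says that 'a' must be evaluated before the result 'b' is returned, and
only as far as its outermost constructor (weak head normal form).
Likewise, 'f $! x' evaluates 'x' that far before applying 'f'; it does
not evaluate 'x' fully.  The function itself may still treat its
arguments lazily, which makes a difference when it is recursive.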
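
A small sketch of how far 'seq' and '$!' actually go.  All names here
(pairIsFine, lengthIsFine, forceElems, sumStrict) are made up for the
illustration:

```haskell
-- 'seq' stops at the outermost constructor: the pair is built, but its
-- undefined components are never touched, so this evaluates to "ok".
pairIsFine :: String
pairIsFine = (undefined :: Int, undefined :: Int) `seq` "ok"

-- ($!) forces only the first cons cell of its argument; 'length' never
-- looks at the elements, so the undefined values do no harm.
lengthIsFine :: Int
lengthIsFine = length $! [undefined, undefined :: Int]

-- Hypothetical helper: force every element of a list to weak head
-- normal form before handing the list back.
forceElems :: [a] -> [a]
forceElems xs = foldr seq () xs `seq` xs

-- A recursive function that does not treat its accumulator lazily:
-- each partial sum is evaluated before the next recursive call.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go acc []       = acc
    go acc (x : xs) = let acc' = acc + x in acc' `seq` go acc' xs
```

For elements like Int, weak head normal form already is the full
value, so forceElems evaluates such a list completely.  For nested
structures you would have to go deeper; the deepseq package offers
'force' and '$!!' for that purpose.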


Greets,
Ertugrul.


-- 
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://blog.ertes.de/



