[Haskell-beginners] Re: Memory usage problem

Heinrich Apfelmus apfelmus at quantentunnel.de
Thu Mar 25 06:26:35 EDT 2010

Sami Liedes wrote:
> On Wed, Mar 24, 2010 at 11:23:05AM +0100, Heinrich Apfelmus wrote:
>> A quick fix would be to intersperse a function between  sort  and
>> readRecords  that imposes proper evaluation order:
> [...]
>> While I expect the above to work, I hesitate to claim that it actually
>> does. The reason is that I don't understand your  readRecords  function
>> at a glance, unlike the  processFile  pipeline, it is not apparent how
>> it works. Since finding and fixing space leaks is easier if your code is
>> more obvious, I recommend to reformulate it as a function composition as
>> well, for instance something like this:
> [...]
> Thanks, that was enlightening! With your version of readRecords the
> program takes as much memory as with mine (it's still more than 10x
> what the C version takes, but perhaps that's the cost of using
> Haskell). But your version taught me a lot. Exactly the kind of
> feedback I needed :)

My pleasure. :)

I should point out that my version of  readRecords  still needs to be
composed with the  strictify  function, for the same reasons as yours.
But it shouldn't be very difficult to bake the evaluation order into
readRecords  now if desired.

Also note that my  readRecord  has introduced another potential space
leak. But thanks to the clarified structure, it is easy to spot. Namely,
the function

     prefix "Name: " &&& prefix "Version: "

is problematic. Expanding the (&&&) combinator, this is equal to

    \xs -> (prefix "Name: " xs, prefix "Version: " xs)

which means that the  xs  is used twice and thus potentially leaky.
After all, the first component might evaluate  xs  in full, which will
then leak around until the second component consumes it. This is exactly
what will happen if you feed it a patological input file consisting of
just a single but huge record along the lines of

    Version: xyz
    Name: haskell
    Foo: bar
    Foo: bar
    Foo: Bar

If this turns out to be a major problem, you can simply rewrite the
offending function to evaluate the components "in parallel". The
partition  function in the Haskell Prelude demonstrates how to do this.

Heinrich Apfelmus


More information about the Beginners mailing list