[Haskell-beginners] Re: Memory usage problem
Heinrich Apfelmus
apfelmus at quantentunnel.de
Thu Mar 25 06:26:35 EDT 2010
Sami Liedes wrote:
> On Wed, Mar 24, 2010 at 11:23:05AM +0100, Heinrich Apfelmus wrote:
>> A quick fix would be to intersperse a function between sort and
>> readRecords that imposes proper evaluation order:
> [...]
>> While I expect the above to work, I hesitate to claim that it actually
>> does. The reason is that I don't understand your readRecords function
>> at a glance, unlike the processFile pipeline, it is not apparent how
>> it works. Since finding and fixing space leaks is easier if your code is
>> more obvious, I recommend to reformulate it as a function composition as
>> well, for instance something like this:
> [...]
>
> Thanks, that was enlightening! With your version of readRecords the
> program takes as much memory as with mine (it's still more than 10x
> what the C version takes, but perhaps that's the cost of using
> Haskell). But your version taught me a lot. Exactly the kind of
> feedback I needed :)
My pleasure. :)
I should point out that my version of readRecords still needs to be
composed with the strictify function, for the same reasons as yours.
But it shouldn't be very difficult to bake the evaluation order into
readRecords now if desired.
Also note that my readRecord has introduced another potential space
leak. But thanks to the clarified structure, it is easy to spot. Namely,
the function
prefix "Name: " &&& prefix "Version: "
is problematic. Expanding the (&&&) combinator, this is equal to
\xs -> (prefix "Name: " xs, prefix "Version: " xs)
which means that the xs is used twice and thus potentially leaky.
After all, the first component might evaluate xs in full, which will
then leak around until the second component consumes it. This is exactly
what will happen if you feed it a patological input file consisting of
just a single but huge record along the lines of
Version: xyz
Name: haskell
Foo: bar
Foo: bar
...
...
Foo: Bar
If this turns out to be a major problem, you can simply rewrite the
offending function to evaluate the components "in parallel". The
partition function in the Haskell Prelude demonstrates how to do this.
Regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
More information about the Beginners
mailing list