[Haskell-cafe] Re: Performance Tuning & darcs (a real shootout?)
Jason Dagit
dagit at eecs.oregonstate.edu
Tue Jan 24 11:09:35 EST 2006
On Jan 24, 2006, at 1:55 AM, Simon Marlow wrote:
>
> You can get a quick picture of heap usage with +RTS -Sstderr, by
> the way. To find out what's actually in that heap, you'll need
> heap profiling (as you know).
[snip]
> Yes, GHC's heap is mmap()'d anonymously. You really need to find
> out whether the space leak is mmap()'d by GHC's runtime, or by
> darcs itself - +RTS -Sstderr or profiling will tell you about GHC's
> memory usage.
Ah, I had been using little -s, but I forgot about the existence of
big -S. I'll try to include some profiles and the knowledge gained
from using them. I wish I could work on that right now, but chances
are it will be Monday or Tuesday before I get to look at it again.
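
Concretely, something along these lines (a sketch; the patch file
name is just a placeholder, and it assumes the darcs binary accepts
+RTS options as GHC-built programs of that era did):

    $ darcs apply big.patch +RTS -Sstderr
    # -Sstderr prints one line per GC (bytes allocated, copied, live),
    # so the peak residency shows up without a full profiling build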
>
> I'd start by using heap profiling to track down what the space leak
> consists of, and hopefully to give you enough information to
> diagnose it. Let's see some heap profiles!
Yes!
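
For the archives, here is a sketch of how I expect to generate those
profiles. The build line is only illustrative (darcs's real build is
more involved), but the flags are the standard GHC profiling ones:

    $ ghc -O2 -prof -auto-all --make Main.hs -o darcs-prof
    $ ./darcs-prof apply big.patch +RTS -hc   # heap profile by cost centre
    $ hp2ps -c darcs-prof.hp                  # render darcs-prof.hp as a graph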
>
> Presumably the space leak is just as visible with smaller patches,
> so you don't need the full 300M patch to investigate it.
This is true; I've had problems with even 30MB patches. I guess I
liked using the 300MB patch because it emphasized and exaggerated the
performance problem, and often I left the profile running on one
machine while I went off to study the code on another. But it's a
good suggestion for when I want to iterate or get my results sooner.
>
> I don't usually resort to -ddump-simpl until I'm optimising the
> inner loop, use profiling to find out where the inner loops
> actually *are* first.
Point taken.
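
In other words, something like the following first (using the same
hypothetical profiling build as above), and only then reach for
-ddump-simpl on the modules that actually show up:

    $ ./darcs-prof apply big.patch +RTS -p
    $ less darcs-prof.prof   # cost centres with the highest %time are the real inner loops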
>> Are there tools or techniques that can help me understand why the
>> memory consumption peaks when applying a patch? Is it foolish to
>> think that lazy evaluation is the right approach?
>
> Since you asked, I've never been that keen on mixing laziness and I/
> O. Your experiences have strengthened that conviction - if you want
> strict control over resource usage, laziness is always going to be
> problematic. Sure it's great if you can get it right, the code is
> shorter and runs in small constant space. But can you guarantee
> that it'll still have the same memory behaviour with the next
> version of the compiler? With a different compiler?
And I've heard others say that laziness adds enough unpredictability
that it makes optimizing just that much trickier. I guess this may
be one of the cases where the "trickiness" outweighs the elegance.
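
To make the tradeoff concrete, here's a minimal sketch (not actual
darcs code, just an illustration) of the strict style Simon is
describing: reading a large file in fixed-size chunks with
Data.ByteString, so peak memory stays around one chunk no matter how
big the file is, independent of how the compiler treats laziness.

    import qualified Data.ByteString as B
    import System.IO

    -- Copy a file in roughly constant space by reading strict 64K chunks.
    -- Nothing here relies on lazy evaluation for its memory behaviour.
    copyInChunks :: FilePath -> FilePath -> IO ()
    copyInChunks src dst = do
      hIn  <- openBinaryFile src ReadMode
      hOut <- openBinaryFile dst WriteMode
      let loop = do
            chunk <- B.hGet hIn 65536       -- strict read of at most 64K
            if B.null chunk                 -- hGet returns empty at EOF
              then return ()
              else B.hPut hOut chunk >> loop
      loop
      hClose hOut
      hClose hIn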
>
>> I'm looking for advice or help in optimizing darcs in this case.
>> I guess this could be viewed as a challenge for people that felt
>> like the micro benchmarks of the shootout were unfair to Haskell.
>> Can we demonstrate that Haskell provides good performance in the
>> real-world when working with large files? Ideally, darcs could
>> easily work with a patch that is 10GB in size using only a few
>> megs of RAM if need be, and doing so in about the time it takes to
>> read the file once or twice and gzip it.
>
> I'd love to help you look into it, but I don't really have the
> time. I'm happy to help out with advice where possible, though.
Several people have spoken up and said, "I'd help but I'm busy,"
including droundy himself. This is fine; when I said "help" I was
thinking of advice like what you gave. It was a poor choice of
phrasing on my part. I can run GHC and stare at lines of code, but
sometimes I need guidance, since I'm mostly out of my league in this
case.
Thanks,
Jason