[Haskell-cafe] Re: Performance Tuning & darcs (a real shootout?)

Jason Dagit dagit at eecs.oregonstate.edu
Tue Jan 24 11:09:35 EST 2006

On Jan 24, 2006, at 1:55 AM, Simon Marlow wrote:

> You can get a quick picture of heap usage with +RTS -Sstderr, by  
> the way.  To find out what's actually in that heap, you'll need  
> heap profiling (as you know).
> Yes, GHC's heap is mmap()'d anonymously.  You really need to find  
> out whether the space leak is mmap()'d by GHC's runtime, or by  
> darcs itself - +RTS -Sstderr or profiling will tell you about GHC's  
> memory usage.

Ah, I had been using the little -s, but I forgot about the existence of  
big -S.  I'll try to include some profiles and the knowledge gained by  
using them.  I wish I could work on that right now, but chances are it  
will be Monday or Tuesday before I get to look at it again.
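For reference, the two approaches being discussed look roughly like this (the patch name is made up for illustration, and the heap-profiling flags additionally require darcs to be built with -prof):

```shell
# Per-GC statistics on stderr: watch the heap grow as the patch applies.
darcs apply big.patch +RTS -Sstderr

# Heap profiling (needs a profiled build): -hc breaks the heap down by
# cost centre; -hy (by type) and -hd (by closure description) also work.
darcs apply big.patch +RTS -hc -p
hp2ps -c darcs.hp    # turn the .hp output into a PostScript graph
```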

> I'd start by using heap profiling to track down what the space leak  
> consists of, and hopefully to give you enough information to  
> diagnose it.  Let's see some heap profiles!


> Presumably the space leak is just as visible with smaller patches,  
> so you don't need the full 300M patch to investigate it.

This is true; I've had problems with even 30MB patches.  I guess I  
liked using the 300MB patch because it emphasized and exaggerated the  
performance problems, and often I left the profile running on one  
machine while I went off to study the code on another.  But it's a good  
suggestion for when I want to be able to iterate and get my results sooner.

> I don't usually resort to -ddump-simpl until I'm optimising the  
> inner loop, use profiling to find out where the inner loops  
> actually *are* first.

Point taken.

>> Are there tools or techniques that can help me understand why the  
>> memory consumption peaks when applying a patch?  Is it foolish to  
>> think that lazy evaluation is the right approach?
> Since you asked, I've never been that keen on mixing laziness and  
> I/O. Your experiences have strengthened that conviction - if you want  
> strict control over resource usage, laziness is always going to be  
> problematic.  Sure it's great if you can get it right, the code is  
> shorter and runs in small constant space.  But can you guarantee  
> that it'll still have the same memory behaviour with the next  
> version of the compiler?  With a different compiler?

And I've heard others say that laziness adds enough unpredictability  
that it makes optimizing just that much trickier.  I guess this may  
be one of the cases where the "trickiness" outweighs the elegance.
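A tiny, self-contained illustration (not darcs code) of the unpredictability in question: two folds that compute the same value, where the lazy one builds a chain of thunks proportional to the input while the strict one runs in constant space.

```haskell
import Data.List (foldl')

-- foldl defers every (+): the accumulator becomes a chain of n thunks
-- that is only forced at the end, so memory use grows with the input.
lazySum :: [Integer] -> Integer
lazySum = foldl (+) 0

-- foldl' forces the accumulator at each step, so it runs in constant
-- space regardless of the length of the list.
strictSum :: [Integer] -> Integer
strictSum = foldl' (+) 0
```

Compiled with -O, GHC's strictness analysis may well rescue lazySum too, which is exactly the "same behaviour with the next compiler version?" fragility Simon describes.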

>> I'm looking for advice or help in optimizing darcs in this case.   
>> I guess this could be viewed as a challenge for people that felt  
>> like the micro benchmarks of the shootout were unfair to Haskell.   
>> Can we demonstrate that Haskell provides good performance in the  
>> real-world when working with large files?  Ideally, darcs could  
>> easily work with a patch that is 10GB in size using only a few  
>> megs of RAM if need be, and doing so in about the time it takes to  
>> read the file once or twice and gzip it.
> I'd love to help you look into it, but I don't really have the  
> time. I'm happy to help out with advice where possible, though.

Several people have spoken up and said, "I'd help but I'm busy",  
including droundy himself.  This is fine; when I said "help" I was  
thinking of advice like you gave.  It was a poor choice of phrasing  
on my part.  I can run GHC and stare at lines of code, but sometimes  
I need guidance, since I'm mostly out of my league in this case.
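For what it's worth, here is a sketch of the constant-space style the 10GB challenge calls for, using strict ByteString chunks rather than lazy I/O (the function names are mine, not anything from darcs):

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Fold strictly over fixed-size chunks of a handle; only one chunk is
-- live at a time, so heap use stays flat however large the file is.
foldChunks :: (a -> B.ByteString -> a) -> a -> Handle -> IO a
foldChunks f = go
  where
    go acc h = acc `seq` do              -- force the accumulator each step
      chunk <- B.hGetSome h 65536        -- read up to 64K strict bytes
      if B.null chunk                    -- empty chunk means EOF
        then return acc
        else go (f acc chunk) h

-- Example: count the bytes of an arbitrarily large file.
countBytes :: FilePath -> IO Int
countBytes path =
  withBinaryFile path ReadMode (foldChunks (\n c -> n + B.length c) 0)
```

Real patch application would of course thread a more interesting accumulator than a byte count, but the shape of the loop is the point: the resource bound is explicit in the code instead of depending on when thunks happen to be forced.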

