[Haskell-cafe] Re: GHC predictability

Wed May 14 20:32:44 EDT 2008

On Mon, 2008-05-12 at 19:30 -0700, Don Stewart wrote:
> > I offer up the following example:
> >
> >  mean xs = sum xs / length xs
> >
> > Now try, say, "mean [1.. 1e9]", and watch GHC eat several GB of RAM. (!!)
> 
> But you know why, don't you?
> 
> >  sat down and spent the best part of a day writing an MD5 
> > implementation. Eventually I got it so that all the test vectors work 
> > right. (Stupid little-endian nonsense... mutter mutter...) When I tried 
> > it on a file containing more than 1 MB of data... ooooohhhh dear...
> 
> Did you use Data.Binary or the other libraries for efficient access to gigabytes of data?
> 
> > Of course, the first step in any serious attempt at performance improvement
> > is to actually profile the code to figure out where the time 
> > is being spent. 
> 
> Well, actually, for our 'mean()' case, it means just using the right structures.
> Arrays for example:
> 
>     import Data.Array.Vector
>     import Text.Printf
> 
>     mean :: UArr Double -> Double
>     mean arr = b / fromIntegral a
>       where
>         k (n :*: s) a = n+1 :*: s+a
>         a :*: b = foldlU k (0 :*: 0) arr :: (Int :*: Double)
> 
>     main = printf "%f\n" . mean $ enumFromToFracU 1 1e9
> 
> For example,
> 
>     $ time ./A
>     5.00000000067109e8
>     ./A  3.53s user 0.00s system 99% cpu 3.532 total
> 
> Try allocating an array of doubles in C, and getting similar results.
> (The compiler is optimising this to:
> 
>     Main_zdszdwfold_info:
>       leaq        32(%r12), %rax
>       cmpq        %r15, %rax
>       movq        %rax, %r12
>       ja  .L10
>       movsd       .LC0(%rip), %xmm0
>       ucomisd     %xmm5, %xmm0
>       jae .L12
>       movq        %rsi, (%rax)
>       movq        $base_GHCziFloat_Dzh_con_info, -24(%rax)
>       movsd       %xmm6, -16(%rax)
>       movq        $base_GHCziBase_Izh_con_info, -8(%rax)
>       leaq        -7(%rax), %rbx
>       leaq        -23(%rax), %rsi
>       jmp *(%rbp)
>     .L12:
>       movapd      %xmm6, %xmm0
>       addq        $1, %rsi
>       subq        $32, %r12
>       addsd       %xmm5, %xmm0
>       addsd       .LC2(%rip), %xmm5
>       movapd      %xmm0, %xmm6
>       jmp Main_zdszdwfold_info
> 
> Note even any garbage collection. This should be pretty much the same
> performance-wise as unoptimised C.
> 
> > almost any nontrivial program you write 
> > spends 60% or more of its time doing GC rather than actual work. 
> 
> Ok, you're doing something very wrong. GC time is typically less than 15% of the running
> time of typical work programs I hack on.
> 
> I bet you're using lists inappropriately?
> 
> > I find it completely impossibly to write code that doesn't crawl along at a
> > snail's pace.  Even when I manage to make it faster, I usually have no clue
> > why.
> 
> I think there is a problem that few people are taking the time to understand
> the compilation model of Haskell, while they've had the C runtime behaviour
> drilled into their brains since college.
> 
> Until you sit down and understand what your Haskell code means, it will be very
> hard to reason about optimisations, unfortunately.
> 
> GHC does what it does well, and its predictable. As long as you understand 
> the operations your trying to make predictions about.
> 
> I suggest installing ghc-core, and looking at how your code is compiled.
> Try many examples, and have a good knowledge of the STG paper.

My experience has been that simply understanding lazy/normal order
evaluation is enough to catch most of the "big performance problems"
beginners complain about.  If you can't evaluate Haskell by hand at the
source level, you have no hope of understanding performance issues.

For going more for C-like speed, your advice seems more appropriate.