[Haskell-cafe] Re: GHC predictability

Wed May 14 20:34:24 EDT 2008

derek.a.elkins:
> On Mon, 2008-05-12 at 19:30 -0700, Don Stewart wrote:
> > > I offer up the following example:
> > >
> > >  mean xs = sum xs / length xs
> > >
> > > Now try, say, "mean [1.. 1e9]", and watch GHC eat several GB of RAM. (!!)
> > 
> > But you know why, don't you?
> > 
> > >  sat down and spent the best part of a day writing an MD5 
> > > implementation. Eventually I got it so that all the test vectors work 
> > > right. (Stupid little-endian nonsense... mutter mutter...) When I tried 
> > > it on a file containing more than 1 MB of data... ooooohhhh dear...
> > 
> > Did you use Data.Binary or the other libraries for efficient access to gigabytes of data?
> > 
> > > Of course, the first step in any serious attempt at performance improvement
> > > is to actually profile the code to figure out where the time 
> > > is being spent. 
> > 
> > Well, actually, for our 'mean()' case, it means just using the right structures.
> > Arrays for example:
> > 
> >     import Data.Array.Vector
> >     import Text.Printf
> > 
> >     mean :: UArr Double -> Double
> >     mean arr = b / fromIntegral a
> >       where
> >         k (n :*: s) a = n+1 :*: s+a
> >         a :*: b = foldlU k (0 :*: 0) arr :: (Int :*: Double)
> > 
> >     main = printf "%f\n" . mean $ enumFromToFracU 1 1e9
> > 
> > For example,
> > 
> >     $ time ./A
> >     5.00000000067109e8
> >     ./A  3.53s user 0.00s system 99% cpu 3.532 total
> > 
> > Try allocating an array of doubles in C, and getting similar results.
> > (The compiler is optimising this to:
> > 
> >     Main_zdszdwfold_info:
> >       leaq        32(%r12), %rax
> >       cmpq        %r15, %rax
> >       movq        %rax, %r12
> >       ja  .L10
> >       movsd       .LC0(%rip), %xmm0
> >       ucomisd     %xmm5, %xmm0
> >       jae .L12
> >       movq        %rsi, (%rax)
> >       movq        $base_GHCziFloat_Dzh_con_info, -24(%rax)
> >       movsd       %xmm6, -16(%rax)
> >       movq        $base_GHCziBase_Izh_con_info, -8(%rax)
> >       leaq        -7(%rax), %rbx
> >       leaq        -23(%rax), %rsi
> >       jmp *(%rbp)
> >     .L12:
> >       movapd      %xmm6, %xmm0
> >       addq        $1, %rsi
> >       subq        $32, %r12
> >       addsd       %xmm5, %xmm0
> >       addsd       .LC2(%rip), %xmm5
> >       movapd      %xmm0, %xmm6
> >       jmp Main_zdszdwfold_info
> > 
> > Note even any garbage collection. This should be pretty much the same
> > performance-wise as unoptimised C.
> > 
> > > almost any nontrivial program you write 
> > > spends 60% or more of its time doing GC rather than actual work. 
> > 
> > Ok, you're doing something very wrong. GC time is typically less than 15% of the running
> > time of typical work programs I hack on.
> > 
> > I bet you're using lists inappropriately?
> > 
> > > I find it completely impossibly to write code that doesn't crawl along at a
> > > snail's pace.  Even when I manage to make it faster, I usually have no clue
> > > why.
> > 
> > I think there is a problem that few people are taking the time to understand
> > the compilation model of Haskell, while they've had the C runtime behaviour
> > drilled into their brains since college.
> > 
> > Until you sit down and understand what your Haskell code means, it will be very
> > hard to reason about optimisations, unfortunately.
> > 
> > GHC does what it does well, and its predictable. As long as you understand 
> > the operations your trying to make predictions about.
> > 
> > I suggest installing ghc-core, and looking at how your code is compiled.
> > Try many examples, and have a good knowledge of the STG paper.
> 
> My experience has been that simply understanding lazy/normal order
> evaluation is enough to catch most of the "big performance problems"
> beginners complain about.  If you can't evaluate Haskell by hand at the
> source level, you have no hope of understanding performance issues.
> 
> For going more for C-like speed, your advice seems more appropriate.
> 

Yes, I agree. You must first be able to reason about the evaluation
strategy on paper. If you want to get C like speed, you should
understand Core. Thankfully, the knowledge of how to do this, and tools
to help, are improving.

-- Don