[Haskell-cafe] Re: GHC predictability
dons at galois.com
Wed May 14 20:34:24 EDT 2008
> On Mon, 2008-05-12 at 19:30 -0700, Don Stewart wrote:
> > > I offer up the following example:
> > >
> > > mean xs = sum xs / length xs
> > >
> > > Now try, say, "mean [1.. 1e9]", and watch GHC eat several GB of RAM. (!!)
> > But you know why, don't you?
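> > This is the classic space leak: sum and length are two separate traversals,
> > so the whole billion-element list is retained between them. A single strict
> > pass computes both at once; a minimal sketch (the name mean' is mine):
> >
> > {-# LANGUAGE BangPatterns #-}
> >
> > -- One traversal, strict accumulators: nothing holds on to the list
> > -- behind the fold, so this runs in constant space.
> > mean' :: [Double] -> Double
> > mean' xs = s / fromIntegral n
> >   where
> >     (s, n) = go 0 0 xs :: (Double, Int)
> >     go !s' !n' []     = (s', n')
> >     go !s' !n' (y:ys) = go (s' + y) (n' + 1) ys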
> > > I sat down and spent the best part of a day writing an MD5
> > > implementation. Eventually I got it so that all the test vectors work
> > > right. (Stupid little-endian nonsense... mutter mutter...) When I tried
> > > it on a file containing more than 1 MB of data... ooooohhhh dear...
> > Did you use Data.Binary or the other libraries for efficient access to gigabytes of data?
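> > For scale: the point of those libraries is chunked, constant-space I/O. A
> > rough sketch with lazy ByteStrings (the file name is made up):
> >
> > import qualified Data.ByteString.Lazy as L
> >
> > -- Sum the bytes of an arbitrarily large file in constant space:
> > -- the file is read chunk by chunk, never held whole in memory.
> > main :: IO ()
> > main = do
> >     bs <- L.readFile "big.dat"   -- hypothetical input file
> >     print (L.foldl' (\acc w -> acc + fromIntegral w) (0 :: Integer) bs)
> >
> > An MD5 over a large file has the same shape: a block-wise fold over the
> > chunks, rather than forcing the whole input at once.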
> > > Of course, the first step in any serious attempt at performance improvement
> > > is to actually profile the code to figure out where the time
> > > is being spent.
> > Well, actually, for our 'mean' case, it means just using the right data
> > structures. Arrays, for example:
> > import Data.Array.Vector
> > import Text.Printf
> >
> > -- Walk the unboxed array once, accumulating the length and the sum
> > -- together in a strict pair (:*:), then divide at the end.
> > mean :: UArr Double -> Double
> > mean arr = b / fromIntegral a
> >   where
> >     k (n :*: s) x = n+1 :*: s+x
> >     a :*: b = foldlU k (0 :*: 0) arr :: (Int :*: Double)
> >
> > main = printf "%f\n" . mean $ enumFromToFracU 1 1e9
> > For example,
> > $ time ./A
> > 5.00000000067109e8
> > ./A 3.53s user 0.00s system 99% cpu 3.532 total
> > Try allocating an array of doubles in C, and getting similar results.
> > The compiler is optimising this to:
> > Main_zdszdwfold_info:
> > leaq 32(%r12), %rax
> > cmpq %r15, %rax
> > movq %rax, %r12
> > ja .L10
> > movsd .LC0(%rip), %xmm0
> > ucomisd %xmm5, %xmm0
> > jae .L12
> > movq %rsi, (%rax)
> > movq $base_GHCziFloat_Dzh_con_info, -24(%rax)
> > movsd %xmm6, -16(%rax)
> > movq $base_GHCziBase_Izh_con_info, -8(%rax)
> > leaq -7(%rax), %rbx
> > leaq -23(%rax), %rsi
> > jmp *(%rbp)
> > .L12:
> > movapd %xmm6, %xmm0
> > addq $1, %rsi
> > subq $32, %r12
> > addsd %xmm5, %xmm0
> > addsd .LC2(%rip), %xmm5
> > movapd %xmm0, %xmm6
> > jmp Main_zdszdwfold_info
> > Note: not even any garbage collection. The .L12 block above is the entire
> > loop: the length count stays in %rsi and both doubles stay in SSE registers.
> > This should be pretty much the same performance-wise as unoptimised C.
> > > almost any nontrivial program you write
> > > spends 60% or more of its time doing GC rather than actual work.
> > OK, you're doing something very wrong. GC time is typically less than 15% of
> > the running time of the typical programs I hack on.
> > I bet you're using lists inappropriately?
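> > You can measure the GC fraction of your own program directly with the
> > runtime's statistics flag:
> >
> > $ ghc -O2 --make A.hs
> > $ ./A +RTS -s
> >
> > The summary it prints includes the percentage of runtime spent in the
> > garbage collector.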
> > > I find it completely impossible to write code that doesn't crawl along at a
> > > snail's pace. Even when I manage to make it faster, I usually have no clue
> > > why.
> > I think there is a problem that few people are taking the time to understand
> > the compilation model of Haskell, while they've had the C runtime behaviour
> > drilled into their brains since college.
> > Until you sit down and understand what your Haskell code means, it will be very
> > hard to reason about optimisations, unfortunately.
> > GHC does what it does well, and it's predictable, as long as you understand
> > the operations you're trying to make predictions about.
> > I suggest installing ghc-core, and looking at how your code is compiled.
> > Try many examples, and have a good knowledge of the STG paper.
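> > For example, once ghc-core is installed from Hackage, viewing the optimised
> > Core for a module is just:
> >
> > $ ghc-core A.hs
> >
> > from which you can check that your loops fused and your accumulators were
> > unboxed.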
> My experience has been that simply understanding lazy/normal order
> evaluation is enough to catch most of the "big performance problems"
> beginners complain about. If you can't evaluate Haskell by hand at the
> source level, you have no hope of understanding performance issues.
> For getting closer to C-like speed, your advice seems more appropriate.
Yes, I agree. You must first be able to reason about the evaluation
strategy on paper. If you want to get C-like speed, you should
understand Core. Thankfully, the knowledge of how to do this, and tools
to help, are improving.
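For instance, the paper evaluation that catches the most common beginner
leak is only a few lines. A lazy foldl builds a chain of suspended
additions:

    foldl (+) 0 [1,2,3]
  = foldl (+) (0+1) [2,3]
  = foldl (+) ((0+1)+2) [3]
  = foldl (+) (((0+1)+2)+3) []
  = ((0+1)+2)+3

Nothing demands the sum until the very end, so the whole chain lives on the
heap; foldl' forces each step and runs in constant space. The same hand
evaluation explains the 'mean' example at the top of this thread.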