[Haskell-cafe] speed: ghc vs gcc
dons at galois.com
Fri Feb 20 13:39:46 EST 2009
> Friday, February 20, 2009, 7:41:33 PM, you wrote:
> >> main = print $ sum[1..10^9::Int]
> > This won't be comparable to your loop below, as 'sum' is a left fold
> > (which doesn't fuse under build/foldr).
> > You should use the list implementation from the stream-fusion package (or
> > uvector) if you're expecting it to fuse to the following loop:
> it was comparison of native haskell, low-level haskell (which is
> harder to write than native C) and native C. stream-fusion and any
> other packages provides libraries for some tasks but they can't make faster
> maps, for example. so i used plain list
Hmm? Maybe you're not familiar with the state of the art?
$ cabal install uvector
Write a loop at a high level:
main = print (sumU (enumFromToU 1 (10^9 :: Int)))
$ ghc-core A.hs -O2 -fvia-C -optc-O3
cmpq 6(%rbx), %rdi
addq %rdi, %rsi
leaq 1(%rdi), %rdi
$ time ./A
./A 0.97s user 0.01s system 99% cpu 0.982 total
Now, (trying to avoid the baiting...) this is actually *very*
interesting. Why is this faster than the manual recursion we did earlier
why do we get better assembly? Again, if you stick to specifics, there's some
interesting things we can learn here.
> > Which seems ... OK.
> really? :D
No, see above.
> > I don't get anything near the 0.062s which is interesting.
> it was beautiful gcc optimization - it added 8 values at once. with
> xor results are:
> xor.hs 12.605
> xor-fast.hs 1.856
> xor.cpp 0.339
GCC is a good loop optimiser. But apparently not my GCC.
> > So we have:
> > ghc -fvia-C -O2 1.127
> > ghc -fasm 1.677
> > gcc -O0 4.500
> > gcc -O3 -funroll-loops 0.318
> why not compare to ghc -O0? also you can disable loop unrolling in gcc
> and unroll loops manually in haskell. or you can generate asm code on
> the fly. there are plenty of tricks to "prove" that gcc generates bad
> code :D
No, we want to show (I imagine) that GHC is within a factor or two of "C".
I usually set my benchmark to beat gcc -O0 fwiw, and then to hope to be within
2x of optimised C. I'm not sure what you're standards are.
> > So. some lessons. GHC is around 3-4x slower on this tight loop. (Which isn't as
> > bad as it used to be).
> really? what i see: low-level haskell code is usually 3 times harder
> to write and 3 times slower than gcc code. native haskell code is tens
> to thousands times slower than C code (just recall that real programs
> use type classes and monads in addition to laziness)
"thousands times", now you're just undermining your own credibility
here. Stick to what you can measure. If anything we'd expect GCC's magic loop
skillz to be less useful on large code bases.
> > That's actually a worse margin than any current shootout program, where we are no
> > worse than 2.9 slower on larger things:
> 1) most benchmarks there depend on libraries speed. in one test, for
> example, php is winner
> 2) for the sum program ghc libs was modified to win in benchmark
It is interesting that the < 2.9x slower in the shootout is pretty much what
we found in this benchmark too.
> 3) the remaining 1 or 2 programs that measure speed of ghc-generated
> code was hardly optimized using low-level code, so they don't have
> anything common with real haskell code most of us write every day
Depends on where you work.
> > Now, given GHC gets most of the way there -- I think this might make a good bug
> > report against GHC head, so we can see if the new register allocator helps any.
> you mean that 6.11 includes new allocator? in that case you can
> test it too
> i believe that ghc developers are able to test sum performance without my
> bugreports :D
No! This is not how open source works! You *should submit bug reports* and *analysis*.
It is so so much more useful than complaining and throwing stones.
More information about the Haskell-Cafe