[Haskell-cafe] speed: ghc vs gcc
daniel.is.fischer at web.de
Fri Feb 20 15:56:44 EST 2009
Am Freitag, 20. Februar 2009 18:10 schrieb Bulat Ziganshin:
> Hello Don,
> Friday, February 20, 2009, 7:41:33 PM, you wrote:
> >> main = print $ sum[1..10^9::Int]
> > This won't be comparable to your loop below, as 'sum' is a left fold
> > (which doesn't fuse under build/foldr).
> > You should use the list implementation from the stream-fusion package (or
> > uvector) if you're expecting it to fuse to the following loop:
> it was comparison of native haskell, low-level haskell (which is
> harder to write than native C)
Um, not always, and certainly not in cases like this, at least if you call the
simple worker loop "low-level Haskell".
> and native C. stream-fusion and any
> other packages provides libraries for some tasks but they can't make faster
> maps, for example. so i used plain list
Which is of course comparable with a tight loop in C, isn't it?
Really, you hurt your cause by including that.
You said you wanted to compare ghc to gcc, then compare what they do to
> > Which seems ... OK.
> really? :D
> > Well, that's a bit different. It doesn't print the result, and it returns
> > a different results on 64 bit....
> doesn't matter for testing speed
> > I don't get anything near the 0.062s which is interesting.
> it was beautiful gcc optimization - it added 8 values at once. with
> xor results are:
> xor.hs 12.605
> xor-fast.hs 1.856
> xor.cpp 0.339
> > The print statement slows things down, I guess...
> are you really believe that printing one number needs so much time? :)
> > So we have:
> > ghc -fvia-C -O2 1.127
> > ghc -fasm 1.677
> > gcc -O0 4.500
> > gcc -O3 -funroll-loops 0.318
> why not compare to ghc -O0? also you can disable loop unrolling in gcc
> and unroll loops manually in haskell. or you can generate asm code on
> the fly. there are plenty of tricks to "prove" that gcc generates bad
> code :D
That's not what he's doing at all. Sure, he's not comparing Haskell code
compiled without optimisations, but he also includes gcc with highest
optimisation level. Read the gcc -O0 figure as an indication of what
optimisations can achieve.
> > So. some lessons. GHC is around 3-4x slower on this tight loop. (Which
> > isn't as bad as it used to be).
> really? what i see: low-level haskell code is usually 3 times harder
> to write and 3 times slower than gcc code.
I deny that low-level Haskell code is three times harder to write than
ordinary C code, at least if we consider the worker/wrapper idiom low-level
It is also my experience that gcc usually creates faster executables from good
C code than ghc does from corresponding ordinary Haskell code (not using
#-magic), but the margin does vary wildly.
> native haskell code is tens
> to thousands times slower than C code (just recall that real programs
> use type classes and monads in addition to laziness)
Okay, tens is realistic, but thousands?
Of course if you compare a tight loop that doesn't allocate to creating
thousands of millions of cons-cells...
Just because lists are easier to use in Haskell than in any other language I
know doesn't mean it's necessary to use lists when writing Haskell if other
ways are more fitting for the goal.
Just for the record, timings on my machine, gcc-3.3 vs. ghc-6.8.3:
Sums in C, first counting up, then down
Sums in Haskell
First compiled with -O2, then with -O2 -fvia-C -optc-O3
Loop down with BangPatterns
Loop down without BangPatterns
Loop up (with BangPatterns)
Using strict left fold
More information about the Haskell-Cafe