[Haskell-cafe] speed: ghc vs gcc

Fri Feb 20 12:10:19 EST 2009

Hello Don,

Friday, February 20, 2009, 7:41:33 PM, you wrote:

>> main = print $ sum[1..10^9::Int]

> This won't be comparable to your loop below, as 'sum' is a left fold
> (which doesn't fuse under build/foldr).

> You should use the list implementation from the stream-fusion package (or
> uvector) if you're expecting it to fuse to the following loop:

it was a comparison of native haskell, low-level haskell (which is
harder to write than native C) and native C. stream-fusion and
other packages provide libraries for some tasks, but they can't make
maps faster, for example. so i used a plain list
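
to illustrate the difference between the two haskell styles, here is a
minimal sketch. it is not the benchmarked code: the names sumList and
sumLoop, the bang-pattern accumulator and the smaller input size are
assumptions made for illustration only.

```haskell
{-# LANGUAGE BangPatterns #-}

-- "native" style: build a lazy list and let the library sum it.
sumList :: Int -> Int
sumList n = sum [1 .. n]

-- "low-level" style: an explicit loop with a strict accumulator,
-- so no list cells or thunks are needed for the running total.
sumLoop :: Int -> Int
sumLoop n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = do
  let n = 10 ^ 6 :: Int   -- assumption: much smaller than the 10^9 in the thread
  print (sumList n)
  print (sumLoop n)
```

both functions compute the same value; only the second spells out the
loop that one would write directly in C.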

> Which seems ... OK.

really? :D

> Well, that's a bit different. It doesn't print the result, and it returns
> different results on 64 bit....

doesn't matter for testing speed

> I don't get anything near the 0.062s which is interesting.

it was a beautiful gcc optimization - it added 8 values at once. with
xor the results (in seconds) are:

xor.hs      12.605
xor-fast.hs  1.856
xor.cpp      0.339
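
a sketch of what an xor variant of the same test could look like,
assuming xor simply replaces + as the accumulating operation. the
names xorList and xorLoop are hypothetical, and this is not the
actual content of xor.hs or xor-fast.hs.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Bits (xor)
import Data.List (foldl')

-- list-based version: strict left fold of xor over [1..n].
xorList :: Int -> Int
xorList n = foldl' xor 0 [1 .. n]

-- hand-written loop version with a strict accumulator.
xorLoop :: Int -> Int
xorLoop n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc `xor` i) (i + 1)

main :: IO ()
main = do
  let n = 10 ^ 6 :: Int   -- assumption: small input, for a quick run
  print (xorList n)
  print (xorLoop n)
```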

> The print statement slows things down, I guess...

do you really believe that printing one number needs so much time? :)

> So we have:

>     ghc -fvia-C -O2             1.127
>     ghc -fasm                   1.677
>     gcc -O0                     4.500
>     gcc -O3 -funroll-loops      0.318

why not compare to ghc -O0? also you can disable loop unrolling in gcc
and unroll loops manually in haskell. or you can generate asm code on
the fly. there are plenty of tricks to "prove" that gcc generates bad
code :D

> So. some lessons. GHC is around 3-4x slower on this tight loop. (Which isn't as
> bad as it used to be).

really? what i see: low-level haskell code is usually 3 times harder
to write and 3 times slower than gcc code. native haskell code is tens
to thousands of times slower than C code (just recall that real programs
use type classes and monads in addition to laziness)

> That's actually a worse margin than any current shootout program, where we are no
> worse than 2.9 slower on larger things:

1) most benchmarks there depend on library speed. in one test, for
example, php is the winner
2) for the sum program, the ghc libs were modified to win the benchmark
3) the remaining 1 or 2 programs that measure the speed of ghc-generated
code were heavily optimized using low-level code, so they don't have
anything in common with the real haskell code most of us write every day

> Now, given GHC gets most of the way there -- I think this might make a good bug
> report against GHC head, so we can see if the new register allocator helps any.

you mean that 6.11 includes the new allocator? in that case you can
test it too

i believe that the ghc developers are able to test sum performance without
my bug reports :D

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com