[Haskell-cafe] Re: speed: ghc vs gcc
Don Stewart
dons at galois.com
Fri Feb 20 15:05:19 EST 2009
barsoap:
> Don Stewart <dons at galois.com> wrote:
>
> > No! This is not how open source works! You *should submit bug
> > reports* and *analysis*. It is so so much more useful than
> > complaining and throwing stones.
> >
> Exactly. I don't know where, but I read that the vast majorities of
> Linux bugs are reported, nailed, and then fixed, by at least three
> different persons: The first reports a misbehaviour, the second manages
> to find it surfacing in a certain line of code, the third instantly
> knows how to make it go away.
Elaboarting further:
Thinking more about Bulat's code gen observations, I think there's something
wrong here -- other than that GHC needs the new codegen to do any of the
fancier loop optimisations.
If we take what I usually see as the best loops GHC can do for this kind of thing:
import Data.Array.Vector
main = print (sumU (enumFromToU 1 (10^9 :: Int)))
And compile it:
$ ghc-core A.hs -O2 -fvia-C -optc-O3
We get ideal core, all data structures fused away, and no heap allocation:
$wfold_s15t :: Int# -> Int# -> Int#
$wfold_s15t =
\ (ww1_s150 :: Int#) (ww2_s154 :: Int#) ->
case ># ww2_s154 ww_s14U of wild_aWm {
False ->
$wfold_s15t
(+# ww1_s150 ww2_s154) (+# ww2_s154 1);
True -> ww1_s150
}; } in
case $wfold_s15t 0 1
Which produces nice assembly:
s16e_info:
cmpq 6(%rbx), %rdi
jg .L2
addq %rdi, %rsi
leaq 1(%rdi), %rdi
jmp s16e_info
This is the best GHC will do here, in my experience, and I'm satisifed with it.
Short of new backend tweaks, and realising that GHC is not the loop magic compiler GCC is.
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/IntegratedCodeGen
We can be happy with this. The compiler is doing exactly what we expect.
$ time ./B
500000000500000000
./B 0.96s user 0.00s system 99% cpu 0.967 total
Now, going back to the low level version, Bulat's loop:
main()
{
int sum=0;
//for(int j=0; j<100;j++)
for(int i=0; i<1000*1000*1000;i++)
sum += i;
return sum;
}
What was first confusing for me was that he wrote the loop "backwards" when translating to Haskell,
like this:
main = print $ sum0 (10^9) 0
sum0 :: Int -> Int -> Int
sum0 0 !acc = acc
sum0 !x !acc = sum0 (x-1) (acc+x)
(The bang patterns aren't needed). Note how he counts backwards from 10^9. Was there a reason for that, Bulat?
I wondered if we just got worse code on backwards counting loops. So
translating into the "obvious" translation, counting up:
main = print (sum0 0 1)
sum0 :: Int -> Int -> Int
sum0 acc n | n > 10^9 = acc
| otherwise = sum0 (acc + n) (n + 1)
Which I actually consider to be the same difficulty as writing the C version, fwiw...
We start to notice something interesting:
$wsum0 :: Int# -> Int# -> Int#
$wsum0 =
\ (ww_sOH :: Int#) (ww1_sOL :: Int#) ->
case lvl2 of wild1_aHn { I# y_aHp ->
case ># ww1_sOL y_aHp of wild_B1 {
False ->
letrec {
$wsum01_XPd :: Int# -> Int# -> Int#
$wsum01_XPd =
\ (ww2_XP4 :: Int#) (ww3_XP9 :: Int#) ->
case ># ww3_XP9 y_aHp of wild11_Xs {
False ->
$wsum01_XPd (+# ww2_XP4 ww3_XP9) (+# ww3_XP9 1);
True -> ww2_XP4
}; } in
$wsum01_XPd (+# ww_sOH ww1_sOL) (+# ww1_sOL 1);
True -> ww_sOH
}
Why is there an extra test? What is GHC doing?
Checking the asm:
$ ghc -O2 -fasm
sQ3_info:
.LcRt:
cmpq 8(%rbp),%rsi
jg .LcRw
leaq 1(%rsi),%rax
addq %rsi,%rbx
movq %rax,%rsi
jmp sQ3_info
$ time ./B
500000000500000000
./B 1.30s user 0.01s system 98% cpu 1.328 total
So its a fair bit slower. Now, we should, as a principle, be able to write sum directly as I did , and get the
same code from the manual, and automatically , fused version. But we didn't.
Checking via C:
$ ghc -O2 -optc-O3 -fvia-C
Better code, but still a bit slower:
sQ3_info:
cmpq 8(%rbp), %rsi
jg .L8
addq %rsi, %rbx
leaq 1(%rsi), %rsi
jmp sQ3_info
Running:
$ time ./B
500000000500000000
./B 1.01s user 0.01s system 97% cpu 1.035 total
So I think we have a bug report! Why did GHC put that extra test in place?
Now, none of this addresses (I think) Bulat's point that GCC can unroll loops and do other loop magic.
That's handled under a different workflow - the new code generator.
I'll create the ticket.
-- Don
More information about the Haskell-Cafe
mailing list