GHC vs. GCC on raw vector addition
Simon Marlow
simonmarhaskell at gmail.com
Thu Jan 19 06:28:03 EST 2006
John Meacham wrote:
> On Wed, Jan 18, 2006 at 08:54:43PM +0300, Bulat Ziganshin wrote:
>> sorry, with the "gcc -O3 -ffast-math -fstrict-aliasing -funroll-loops"
>> the C version is 50 times faster than best Haskell one... it's the
>> loop from C version:
>
> I believe something similar to what I noted here is the culprit:
> http://www.haskell.org//pipermail/glasgow-haskell-users/2005-October/009174.html
>
> it is fixable, but not without modifying ghc.
Ah, I see what you mean by indirect jumps. Those indirect jumps go away
if you compile with -optc-O2 or -fasm; they're droppings left by
inadequacies in gcc's standard -O optimisation.
Actually, -fasm does better by one instruction than gcc on this example:
.globl Test_zdwfac_info
Test_zdwfac_info:
        movq (%rbp),%rax
        cmpq $1,%rax
        jne .LcmO
        movq 8(%rbp),%r13
        addq $16,%rbp
        jmp *(%rbp)
.LcmO:
        leaq -1(%rax),%rcx
        imulq 8(%rbp),%rax
        movq %rax,8(%rbp)
        movq %rcx,(%rbp)
        jmp Test_zdwfac_info
vs. gcc -O2:
Test_zdwfac_info:
        .text
        .align 8
        movq (%rbp), %rdx
        cmpq $1, %rdx
        je .L6
.L3:
        movq 8(%rbp), %rax
        imulq %rdx, %rax
        decq %rdx
        movq %rdx, (%rbp)
        movq %rax, 8(%rbp)
        jmp Test_zdwfac_info
        .p2align 4,,7
.L6:
        movq 8(%rbp), %r13
        addq $16, %rbp
        jmp *(%rbp)
We should probably reverse the sense of that branch, like gcc does. The
memory accesses are still there, of course. Hopefully someday I'll get
around to trying to use more registers on x86_64 again.
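To make the remaining memory traffic concrete, here is a rough C model of what both listings compute. It is only a sketch for illustration, not GHC's generated code: the name wfac_model and the sp parameter are made up, with sp standing in for %rbp, sp[0] for the counter and sp[1] for the accumulator. The symbol Test_zdwfac_info is the z-encoded name of Test.$wfac, so the loop is presumably the worker of an accumulating factorial.

```c
#include <stdint.h>

/* Hypothetical model of the loop in the listings above.
 * sp plays the role of %rbp (the Haskell stack pointer):
 *   sp[0] holds the counter n, sp[1] holds the accumulator.
 * Only meaningful for n >= 1, like the original loop. */
int64_t wfac_model(int64_t *sp)
{
    for (;;) {
        int64_t n = sp[0];      /* movq (%rbp),%rax : load n */
        if (n == 1) {
            /* The real code puts the result in %r13, pops the two
             * slots (addq $16,%rbp) and jumps to the continuation
             * found on the stack; here we simply return it. */
            return sp[1];
        }
        sp[1] = n * sp[1];      /* imulq, then store to 8(%rbp) */
        sp[0] = n - 1;          /* leaq/decq, then store to (%rbp) */
    }
}
```

Every iteration loads both slots and stores both back, which is the cost that keeping n and the accumulator in registers would remove. Calling it on the slots {5, 1} returns 120.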
Cheers,
Simon