Removing/deprecating -fvia-c
Don Stewart
dons at galois.com
Mon Feb 15 13:29:20 EST 2010
marlowsd:
>>>
>>> Simon Marlow has recently fixed FP performance for modern x86 chips in
>>> the native code generator in the HEAD. That was the last reason we know
>>> of to prefer via-C to the native code generators. But before we start
>>> the removal process, does anyone know of any other problems with the
>>> native code generators that need to be fixed first?
>>>
>>
>> Do we have the blessing of the DPH team, wrt. tight, numeric inner loops?
>>
>> As recently as last year -fvia-C -optc-O3 was still useful for some
>> microbenchmarks -- what's changed in that time, or is expected to change?
>
> If you have benchmarks that show a significant difference, I'd be
> interested to see them.
I've attached an example where there's a 40% variation (and it's a
floating point benchmark). Roman would be seeing similar examples in the
vector code.
I'm all in favor of dropping the C backend, but I'm wary that we lack
the benchmarks to know what difference it actually makes.
Here's a simple program testing a tight, floating point loop:
import Data.Array.Vector
import Data.Complex
main = print . sumU $ replicateU (1000000000 :: Int) (1 :+ 1 :: Complex Double)
Compiled with GHC 6.12 and uvector-0.1.1.0 on a 64-bit Linux box.
The -fvia-C -optc-O3 build is about 40% faster than the -fasm build.
How does it fare with the new SSE patches?
I've attached the assembly for each case below.
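As an aid to reading the listings, here is a hand-written sketch of the loop that the uvector pipeline appears to reduce to after fusion: an Int counter plus two strict Double accumulators, one for the real part and one for the imaginary part. In the listings these look like %r14, %xmm5 and %xmm6. The name sumOnes and the small iteration count are illustrative assumptions; this is not the code GHC generates and not the program that was timed.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Complex (Complex ((:+)))

-- Hypothetical stand-in for the fused worker loop: count up to n,
-- adding 1 to both the real and the imaginary accumulator each step.
sumOnes :: Int -> Complex Double
sumOnes n = go 0 0 0
  where
    go :: Int -> Double -> Double -> Complex Double
    go !i !re !im
      | i >= n    = re :+ im                      -- done: box the two sums
      | otherwise = go (i + 1) (re + 1) (im + 1)  -- tight tail-recursive step

main :: IO ()
main = print (sumOnes 1000000)
```

Compiling a loop like this with each backend is one way to compare code generators without depending on uvector, on the assumption that it optimises to a similar worker.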
-- Don
------------------------------------------------------------------------
Fastest. 2.17s. About 40% faster than -fasm
$ time ./sum-complex
1.0e9 :+ 1.0e9
./sum-complex 2.16s user 0.00s system 99% cpu 2.175 total
Main_mainzuzdszdwfold_info:
leaq 32(%r12), %rax
movq %r12, %rdx
cmpq 144(%r13), %rax
movq %rax, %r12
ja .L4
cmpq $1000000000, %r14
je .L9
.L5:
movsd .LC0(%rip), %xmm0
leaq 1(%r14), %r14
addsd %xmm0, %xmm5
addsd %xmm0, %xmm6
movq %rdx, %r12
jmp Main_mainzuzdszdwfold_info
.L4:
leaq -24(%rbp), %rax
movq $32, 184(%r13)
movq %rax, %rbp
movq %r14, (%rax)
movsd %xmm5, 8(%rax)
movsd %xmm6, 16(%rax)
movl $Main_mainzuzdszdwfold_closure, %ebx
jmp *-8(%r13)
.L9:
movq $ghczmprim_GHCziTypes_Dzh_con_info, -24(%rax)
movsd %xmm5, -16(%rax)
movq $ghczmprim_GHCziTypes_Dzh_con_info, -8(%rax)
leaq 25(%rdx), %rbx
movsd %xmm6, 32(%rdx)
leaq 9(%rdx), %r14
jmp *(%rbp)
------------------------------------------------------------------------
Second, 2.34s
$ ghc-core sum-complex.hs -O2 -fvia-C -optc-O3
$ time ./sum-complex
1.0e9 :+ 1.0e9
./sum-complex 2.33s user 0.01s system 99% cpu 2.347 total
Main_mainzuzdszdwfold_info:
leaq 32(%r12), %rax
cmpq 144(%r13), %rax
movq %r12, %rdx
movq %rax, %r12
ja .L4
cmpq $100000000, %r14
je .L9
.L5:
movsd .LC0(%rip), %xmm0
leaq 1(%r14), %r14
movq %rdx, %r12
addsd %xmm0, %xmm5
addsd %xmm0, %xmm6
jmp Main_mainzuzdszdwfold_info
.L4:
leaq -24(%rbp), %rax
movq $32, 184(%r13)
movl $Main_mainzuzdszdwfold_closure, %ebx
movsd %xmm5, 8(%rax)
movq %rax, %rbp
movq %r14, (%rax)
movsd %xmm6, 16(%rax)
jmp *-8(%r13)
.L9:
movq $ghczmprim_GHCziTypes_Dzh_con_info, -24(%rax)
movsd %xmm5, -16(%rax)
movq $ghczmprim_GHCziTypes_Dzh_con_info, -8(%rax)
leaq 25(%rdx), %rbx
movsd %xmm6, 32(%rdx)
leaq 9(%rdx), %r14
jmp *(%rbp)
------------------------------------------------------------------------
Native codegen, 3.57s
ghc 6.12 -fasm -O2
$ time ./sum-complex
1.0e9 :+ 1.0e9
./sum-complex 3.57s user 0.01s system 99% cpu 3.574 total
Main_mainzuzdszdwfold_info:
.Lc1i7:
addq $32,%r12
cmpq 144(%r13),%r12
ja .Lc1ia
movq %r14,%rax
cmpq $100000000,%rax
jne .Lc1id
movq $ghczmprim_GHCziTypes_Dzh_con_info,-24(%r12)
movsd %xmm5,-16(%r12)
movq $ghczmprim_GHCziTypes_Dzh_con_info,-8(%r12)
movsd %xmm6,(%r12)
leaq -7(%r12),%rbx
leaq -23(%r12),%r14
jmp *(%rbp)
.Lc1ia:
movq $32,184(%r13)
movl $Main_mainzuzdszdwfold_closure,%ebx
addq $-24,%rbp
movq %r14,(%rbp)
movsd %xmm5,8(%rbp)
movsd %xmm6,16(%rbp)
jmp *-8(%r13)
.Lc1id:
movsd %xmm6,%xmm0
addsd .Ln1if(%rip),%xmm0
movsd %xmm5,%xmm7
addsd .Ln1ig(%rip),%xmm7
leaq 1(%rax),%r14
movsd %xmm7,%xmm5
movsd %xmm0,%xmm6
addq $-32,%r12
jmp Main_mainzuzdszdwfold_info
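------------------------------------------------------------------------
For reference, the "about 40%" figure can be checked from the three wall-clock totals reported above. The sketch below only redoes that arithmetic; the names are illustrative, and "faster" is read here as "fraction of the native run time saved", which is an assumption about how the figure was meant.

```haskell
module Main (main) where

-- Wall-clock totals in seconds, copied from the three `time` runs above.
fastestViaC, plainViaC, nativeAsm :: Double
fastestViaC = 2.175
plainViaC   = 2.347
nativeAsm   = 3.574

-- Percentage of the baseline's run time that the candidate saves,
-- rounded to the nearest whole percent.
percentSaved :: Double -> Double -> Int
percentSaved baseline candidate =
  round (100 * (baseline - candidate) / baseline)

main :: IO ()
main = do
  putStrLn ("fastest via-C vs native: " ++ show (percentSaved nativeAsm fastestViaC) ++ "% less time")
  putStrLn ("second via-C vs native:  " ++ show (percentSaved nativeAsm plainViaC) ++ "% less time")
```

On these numbers the fastest via-C build saves roughly 39% of the native run time and the second build roughly 34%.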