-optc-O2 considered useful
Don Stewart
dons at galois.com
Thu May 15 14:47:20 EDT 2008
I discovered something today I didn't know.
When compiling via C, gcc -O2 can optimise away the computed jumps GHC produces in tight loops.
Consider this program:
import Data.Array.Vector
import Data.Bits

main = print . sumU
     . mapU (*2)
     . mapU (`shiftL` 2)
     $ replicateU (100000000 :: Int) (5::Int)
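For readers without the uvector package (which is where Data.Array.Vector, sumU, mapU and replicateU come from), here is a minimal sketch of the same computation over ordinary lists from base. The name sumPipeline is hypothetical, and this says nothing about how well GHC optimises the list version. It does make the arithmetic explicit: each element is (5 `shiftL` 2) * 2 = 40, so 10^8 elements sum to 4000000000.

```haskell
import Data.Bits (shiftL)

-- Same pipeline as the uvector program, but over plain lists.
-- Every element becomes (5 `shiftL` 2) * 2 = 20 * 2 = 40.
sumPipeline :: Int -> Int
sumPipeline n = sum . map (* 2) . map (`shiftL` 2) $ replicate n 5
```

With n = 100000000 this should give 4000000000 on a 64-bit machine, the same value the original program prints.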
Compiled with -O2, the sumU/mapU program fuses into a single loop, giving this Core:
$wfold :: Int# -> Int# -> Int#
$wfold =
  \ (ww_sMp :: Int#) (ww1_sMt :: Int#) ->
    case ww1_sMt of wild_X10 {
      __DEFAULT -> $wfold (+# ww_sMp 40) (+# wild_X10 1);
      100000000 -> ww_sMp
    }
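To make the Core concrete, here is a hand-written sketch of an equivalent worker using GHC's unboxed Int# primitives from GHC.Exts. The names foldLoop# and sumLoop are hypothetical, and the loop bound is passed as a parameter n instead of being baked in as the literal 100000000, so the loop can be tried on small inputs. Like $wfold, it carries an accumulator and a counter, adds 40 to the accumulator per step, and returns the accumulator once the counter reaches the bound.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, isTrue#, (+#), (==#))

-- Hand-written analogue of $wfold (hypothetical name, not GHC output).
-- Arguments: bound n, accumulator acc, counter i. Assumes n >= 0,
-- otherwise the counter never reaches the bound.
foldLoop# :: Int# -> Int# -> Int# -> Int#
foldLoop# n acc i
  | isTrue# (i ==# n) = acc                              -- the 100000000 branch
  | otherwise         = foldLoop# n (acc +# 40#) (i +# 1#) -- the __DEFAULT branch

-- Boxed entry point, playing the role of a worker/wrapper "wrapper".
-- This is an illustration, not the code GHC actually generated.
sumLoop :: Int -> Int
sumLoop (I# n) = I# (foldLoop# n 0# 0#)
```

On a 64-bit machine, sumLoop 100000000 should evaluate to 4000000000, matching the original program's output.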
With -O2 -fasm (the native code generator), the loop compiles to:
Main_zdwfold_info:
        movq %rdi,%rax
        cmpq $100000000,%rax
        jne .LcOk
        movq %rsi,%rbx
        jmp *(%rbp)
.LcOk:
        incq %rax
        addq $40,%rsi
        movq %rax,%rdi
        jmp Main_zdwfold_info
$ time ./sum
4000000000
./sum 0.19s user 0.00s system 101% cpu 0.188 total
Now compiling via C, with gcc at -O (-O2 -fvia-C -optc-O):
Main_zdwfold_info:
        cmpq $100000000, %rdi
        jne .L3
        movq %rsi, %rbx
        movq (%rbp), %rax
.L4:
        jmp *%rax
.L3:
        addq $40, %rsi
        leaq 1(%rdi), %rdi
        movl $Main_zdwfold_info, %eax
        jmp .L4
$ time ./sum
4000000000
./sum 0.34s user 0.00s system 94% cpu 0.361 total
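Hmm. That movl / jmp .L4 / jmp *%rax sequence looks sucky: the self-recursive call now goes through an indirect jump on every iteration, and performance got worse (0.19s to 0.34s user time).
And now with -O2 -fvia-C -optc-O2: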
Main_zdwfold_info:
        cmpq $100000000, %rdi
        je .L5
.L3:
        addq $40, %rsi
        leaq 1(%rdi), %rdi
        jmp Main_zdwfold_info
$ time ./sum
4000000000
./sum 0.11s user 0.02s system 106% cpu 0.122 total
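Woot, back in business: the loop closes with a direct jmp Main_zdwfold_info again, and it even beats the -fasm build (0.12s vs 0.19s total).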
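As a rough worked comparison, dividing each wall-clock total by the 10^8 loop iterations gives an approximate cost per iteration. This ignores process startup and is based on a single run of each build, so treat the figures as indicative only.

```latex
\begin{aligned}
\text{-fasm:}      \quad & 0.188\,\text{s} / 10^{8} \approx 1.9\,\text{ns per iteration} \\
\text{-optc-O:}    \quad & 0.361\,\text{s} / 10^{8} \approx 3.6\,\text{ns per iteration} \\
\text{-optc-O2:}   \quad & 0.122\,\text{s} / 10^{8} \approx 1.2\,\text{ns per iteration}
\end{aligned}
```

By these numbers the -optc-O2 build is roughly 1.5x faster than -fasm (0.188 / 0.122) and roughly 3x faster than -optc-O (0.361 / 0.122).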
-- Don