Removing/deprecating -fvia-C

Tue Feb 16 12:51:06 EST 2010

marlowsd:
>
> I managed to improve this:
>
> Main_mainzuzdszdwfold_info:
> .Lc1lP:
>         addq $32,%r12
>         cmpq 144(%r13),%r12
>         ja .Lc1lS
>         movq %r14,%rax
>         cmpq $1000000000,%rax
>         jne .Lc1lV
>         movq $ghczmprim_GHCziTypes_Dzh_con_info,-24(%r12)
>         movsd %xmm6,-16(%r12)
>         movq $ghczmprim_GHCziTypes_Dzh_con_info,-8(%r12)
>         movsd %xmm5,(%r12)
>         leaq -7(%r12),%rbx
>         leaq -23(%r12),%r14
>         jmp *(%rbp)
> .Lc1lS:
>         movq $32,184(%r13)
>         movl $Main_mainzuzdszdwfold_closure,%ebx
>         addq $-24,%rbp
>         movsd %xmm5,(%rbp)
>         movsd %xmm6,8(%rbp)
>         movq %r14,16(%rbp)
>         jmp *-8(%r13)
> .Lc1lV:
>         addsd .Ln1m2(%rip),%xmm5
>         addsd .Ln1m3(%rip),%xmm6
>         leaq 1(%rax),%r14
>         addq $-32,%r12
>         jmp Main_mainzuzdszdwfold_info
>
>
> from 9 instructions in the last block down to 5 (one instruction fewer
> than gcc).  I haven't commoned up the two constant 1's though; that'd
> mean doing some CSE.
>
> On my machine with GHC HEAD and gcc 4.3.0, the gcc version runs in 2.0s,  
> with the NCG at 2.3s.  I put the difference down to a bit of instruction  
> scheduling done by gcc, and that extra constant load.
>
> But let's face it, all of this code is crappy.  It should be a tiny  
> little loop rather than a tail-call with argument passing, and that's  
> what we'll get with the new backend (eventually).  LLVM probably won't
> turn it into a loop on its own; that needs to be done before the code
> gets passed to LLVM.

Agreed. Ideally the new backend would be (starting to be?) usable about
the time -fvia-C dies? Otherwise there's always going to be something
that gcc spots that the current codegen won't.
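
To make the "tiny little loop" point concrete, here is a rough C analogy of
the two shapes being discussed. It is only a sketch: the names (pair,
fold_tail, fold_loop), the starting values of 0 and the step of 1.0 are my
assumptions read off the assembly above, not the actual Haskell source or
anything GHC emits.

```c
#include <stdint.h>

/* Hypothetical result type: stands in for the two boxed Doubles (D#)
   that the first block of the assembly allocates on exit. */
typedef struct { double a; double b; } pair;

/* Tail-call shape, roughly what the code above does: every iteration
   re-enters the function with the counter and both accumulators passed
   as arguments.  The limit is a parameter here so small values can be
   tried; the assembly compares against 1000000000. */
static pair fold_tail(int64_t i, int64_t limit, double a, double b)
{
    if (i == limit) {
        pair r = { a, b };
        return r;
    }
    return fold_tail(i + 1, limit, a + 1.0, b + 1.0);
}

/* Loop shape, what it "should be": the same computation with the state
   kept in locals and a plain backward branch. */
static pair fold_loop(int64_t limit)
{
    double a = 0.0, b = 0.0;   /* assumed starting values */
    for (int64_t i = 0; i != limit; i++) {
        a += 1.0;
        b += 1.0;
    }
    pair r = { a, b };
    return r;
}
```

Both functions compute the same pair for a given limit; the difference is
only in how the iteration is expressed. Note that a C compiler may or may
not turn fold_tail into a loop itself, so keep the limit small if you try
the recursive version without optimisation.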

Then again, killing perl from the ghc toolchain, and having a
funeral/dancing on its grave, would be satisfying in itself :-)

> Have you looked at this example on x86?  It's *far* worse and runs about  
> 5 times slower.

x86 scares me.. :)