performance issues in simple arithmetic code

Thu Apr 28 10:30:45 CEST 2011

On 27 April 2011 20:01, Denys Rtveliashvili <rtvd at mac.com> wrote:
> The lack of expected magic is in the assembler code:
> -------------------
>
>     addq $16,%r12
>     cmpq 144(%r13),%r12
>     ja .Lcz1
>     movl $1117,%ecx
>     movl $1113,%r10d
>     movl $1111,%r11d
>     movq 7(%rbx),%rax
>     cqto
>     idivq %r11
>     cqto
>     idivq %r10
>     cqto
>     idivq %rcx
>     movq $ghczmprim_GHCziTypes_Izh_con_info,-8(%r12)
>     movq %rax,0(%r12)
>     leaq -7(%r12),%rbx
>     addq $8,%rbp
>     jmp *0(%rbp)
>
> -------------------
> Question: can't it use cheap multiplication and shift instead of expensive
> division here? I know that such optimisation is implemented at least to some
> extent for C--. I suppose it also won't do anything smart for expressions
> like a*4 or a/4 for the same reason.

There isn't really any optimisation done on Cmm, and the native code
generator doesn't do much optimisation itself, hence you get a fairly
direct, straightforward translation. This kind of code is where the
LLVM backend does well in comparison. I haven't benchmarked -fasm
against -fllvm for this code, but if you eyeball the assembly produced
by -fllvm you'll see it uses shifts and other magic.
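
For reference, the usual form of that magic is to replace division by
a constant with a multiplication by a precomputed number followed by a
right shift. Below is a minimal sketch of the idea in Haskell. It
assumes unsigned 32-bit operands; the quoted code uses signed 64-bit
idivq, which needs extra sign fix-ups that are not shown here. The
names ceilLog2, magic and divByConst are made up for illustration and
are not part of GHC or LLVM. Integer is used for the intermediate
product so the sketch cannot overflow; a real code generator would use
a widening multiply instruction instead.

```haskell
import Data.Bits (shiftL, shiftR)
import Data.Word (Word32)

-- Smallest l such that 2^l >= d (assumes d >= 1).
ceilLog2 :: Integer -> Int
ceilLog2 d = length (takeWhile (< d) (iterate (* 2) 1))

-- Precompute (multiplier, shift) for dividing a 32-bit unsigned
-- value by the constant d (assumes d >= 1).
-- The multiplier is ceil(2^s / d) with s = 32 + ceilLog2 d.
magic :: Integer -> (Integer, Int)
magic d = (m, s)
  where
    s = 32 + ceilLog2 d
    m = ((1 `shiftL` s) + d - 1) `div` d

-- Divide n by the constant d using one multiply and one shift.
divByConst :: Integer -> Word32 -> Word32
divByConst d n = fromInteger ((toInteger n * m) `shiftR` s)
  where
    (m, s) = magic d
```

With these definitions, divByConst 1111 n should agree with
n `div` 1111 for every Word32 n, and the same holds for any other
positive constant divisor. In a compiler the pair from magic would be
computed once at compile time, leaving only the multiply and the
shift in the generated code.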

Cheers,
David