performance issues in simple arithmetic code
Edward Z. Yang
ezyang at MIT.EDU
Thu Apr 28 15:45:51 CEST 2011
Excerpts from Denys Rtveliashvili's message of Thu Apr 28 04:41:48 -0400 2011:
> Well.. I found some places in C-- compiler which are supposed to convert
> division and multiplication by 2^n into shifts. And I believe these work
> sometimes.
>
> However in this case I am a bit puzzled because even if I change the
> constants in my example to 2^n like 1024 the code is not optimised.
You are referring to the mini-optimizer in cmm/CmmOpt.hs, correct?
Specifically:
cmmMachOpFold mop args@[x, y@(CmmLit (CmmInt n _))]
= case mop of
MO_Mul rep
| Just p <- exactLog2 n ->
cmmMachOpFold (MO_Shl rep) [x, CmmLit (CmmInt p rep)]
MO_U_Quot rep
| Just p <- exactLog2 n ->
cmmMachOpFold (MO_U_Shr rep) [x, CmmLit (CmmInt p rep)]
MO_S_Quot rep
| Just p <- exactLog2 n,
CmmReg _ <- x -> -- We duplicate x below, hence require
See the third case. This appears to be something of a delicate special case,
in particular, the incoming argument is required to be a register, which is not
the case in many instances:
sef_ret()
{ [const 0;, const 34;]
}
ceq:
Hp = Hp + 8;
if (Hp > I32[BaseReg + 92]) goto ceu;
_seg::I32 = %MO_S_Quot_W32(I32[R1 + 3], 1024); <-- oops, it's a memory load
I32[Hp - 4] = GHC.Types.I#_con_info;
I32[Hp] = _seg::I32;
R1 = Hp - 3;
Sp = Sp + 4;
jump I32[Sp] ();
ceu:
I32[BaseReg + 112] = 8;
jump (I32[BaseReg - 8]) ();
}
(This is optimized Cmm, which you can get with -ddump-opt-cmm).
Multiplication, on the other hand, manages to pull it off more frequently:
sef_ret()
{ [const 0;, const 34;]
}
ceq:
Hp = Hp + 8;
if (Hp > I32[BaseReg + 92]) goto ceu;
_seg::I32 = I32[R1 + 3] << 10;
I32[Hp - 4] = GHC.Types.I#_con_info;
I32[Hp] = _seg::I32;
R1 = Hp - 3;
Sp = Sp + 4;
jump I32[Sp] ();
ceu:
I32[BaseReg + 112] = 8;
jump (I32[BaseReg - 8]) ();
}
This might be a poor interaction with the inliner. I haven't investigated fully though.
> By the way, is there any kind of documentation on how to hack C-- compiler?
> In particular, I am interested in:
> * how to run its optimiser against some C-- code and see what does it do
> * knowing more about its internals
GHC supports compiling C-- code; just name your file with a .cmm extension
and GHC will parse it and, if it's the native backend, do some minor
optimizations and register allocation.
As usual, the GHC Trac has some useful information:
- http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/CmmType
- http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/NCG
I also highly recommend reading cmm/OldCmm.hs and cmm/CmmExpr.hs, which explain
the internal AST we use for Cmm, as well as cmm/OldCmmPpr.hs and cmm/CmmParse.y (and
cmm/CmmLex.x) to understand textual C--. Note that there is also a "new" C--
representation hanging around that is not too interesting for you, since we don't
use it at all without the flag -fnew-codegen.
Edward
More information about the Glasgow-haskell-users
mailing list