[GHC] #9660: unnecessary indirect jump when returning a case scrutinee
GHC
ghc-devs at haskell.org
Fri Oct 3 03:08:12 UTC 2014
#9660: unnecessary indirect jump when returning a case scrutinee
-------------------------------------+-------------------------------------
Reporter: rwbarton | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.3
(CodeGen) | Operating System:
Keywords: | Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: Runtime
Difficulty: Unknown | performance bug
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions:
-------------------------------------+-------------------------------------
I happened to be looking at the Cmm for this code (ghc 7.8.3, -O2)
{{{#!hs
f :: Int -> Int
f x = if x < 0 then x else x+1
}}}
and I noticed something a bit funny about it:
{{{
c12e:
if ((Sp + -8) < SpLim) goto c12z; else goto c12A;
c12z:
R2 = R2;
R1 = Test.f_closure;
call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
c12A:
I64[Sp - 8] = c12b;
R1 = R2;
Sp = Sp - 8;
if (R1 & 7 != 0) goto c12b; else goto c12c;
c12c:
call (I64[R1])(R1) returns to c12b, args: 8, res: 8, upd: 8;
c12b:
Hp = Hp + 16;
if (Hp > HpLim) goto c12y; else goto c12x;
c12y:
HpAlloc = 16;
R1 = R1;
call stg_gc_unpt_r1(R1) returns to c12b, args: 8, res: 8, upd:
8;
c12x:
_s11Q::I64 = I64[R1 + 7];
if (%MO_S_Lt_W64(_s11Q::I64, 0)) goto c12u; else goto c12v;
c12u:
Hp = Hp - 16;
R1 = R1 & (-8); /* <--- */
Sp = Sp + 8;
call (I64[R1])(R1) args: 8, res: 0, upd: 8; /* <--- */
c12v:
I64[Hp - 8] = GHC.Types.I#_con_info;
I64[Hp] = _s11Q::I64 + 1;
R1 = Hp - 7;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}}}
On the two marked lines, we untag R1 (which is `x`) and enter it. However,
we know at this point that `x` is already in WHNF so we could simply
return it by replacing the two lines with `call (P64[Sp])(R1)`, if I'm not
mistaken. That will save a load and an indirect jump (which we actually
know is to `I#_con_info`, which would just retag R1 and return to the
address on the stack anyways).
I think the same optimization should be available any time we do an
algebraic `case` and in a branch simply return the scrutinee.
I looked at what it would take to fix this. It looks almost easy: if we
add a new `LambdaFormInfo` constructor `LFUnknownCon` meaning that we know
the identifier is bound to a saturated application of an unknown
constructor, then we could set the `cg_lf` of the case binder variable of
an algebraic case statement to `LFUnknownCon`, and return `ReturnIt` for
`LFUnknownCon` variables in `getCallMethod`. I think that would do it.
Does that sound right? Is there a better way?
(In my original example we actually know the constructor has to be `I#`.
But if the case was on a type with more than one constructor we wouldn't
know statically which one we got, just that it has to be one of them.)
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9660>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list