[GHC] #9660: unnecessary indirect jump when returning a case scrutinee

Fri Oct 3 03:08:12 UTC 2014

#9660: unnecessary indirect jump when returning a case scrutinee
-------------------------------------+-------------------------------------
       Reporter:  rwbarton           |                   Owner:
           Type:  bug                |                  Status:  new
       Priority:  normal             |               Milestone:
      Component:  Compiler           |                 Version:  7.8.3
  (CodeGen)                          |        Operating System:
       Keywords:                     |  Unknown/Multiple
   Architecture:  Unknown/Multiple   |         Type of failure:  Runtime
     Difficulty:  Unknown            |  performance bug
     Blocked By:                     |               Test Case:
Related Tickets:                     |                Blocking:
                                     |  Differential Revisions:
-------------------------------------+-------------------------------------
 I happened to be looking at the Cmm for this code (ghc 7.8.3, -O2)

 {{{#!hs
 f :: Int -> Int
 f x = if x < 0 then x else x+1
 }}}

 and I noticed something a bit funny about it:

 {{{
        c12e:
            if ((Sp + -8) < SpLim) goto c12z; else goto c12A;
        c12z:
            R2 = R2;
            R1 = Test.f_closure;
            call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
        c12A:
            I64[Sp - 8] = c12b;
            R1 = R2;
            Sp = Sp - 8;
            if (R1 & 7 != 0) goto c12b; else goto c12c;
        c12c:
            call (I64[R1])(R1) returns to c12b, args: 8, res: 8, upd: 8;
        c12b:
            Hp = Hp + 16;
            if (Hp > HpLim) goto c12y; else goto c12x;
        c12y:
            HpAlloc = 16;
            R1 = R1;
            call stg_gc_unpt_r1(R1) returns to c12b, args: 8, res: 8, upd:
 8;
        c12x:
            _s11Q::I64 = I64[R1 + 7];
            if (%MO_S_Lt_W64(_s11Q::I64, 0)) goto c12u; else goto c12v;
        c12u:
            Hp = Hp - 16;
            R1 = R1 & (-8);                              /* <--- */
            Sp = Sp + 8;
            call (I64[R1])(R1) args: 8, res: 0, upd: 8;  /* <--- */
        c12v:
            I64[Hp - 8] = GHC.Types.I#_con_info;
            I64[Hp] = _s11Q::I64 + 1;
            R1 = Hp - 7;
            Sp = Sp + 8;
            call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
 }}}

 On the two marked lines, we untag R1 (which is `x`) and enter it. However,
 we know at this point that `x` is already in WHNF so we could simply
 return it by replacing the two lines with `call (P64[Sp])(R1)`, if I'm not
 mistaken. That will save a load and an indirect jump (which we actually
 know is to `I#_con_info`, which would just retag R1 and return to the
 address on the stack anyways).

 I think the same optimization should be available any time we do an
 algebraic `case` and in a branch simply return the scrutinee.

 I looked at what it would take to fix this. It looks almost easy: if we
 add a new `LambdaFormInfo` constructor `LFUnknownCon` meaning that we know
 the identifier is bound to a saturated application of an unknown
 constructor, then we could set the `cg_lf` of the case binder variable of
 an algebraic case statement to `LFUnknownCon`, and return `ReturnIt` for
 `LFUnknownCon` variables in `getCallMethod`. I think that would do it.
 Does that sound right? Is there a better way?

 (In my original example we actually know the constructor has to be `I#`.
 But if the case was on a type with more than one constructor we wouldn't
 know statically which one we got, just that it has to be one of them.)

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9660>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler