Better calling conventions for strict functions (bang patterns)?

Simon Marlow marlowsd at gmail.com
Mon Nov 2 14:58:26 UTC 2015


Sorry, forgot to respond to your other point:

On 02/11/2015 06:37, Ryan Newton wrote:
> Thanks Simon for linking that issue!  Does the patch linked there
> <https://ghc.haskell.org/trac/ghc/attachment/ticket/8905/0001-EXPERIMENTAL-save-the-continuation-after-the-tag-che.patch> already
> cover what you suggested in your last mail?  I think no, it's a more
> limited change, but I'm having trouble understanding exactly what.

Correct, it's a more limited change.  I don't have a working 
implementation of the scheme I described, it needs some reorganisation 
in the codeGen if I remember correctly.

Something I think we should do is to have a gcc-like -Os flag that we 
could use as a hint in cases where we have a code-size vs speed tradeoff 
like this, there are a handful of places we could use that.

Cheers
Simon


> I've also got one really basic question -- was it decided long ago that
> all these stack limit checks are cheaper than having a guard page at the
> end of each stack and faulting to grow the stack?  (I couldn't find a
> place that this rationale was described on wiki.)
>
> Best,
>    -Ryan
>
>
> On Sun, Nov 1, 2015 at 10:05 AM, Simon Marlow <marlowsd at gmail.com
> <mailto:marlowsd at gmail.com>> wrote:
>
>     Yes, I think we can probably do a better job of compiling case
>     expressions.  The current compilation strategy optimises for code
>     size, but at the expense of performance in the fast path.  We can
>     tweak this tradeoff, perhaps under the control of a flag.
>
>     Ideally the sequence should start by assuming that the closure is
>     already evaluated, e.g.
>
>       loop:
>         tag = R2 & 7;
>         if (tag == 1) then // code for []
>         else if (tag == 2) then // code for (:)
>         else evaluate; jump back to loop
>
>     The nice thing is that now that we don't need proc points, "loop" is
>     just a label that we can directly jump to.  Unfortunately this only
>     works when using the NCG, not with LLVM, because LLVM requires proc
>     points, so we might need to fall back to a different strategy for LLVM.
>
>     Similar topics came up here:
>     https://ghc.haskell.org/trac/ghc/ticket/8905 and I think there was
>     another ticket but I can't find it now.
>
>     Cheers
>     Simon
>
>     On 23/10/2015 19:00, Ryan Newton wrote:
>
>               1. Small tweaks: The CMM code above seems to be /betting/
>         than the
>                  thunk is unevaluated, because it does the stack check
>         and stack
>                  write /before/ the predicate test that checks if the
>         thunk is
>                  evaluated (if(R1 & 7!= 0) gotoc3aO; elsegotoc3aP;).  With a
>                  bang-pattern function, couldn't it make the opposite
>         bet?  That
>                  is, branch on whether the thunk is evaluated first, and
>         then the
>                  wasted computation is only a single correctly predicted
>         branch
>                  (and a read of a tag that we need to read anyway).
>
>         Oh, a small further addition would be needed for this tweak.  In the
>         generated code above "Sp = Sp + 8;" happens /late/, but I think
>         it could
>         happen right after the call to the thunk.  In general, does it seem
>         feasible to separate the slowpath from fastpath as in the following
>         tweak of the example CMM?
>
>
>         *  // Skip to the chase if it's already evaluated:*
>         *  start:*
>         *      if (R2 & 7 != 0) goto fastpath; else goto slowpath;*
>         *
>         *
>         *  slowpath:   // Formerly c3aY*
>         *      if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;*
>         *  c3aZ:*
>         *      // nop*
>         *      R1 = PicBaseReg + foo_closure;*
>         *      call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;*
>         *  c3b0:*
>         *      I64[Sp - 8] = PicBaseReg + block_c3aO_info;*
>         *      R1 = R2;*
>         *      Sp = Sp - 8;*
>
>         *      call (I64[R1])(R1) returns to fastpath, args: 8, res: 8,
>         upd: 8;*
>         *      // Sp bump moved to here so it's separate from "fastpath"*
>         *      Sp = Sp + 8;*
>         *
>         *
>         *  fastpath: // Formerly c3aO*
>         *      if (R1 & 7 >= 2) goto c3aW; else goto c3aX;*
>         *  c3aW:*
>         *      R1 = P64[R1 + 6] & (-8);*
>         *      call (I64[R1])(R1) args: 8, res: 0, upd: 8;*
>         *  c3aX:*
>         *      R1 = PicBaseReg + lvl_r39S_closure;*
>         *      call (I64[R1])(R1) args: 8, res: 0, upd: 8;*
>
>
>
>
>
>         _______________________________________________
>         ghc-devs mailing list
>         ghc-devs at haskell.org <mailto:ghc-devs at haskell.org>
>         http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>


More information about the ghc-devs mailing list