Better calling conventions for strict functions (bang patterns)?

Carter Schonwald carter.schonwald at gmail.com
Sat Oct 24 20:53:38 UTC 2015


Doesn't modern hardware have pretty good branch prediction? In which case
the order of the branches may not matter unless it's a long chain of calls,
as opposed to, say, an inner loop that hasn't been inlined?

Either way, I'd love to stay in the loop on this topic. For work I'm
building a strongly normalizing language that supports both strict and
call-by-need evaluation strategies.

On Friday, October 23, 2015, Ryan Newton <rrnewton at gmail.com> wrote:

>
>>    1. Small tweaks: The CMM code above seems to be *betting* that the
>>    thunk is unevaluated, because it does the stack check and stack write
>>    *before* the predicate test that checks if the thunk is evaluated (if
>>    (R1 & 7 != 0) goto c3aO; else goto c3aP;).  With a bang-pattern
>>    function, couldn't it make the opposite bet?  That is, branch on whether
>>    the thunk is evaluated first, and then the wasted computation is only a
>>    single correctly predicted branch (and a read of a tag that we need to read
>>    anyway).
>>
>> Oh, a small further addition would be needed for this tweak.  In the
> generated code above "Sp = Sp + 8;" happens *late*, but I think it could
> happen right after the call to the thunk.  In general, does it seem
> feasible to separate the slowpath from fastpath as in the following tweak
> of the example CMM?
>
>
>   // Skip to the chase if it's already evaluated:
>   start:
>       if (R2 & 7 != 0) goto fastpath; else goto slowpath;
>
>   slowpath:   // Formerly c3aY
>       if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;
>   c3aZ:
>       // nop
>       R1 = PicBaseReg + foo_closure;
>       call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;
>   c3b0:
>       I64[Sp - 8] = PicBaseReg + block_c3aO_info;
>       R1 = R2;
>       Sp = Sp - 8;
>
>       call (I64[R1])(R1) returns to fastpath, args: 8, res: 8, upd: 8;
>       // Sp bump moved to here so it's separate from "fastpath"
>       Sp = Sp + 8;
>
>   fastpath: // Formerly c3aO
>       if (R1 & 7 >= 2) goto c3aW; else goto c3aX;
>   c3aW:
>       R1 = P64[R1 + 6] & (-8);
>       call (I64[R1])(R1) args: 8, res: 0, upd: 8;
>   c3aX:
>       R1 = PicBaseReg + lvl_r39S_closure;
>       call (I64[R1])(R1) args: 8, res: 0, upd: 8;
>
>
>
>
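For anyone following along without the earlier message to hand, here is a
minimal Haskell sketch of the kind of bang-pattern function the quoted Cmm
could plausibly come from. The original source of `foo` is not shown in this
message, so the definition below is an assumption: the `Just` branch mirrors
the c3aW block (enter the constructor's first field), and the `Nothing`
branch is only a placeholder for whatever lvl_r39S_closure really is.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Hypothetical stand-in for the `foo` discussed in the thread.
-- The bang forces the argument to WHNF before the body runs.
foo :: Maybe Int -> Int
foo !m = case m of
  Just x  -> x                      -- tag >= 2 in the quoted Cmm (c3aW)
  Nothing -> error "foo: Nothing"   -- placeholder for lvl_r39S_closure (c3aX)

main :: IO ()
main = do
  -- Argument forced to WHNF before the call: at the source level, this is
  -- the situation the proposed "fastpath" is betting on.
  let !v = Just (41 :: Int)
  print (foo v)
  -- Argument passed as an unevaluated application: at the source level,
  -- this is the situation the "slowpath" handles. What the optimizer
  -- actually emits for either call may differ.
  print (foo (lookup "k" [("k", 1)]))
```

Comparing the output of `ghc -O2 -ddump-cmm` for a definition like this
against the listing above is one way to check how closely the sketch matches
the function from the thread.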