Better calling conventions for strict functions (bang patterns)?
Carter Schonwald
carter.schonwald at gmail.com
Sat Oct 24 20:53:38 UTC 2015
Doesn't modern hardware have pretty good branch prediction? In which case
the order of the branches may not matter unless it's a long chain of calls?
Versus, say, an inner loop that hasn't been inlined?
Either way, I'd love to stay in the loop on this topic. For work, I'm
building a strongly normalizing language that supports both strict and
call-by-need evaluation strategies.
On Friday, October 23, 2015, Ryan Newton <rrnewton at gmail.com> wrote:
>
>> 1. Small tweaks: The CMM code above seems to be *betting* that the
>> thunk is unevaluated, because it does the stack check and stack write
>> *before* the predicate test that checks if the thunk is evaluated (if
>> (R1 & 7 != 0) goto c3aO; else goto c3aP;). With a bang-pattern
>> function, couldn't it make the opposite bet? That is, branch on whether
>> the thunk is evaluated first, and then the wasted computation is only a
>> single correctly predicted branch (and a read of a tag that we need to read
>> anyway).
>>
> Oh, a small further addition would be needed for this tweak. In the
> generated code above "Sp = Sp + 8;" happens *late*, but I think it could
> happen right after the call to the thunk. In general, does it seem
> feasible to separate the slowpath from fastpath as in the following tweak
> of the example CMM?
>
>
>   // Skip to the chase if it's already evaluated:
>   start:
>       if (R2 & 7 != 0) goto fastpath; else goto slowpath;
>
>   slowpath: // Formerly c3aY
>       if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;
>   c3aZ:
>       // nop
>       R1 = PicBaseReg + foo_closure;
>       call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;
>   c3b0:
>       I64[Sp - 8] = PicBaseReg + block_c3aO_info;
>       R1 = R2;
>       Sp = Sp - 8;
>
>       call (I64[R1])(R1) returns to fastpath, args: 8, res: 8, upd: 8;
>       // Sp bump moved to here so it's separate from "fastpath"
>       Sp = Sp + 8;
>
>   fastpath: // Formerly c3aO
>       if (R1 & 7 >= 2) goto c3aW; else goto c3aX;
>   c3aW:
>       R1 = P64[R1 + 6] & (-8);
>       call (I64[R1])(R1) args: 8, res: 0, upd: 8;
>   c3aX:
>       R1 = PicBaseReg + lvl_r39S_closure;
>       call (I64[R1])(R1) args: 8, res: 0, upd: 8;
>
>
>
>
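For readers who don't read Cmm, the "test the tag first" idea in the quoted message can be sketched in plain C. This is only a rough model under stated assumptions, not GHC's actual closure layout or calling convention: the Closure struct, the names tag_of, untag, with_tag, eval_thunk and strict_payload, and the slow_path_count counter are all hypothetical and exist only for illustration. The one thing taken from the thread is the convention that the low three bits of a pointer (the "& 7" in the Cmm) are nonzero when the value is already evaluated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GHC's real closure layout: the low three bits of an
   8-byte-aligned pointer carry a tag, and tag 0 means "not known to be
   evaluated", mirroring the `& 7` tests in the quoted Cmm. */
#define TAG_MASK ((uintptr_t)7)

typedef struct Closure Closure;
typedef void *(*EntryFn)(Closure *);  /* returns a tagged pointer to the value */

struct Closure {
    _Alignas(8) EntryFn entry;  /* code to run when the closure is entered */
    int payload;                /* stands in for the evaluated result */
};

/* Counts how often the slow path runs, purely for demonstration. */
static int slow_path_count = 0;

static uintptr_t tag_of(const void *p) { return (uintptr_t)p & TAG_MASK; }

static Closure *untag(void *p) { return (Closure *)((uintptr_t)p & ~TAG_MASK); }

static void *with_tag(Closure *c, uintptr_t tag) {
    assert(((uintptr_t)c & TAG_MASK) == 0);  /* pointer must be 8-byte aligned */
    assert(tag <= TAG_MASK);
    return (void *)((uintptr_t)c | tag);
}

/* A pretend thunk entry: does some "work", then returns an evaluated,
   tagged pointer to the same closure. */
static void *eval_thunk(Closure *c) {
    c->payload *= 2;
    return with_tag(c, 1);
}

/* A "strict" function that bets the argument is already evaluated: the tag
   test comes first, and the slow path is only taken for untagged pointers.
   Note that in C `x & 7 != 0` would parse as `x & (7 != 0)`, so the mask and
   the comparison are kept in separate, explicit steps here. */
static int strict_payload(void *arg) {
    if (tag_of(arg) == 0) {         /* slow path: enter the closure */
        Closure *c = untag(arg);
        slow_path_count++;
        arg = c->entry(c);
    }
    return untag(arg)->payload;     /* fast path: value is ready to use */
}
```

Calling strict_payload with an untagged pointer to a Closure whose entry is eval_thunk takes the slow path once; calling it with the tagged pointer that eval_thunk returns skips the slow path entirely. The sketch leaves out the stack check and the Sp adjustment, which in the quoted proposal would live only on the slow path.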