Better calling conventions for strict functions (bang patterns)?
Ryan Newton
rrnewton at gmail.com
Mon Nov 2 14:37:13 UTC 2015
Thanks Simon for linking that issue! Does the patch linked there
<https://ghc.haskell.org/trac/ghc/attachment/ticket/8905/0001-EXPERIMENTAL-save-the-continuation-after-the-tag-che.patch>
already
cover what you suggested in your last mail? I think no, it's a more
limited change, but I'm having trouble understanding exactly what.
I've also got one really basic question -- was it decided long ago that all
these stack limit checks are cheaper than having a guard page at the end of
each stack and faulting to grow the stack? (I couldn't find a place that
this rationale was described on wiki.)
Best,
-Ryan
On Sun, Nov 1, 2015 at 10:05 AM, Simon Marlow <marlowsd at gmail.com> wrote:
> Yes, I think we can probably do a better job of compiling case
> expressions. The current compilation strategy optimises for code size, but
> at the expense of performance in the fast path. We can tweak this
> tradeoff, perhaps under the control of a flag.
>
> Ideally the sequence should start by assuming that the closure is already
> evaluated, e.g.
>
> loop:
> tag = R2 & 7;
> if (tag == 1) then // code for []
> else if (tag == 2) then // code for (:)
> else evaluate; jump back to loop
>
> The nice thing is that now that we don't need proc points, "loop" is just
> a label that we can directly jump to. Unfortunately this only works when
> using the NCG, not with LLVM, because LLVM requires proc points, so we
> might need to fall back to a different strategy for LLVM.
>
> Similar topics came up here: https://ghc.haskell.org/trac/ghc/ticket/8905
> and I think there was another ticket but I can't find it now.
>
> Cheers
> Simon
>
> On 23/10/2015 19:00, Ryan Newton wrote:
>
>> 1. Small tweaks: The CMM code above seems to be /betting/ than the
>> thunk is unevaluated, because it does the stack check and stack
>> write /before/ the predicate test that checks if the thunk is
>> evaluated (if(R1 & 7!= 0) gotoc3aO; elsegotoc3aP;). With a
>> bang-pattern function, couldn't it make the opposite bet? That
>> is, branch on whether the thunk is evaluated first, and then the
>> wasted computation is only a single correctly predicted branch
>> (and a read of a tag that we need to read anyway).
>>
>> Oh, a small further addition would be needed for this tweak. In the
>> generated code above "Sp = Sp + 8;" happens /late/, but I think it could
>> happen right after the call to the thunk. In general, does it seem
>> feasible to separate the slowpath from fastpath as in the following
>> tweak of the example CMM?
>>
>>
>> * // Skip to the chase if it's already evaluated:*
>> * start:*
>> * if (R2 & 7 != 0) goto fastpath; else goto slowpath;*
>> *
>> *
>> * slowpath: // Formerly c3aY*
>> * if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;*
>> * c3aZ:*
>> * // nop*
>> * R1 = PicBaseReg + foo_closure;*
>> * call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;*
>> * c3b0:*
>> * I64[Sp - 8] = PicBaseReg + block_c3aO_info;*
>> * R1 = R2;*
>> * Sp = Sp - 8;*
>>
>> * call (I64[R1])(R1) returns to fastpath, args: 8, res: 8, upd: 8;*
>> * // Sp bump moved to here so it's separate from "fastpath"*
>> * Sp = Sp + 8;*
>> *
>> *
>> * fastpath: // Formerly c3aO*
>> * if (R1 & 7 >= 2) goto c3aW; else goto c3aX;*
>> * c3aW:*
>> * R1 = P64[R1 + 6] & (-8);*
>> * call (I64[R1])(R1) args: 8, res: 0, upd: 8;*
>> * c3aX:*
>> * R1 = PicBaseReg + lvl_r39S_closure;*
>> * call (I64[R1])(R1) args: 8, res: 0, upd: 8;*
>>
>>
>>
>>
>>
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20151102/53be3b2e/attachment-0001.html>
More information about the ghc-devs
mailing list