Transparently hooking into the STG stack to validate an escape analysis

Thu Dec 16 17:22:30 UTC 2021

Thanks for the feedback!
Let's have a video meeting. My schedule is flexible. What time is best for
you?

Cheers,
Csaba

On Thu, Dec 16, 2021 at 5:29 PM Sebastian Graf <sgraf1337 at gmail.com> wrote:

> Hey Csaba,
>
> After catching up on your talk and reflecting about it a bit, it seems
> quite obvious that your tool is the right way to collect the data and
> validate our analysis!
> Even if meanwhile we decided that a "transparent stack frame" (which I
> believe is something similar to what you are doing here
> <https://github.com/grin-compiler/ghc-whole-program-compiler-project/blob/d3ed07d9f3167ad1afa01d1aa95aec2472b2708f/external-stg-interpreter/lib/Stg/Interpreter.hs#L130>,
> with an explicit `argCount` which we do not know) is not the ideal solution
> we've been looking for (for different reasons).
>
> Essentially, we have two things
>
>    1. An escape analysis, implemented as an STG-to-STG pass that attaches
>    a boolean flag to each Id whether it "escapes its scope" (for a suitable
>    definition of that).
>    We'd like to implement it in a way that it would be reusable within
>    GHC with moderate effort (e.g., renaming Binder to Id or accounting for
>    different fields), operating on a module at a time rather than whole
>    program.
>    2. The instrumentation that tries to measure how many heap objects
>    could be allocated on the stack. E.g., set a closure-specific flag whenever
>    the closure is entered, unset that bit (once) when we "leave the scope"
>    that defines the closure.
>    If my understanding is right, we could just implement this
>    "instrumentation" as a simple extra field to Closure
>    <https://github.com/grin-compiler/ghc-whole-program-compiler-project/blob/d3ed07d9f3167ad1afa01d1aa95aec2472b2708f/external-stg-interpreter/lib/Stg/Interpreter/Base.hs#L135>,
>    right? Neat!
>
> A bit tangential: I see that your interpreter currently allocates a fresh
> closure for let-no-escapes
> <https://github.com/grin-compiler/ghc-whole-program-compiler-project/blob/d3ed07d9f3167ad1afa01d1aa95aec2472b2708f/external-stg-interpreter/lib/Stg/Interpreter.hs#L606>
> when it could just re-use the closure of its defining scope. That would
> skew our numbers somewhat compared to instrumenting GHC-compiled programs,
> but I think we'd be able to work around that. I also wonder if the
> semantics of your let-no-escapes are actually as typically specified
> (operationally), in that a jump to a let-no-escape should also reset the
> stack pointer. It should hardly matter for the programs that GHC generates,
> though.
>
> I would also be interested in knowing whether the +RTS -s "bytes allocated
> in the heap" metric (very nearly) coincides with a similar metric you could
> produce. It would be fantastic if that was the case! Theoretically, that
> should be possible, right?
>
> I think your interpreter work is very valuable to collect data we
> otherwise would only be able to measure with a TickyTicky-based approach.
> Nice!
> Another, similar use case would be to identify the fraction of closures
> that are only entered once. I remember that there was a ticky-based patch
> with which Joachim used to measure this fraction
> <https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/demand#instrumentation>
> (and similarly validate the analysis results), but unfortunately it
> couldn't end up in master. Ah, yes, we have a whole open ticket about it:
> #10613 <https://gitlab.haskell.org/ghc/ghc/-/issues/10613>. In fact, that
> instrumentation is also somewhat similar (adding a field to every closure)
> as what we want to do.
>
> Anyway, it seems like your work will be very valuable in replacing some of
> the annoying ticky-based instrumentation ideas!
>
> Maybe we can have a call some time this or next week to discuss details,
> once Sebastian and I are more familiar with the code base?
>
> Thanks for sticking with the project and doing all the hard work that can
> build upon!
> Sebastian
>
> ------ Originalnachricht ------
> Von: "Csaba Hruska" <csaba.hruska at gmail.com>
> An: "Sebastian Graf" <sgraf1337 at gmail.com>
> Cc: "ghc-devs" <ghc-devs at haskell.org>; "Sebastian Scheper" <
> sebastian.scheper at student.kit.edu>
> Gesendet: 15.12.2021 16:16:27
> Betreff: Re: Transparently hooking into the STG stack to validate an
> escape analysis
>
> Hi,
>
> IMO the Cmm STG machine implementation is just too complex for student
> projects. It's not fun to work with at all.
> Why did you choose this approach?
> IMO the escape analysis development and validation would be much smoother
> and fun when you'd use the external STG interpreter.
> When you have a solid and working design of your analysis and
> transformations then you could implement it in GHC's native backend if it
> needs any changes at all.
>
> What do you think?
> Do you disagree?
>
> Have you seen my presentation about the stg interpreter?
> https://www.youtube.com/watch?v=Ey5OFPkxF_w
>
> Cheers,
> Csaba
>
> On Wed, Dec 8, 2021 at 11:20 AM Sebastian Graf <sgraf1337 at gmail.com>
> wrote:
>
>> Hi Devs,
>>
>> my master's student Sebastian and I (also Sebastian :)) are working on an
>> escape analysis in STG, see
>> https://gitlab.haskell.org/ghc/ghc/-/issues/16891#note_347903.
>>
>> We have a prototype for the escape analysis that we want to
>> validate/exploit now.
>> The original plan was to write the transformation that allocates
>> non-escaping objects on the STG stack. But that is quite tricky for many
>> reasons, one of them being treatment of the GC.
>>
>> This mail is rather lengthy, so I suggest you skip to "What we hope could
>> work" and see if you can answer it without the context I provide below. If
>> you can't, I would be very grateful if you were willing to suffer through
>> the exposition.
>>
>> # Instrumentation
>>
>> So instead we thought about doing a (easily changed and thus versatile)
>> instrumentation-based approach:
>> Assign a sequence number to every instantiation (a term I will use to
>> mean "allocation of g's closure") of things that we our analysis
>> determines as escaping or non-escaping, such as STG's let bindings
>> (focusing on non-let-no-escape functions for now).
>> (One sequence number *per allocation* of the let binding's closure, not
>> based on its syntactic identity.)
>> Then, we set a bit in a (dynamically growing) global bit vector whenever
>> the let "RHS is entered" and then unset it when we "leave the let-body".
>> Example:
>>
>> f = \x y ->
>>   let {
>>     g = [y] \z -> y + z;
>>   } in g x
>>
>> Here, our analysis could see that no instantiation (which I say instead
>> of "allocation of g's closure") of g will ever escape its scope within f.
>> Our validation would give a fresh sequence number to the instantiation of
>> g whenever f is called and store it in g's closure (which we arrange by
>> piggy-backing on -prof and adding an additional field to the profiling
>> header).
>> Then, when g's RHS is entered, we set the bit in the global bit vector,
>> indicating "this instantiation of g might escape".
>> After leaving the RHS of g, we also leave the body of the defining let,
>> which means we unset the bit in the bit vector, meaning "every use so far
>> wasn't in an escaping scenario".
>>
>> So far so good. Modifying the code upon entering g takes a bit of
>> tinkering but can be done by building on TickyTicky in StgToCmm.
>> But what is not done so easily is inserting the continuation upon
>> entering the let that will unset the bit!
>>
>> # What doesn't work: Modifying the Sequel
>>
>> At first, we tried to modify the sequel
>> <https://gitlab.haskell.org/ghc/ghc/-/blob/4c6985cc91b9cf89b393f69c9a603e8df0aea9e0/compiler/GHC/StgToCmm/Monad.hs#L210-222> of
>> the let-body to an `AssignTo`.
>> That requires us to know the registers in which the let-body will return
>> its results, which in turn means we have to know the representation of
>> those results, so we have to write a function `stgExprPrimRep :: GenStgExpr
>> p -> [PrimRep]`.
>> Urgh! We were very surprised that there was no such function. And while
>> we tested our function, we very soon knew why. Consider the following
>> pattern synonym matcher:
>>
>> GHC.Natural.$mNatJ#
>>   :: forall {rep :: GHC.Types.RuntimeRep} {r :: TYPE rep}.
>>      GHC.Num.Natural.Natural
>>      -> (GHC.Num.BigNat.BigNat -> r) -> ((# #) -> r) -> r
>>  = {} \r [scrut_sBE cont_sBF fail_sBG]
>>         case scrut_sBE of {
>>           GHC.Num.Natural.NS _ -> fail_sBG GHC.Prim.(##);
>>           GHC.Num.Natural.NB ds_sBJ ->
>>               let {
>>                 sat_sBK :: GHC.Num.BigNat.BigNat
>>                  = CCCS GHC.Num.BigNat.BN#! [ds_sBJ];
>>               } in  cont_sBF sat_sBK;
>>         };
>>
>> Note how its result is representation-polymorphic! It only works because
>> our machine implementation allows tail-calls.
>> It's obvious in hindsight that we could never write `stgExprPrimRep` in
>> such a way that it will work on the expression `cont_sBF sat_sBK`.
>> So the sequel approach seems infeasible.
>>
>> # What we hope could work: A special stack frame
>>
>> The other alternative would be to insert a special continuation frame on
>> the stack when we enter the let-body (inspired by stg_restore_cccs).
>> This continuation frame would simply push all registers (FP regs, GP
>> regs, Vec regs, ...) to the C stack, do its work (unsetting the bit), then
>> pop all registers again and jump to the topmost continuation on the STG
>> stack.
>> Example:
>>
>> f :: forall rep (r :: TYPE rep). Int# -> (Int# -> r) -> r
>> f = \x g ->
>>   let {
>>     h = [x] \a -> x + a;
>>   } in
>>   case h x of b {
>>     __DEFAULT -> g b
>>   }
>>
>> We are only interested in unsetting the bit for h here. Consider the
>> stack when entering the body of h.
>>
>> caller_of_f_cont_info <- Sp
>>
>> Now push our special continuation frame:
>>
>> caller_of_f_cont_info
>> seq_h
>> unset_bit_stk_info <- Sp
>>
>> E.g., the stack frame contains the info pointer and the sequence number.
>> (Btw., I hope I got the stack layout about right and this is even possible)
>> Then, after we entered the continuation of the __DEFAULT alt, we do a
>> jump to g.
>> Plot twist: g returns an unboxed 8-tuple of `Int#`s (as caller_of_f_cont_info
>> knows, but f certainly doesn't!), so before it returns it will push two
>> args on the stack (correct?):
>>
>> caller_of_f_cont_info
>> seq_h
>> unset_bit_stk_info
>> unboxed tuple component 7
>> unboxed tuple component 8 <- Sp
>>
>> And then `g` jumps directly to the entry code for `unset_bit_stk_info`
>> (which does the register saving I mentioned), which absolutely can't figure
>> out from Sp alone where seq_h is.
>> Darn! I think Luite's recent work on the StgToByteCode had to work around
>> similar issues, I found this wiki page
>> <https://gitlab.haskell.org/ghc/ghc-wiki-mirror/-/blob/master/Unboxed-Tuples-and-Sums-in-GHCi-bytecode.md#returning-a-tuple>.
>> But we aren't in a position where we know the representation of `r` *at
>> all*!
>>
>> So our idea was to scan the stack, beginning from `Sp`, until I find
>> `unset_bit_stk_info`, giving us the following situation:
>>
>> caller_of_f_cont_info
>> seq_h
>> unset_bit_stk_info <- Bp
>> unboxed tuple component 7
>> unboxed tuple component 8 <- Sp
>>
>> I suggestively named the register in which we store the result Bp in
>> analogy to the traditional base pointer. This information would be enough
>> to unset the bit at index seq_h
>> and then copy the unboxed tuple components 7 and 8 up by two words:
>>
>> caller_of_f_cont_info
>> unboxed tuple component 7
>> unboxed tuple component 8 <- Sp
>>
>> Then jump to caller_of_f_cont_info, which knows what to make of the
>> stack and the register state.
>>
>> The stack scanning is horrible and too brittle and slow for a production
>> setting, as any of the unboxed tuple components could have the same bit
>> pattern as `unset_bit_stk_info`.
>> We simply hope that is never the case and it's fine for the purposes of a
>> quick-and-dirty instrumentation.
>>
>> QUESTION 1: What do you think? Could this work? Can you anticipate pit
>> falls of this approach?
>>
>> # What about Thunks and StgRhsCon?
>>
>> QUESTION 2: The instrumentation as described won't work for Thunks (which
>> are only entered once) and constructor applications (like sat_sBK in the
>> BigNat matcher above). Not sure how to change that without incurring huge
>> performance hits (by deactivating memoisation). Ideas are welcome here.
>>
>> Thanks for reading this far. I hope you could follow and are now fizzing
>> with excitement because you have a much better idea of how to do this and
>> will let me know :)
>>
>> Sebastian
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20211216/750306ff/attachment.html>