Transparently hooking into the STG stack to validate an escape analysis
Sebastian Graf
sgraf1337 at gmail.com
Fri Dec 17 09:24:38 UTC 2021
(Moving the conversation off the list.)
------ Originalnachricht ------
Von: "Csaba Hruska" <csaba.hruska at gmail.com>
An: "Sebastian Graf" <sgraf1337 at gmail.com>
Cc: "ghc-devs" <ghc-devs at haskell.org>; "Sebastian Scheper"
<sebastian.scheper at student.kit.edu>
Gesendet: 16.12.2021 18:22:30
Betreff: Re: Re[2]: Transparently hooking into the STG stack to validate
an escape analysis
>Thanks for the feedback!
>Let's have a video meeting. My schedule is flexible. What time is best
>for you?
>
>Cheers,
>Csaba
>
>On Thu, Dec 16, 2021 at 5:29 PM Sebastian Graf <sgraf1337 at gmail.com>
>wrote:
>>Hey Csaba,
>>
>>After catching up on your talk and reflecting about it a bit, it seems
>>quite obvious that your tool is the right way to collect the data and
>>validate our analysis!
>>Even if meanwhile we decided that a "transparent stack frame" (which I
>>believe is something similar to what you are doing here
>><https://github.com/grin-compiler/ghc-whole-program-compiler-project/blob/d3ed07d9f3167ad1afa01d1aa95aec2472b2708f/external-stg-interpreter/lib/Stg/Interpreter.hs#L130>,
>>with an explicit `argCount` which we do not know) is not the ideal
>>solution we've been looking for (for different reasons).
>>
>>Essentially, we have two things
>>An escape analysis, implemented as an STG-to-STG pass that attaches a
>>boolean flag to each Id whether it "escapes its scope" (for a suitable
>>definition of that).
>>We'd like to implement it in a way that it would be reusable within
>>GHC with moderate effort (e.g., renaming Binder to Id or accounting
>>for different fields), operating on a module at a time rather than
>>whole program.The instrumentation that tries to measure how many heap
>>objects could be allocated on the stack. E.g., set a closure-specific
>>flag whenever the closure is entered, unset that bit (once) when we
>>"leave the scope" that defines the closure.
>>If my understanding is right, we could just implement this
>>"instrumentation" as a simple extra field to Closure
>><https://github.com/grin-compiler/ghc-whole-program-compiler-project/blob/d3ed07d9f3167ad1afa01d1aa95aec2472b2708f/external-stg-interpreter/lib/Stg/Interpreter/Base.hs#L135>,
>>right? Neat!
>>A bit tangential: I see that your interpreter currently allocates a
>>fresh closure for let-no-escapes
>><https://github.com/grin-compiler/ghc-whole-program-compiler-project/blob/d3ed07d9f3167ad1afa01d1aa95aec2472b2708f/external-stg-interpreter/lib/Stg/Interpreter.hs#L606>
>>when it could just re-use the closure of its defining scope. That
>>would skew our numbers somewhat compared to instrumenting GHC-compiled
>>programs, but I think we'd be able to work around that. I also wonder
>>if the semantics of your let-no-escapes are actually as typically
>>specified (operationally), in that a jump to a let-no-escape should
>>also reset the stack pointer. It should hardly matter for the programs
>>that GHC generates, though.
>>
>>I would also be interested in knowing whether the +RTS -s "bytes
>>allocated in the heap" metric (very nearly) coincides with a similar
>>metric you could produce. It would be fantastic if that was the case!
>>Theoretically, that should be possible, right?
>>
>>I think your interpreter work is very valuable to collect data we
>>otherwise would only be able to measure with a TickyTicky-based
>>approach. Nice!
>>Another, similar use case would be to identify the fraction of
>>closures that are only entered once. I remember that there was a
>>ticky-based patch with which Joachim used to measure this fraction
>><https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/demand#instrumentation>
>>(and similarly validate the analysis results), but unfortunately it
>>couldn't end up in master. Ah, yes, we have a whole open ticket about
>>it: #10613 <https://gitlab.haskell.org/ghc/ghc/-/issues/10613>. In
>>fact, that instrumentation is also somewhat similar (adding a field to
>>every closure) as what we want to do.
>>
>>Anyway, it seems like your work will be very valuable in replacing
>>some of the annoying ticky-based instrumentation ideas!
>>
>>Maybe we can have a call some time this or next week to discuss
>>details, once Sebastian and I are more familiar with the code base?
>>
>>Thanks for sticking with the project and doing all the hard work that
>>can build upon!
>>Sebastian
>>
>>------ Originalnachricht ------
>>Von: "Csaba Hruska" <csaba.hruska at gmail.com>
>>An: "Sebastian Graf" <sgraf1337 at gmail.com>
>>Cc: "ghc-devs" <ghc-devs at haskell.org>; "Sebastian Scheper"
>><sebastian.scheper at student.kit.edu>
>>Gesendet: 15.12.2021 16:16:27
>>Betreff: Re: Transparently hooking into the STG stack to validate an
>>escape analysis
>>
>>>Hi,
>>>
>>>IMO the Cmm STG machine implementation is just too complex for
>>>student projects. It's not fun to work with at all.
>>>Why did you choose this approach?
>>>IMO the escape analysis development and validation would be much
>>>smoother and fun when you'd use the external STG interpreter.
>>>When you have a solid and working design of your analysis and
>>>transformations then you could implement it in GHC's native backend
>>>if it needs any changes at all.
>>>
>>>What do you think?
>>>Do you disagree?
>>>
>>>Have you seen my presentation about the stg interpreter?
>>>https://www.youtube.com/watch?v=Ey5OFPkxF_w
>>>
>>>Cheers,
>>>Csaba
>>>
>>>On Wed, Dec 8, 2021 at 11:20 AM Sebastian Graf <sgraf1337 at gmail.com>
>>>wrote:
>>>>Hi Devs,
>>>>
>>>>my master's student Sebastian and I (also Sebastian :)) are working
>>>>on an escape analysis in STG, see
>>>>https://gitlab.haskell.org/ghc/ghc/-/issues/16891#note_347903.
>>>>
>>>>We have a prototype for the escape analysis that we want to
>>>>validate/exploit now.
>>>>The original plan was to write the transformation that allocates
>>>>non-escaping objects on the STG stack. But that is quite tricky for
>>>>many reasons, one of them being treatment of the GC.
>>>>
>>>>This mail is rather lengthy, so I suggest you skip to "What we hope
>>>>could work" and see if you can answer it without the context I
>>>>provide below. If you can't, I would be very grateful if you were
>>>>willing to suffer through the exposition.
>>>>
>>>># Instrumentation
>>>>
>>>>So instead we thought about doing a (easily changed and thus
>>>>versatile) instrumentation-based approach:
>>>>Assign a sequence number to every instantiation (a term I will use
>>>>to mean "allocation of g's closure") of things that we our analysis
>>>>determines as escaping or non-escaping, such as STG's let bindings
>>>>(focusing on non-let-no-escape functions for now).
>>>>(One sequence number *per allocation* of the let binding's closure,
>>>>not based on its syntactic identity.)
>>>>Then, we set a bit in a (dynamically growing) global bit vector
>>>>whenever the let "RHS is entered" and then unset it when we "leave
>>>>the let-body". Example:
>>>>
>>>>f = \x y ->
>>>> let {
>>>> g = [y] \z -> y + z;
>>>> } in g x
>>>>
>>>>Here, our analysis could see that no instantiation (which I say
>>>>instead of "allocation of g's closure") of g will ever escape its
>>>>scope within f.
>>>>Our validation would give a fresh sequence number to the
>>>>instantiation of g whenever f is called and store it in g's closure
>>>>(which we arrange by piggy-backing on -prof and adding an additional
>>>>field to the profiling header).
>>>>Then, when g's RHS is entered, we set the bit in the global bit
>>>>vector, indicating "this instantiation of g might escape".
>>>>After leaving the RHS of g, we also leave the body of the defining
>>>>let, which means we unset the bit in the bit vector, meaning "every
>>>>use so far wasn't in an escaping scenario".
>>>>
>>>>So far so good. Modifying the code upon entering g takes a bit of
>>>>tinkering but can be done by building on TickyTicky in StgToCmm.
>>>>But what is not done so easily is inserting the continuation upon
>>>>entering the let that will unset the bit!
>>>>
>>>># What doesn't work: Modifying the Sequel
>>>>
>>>>At first, we tried to modify the sequel
>>>><https://gitlab.haskell.org/ghc/ghc/-/blob/4c6985cc91b9cf89b393f69c9a603e8df0aea9e0/compiler/GHC/StgToCmm/Monad.hs#L210-222>
>>>>of the let-body to an `AssignTo`.
>>>>That requires us to know the registers in which the let-body will
>>>>return its results, which in turn means we have to know the
>>>>representation of those results, so we have to write a function
>>>>`stgExprPrimRep :: GenStgExpr p -> [PrimRep]`.
>>>>Urgh! We were very surprised that there was no such function. And
>>>>while we tested our function, we very soon knew why. Consider the
>>>>following pattern synonym matcher:
>>>>
>>>>GHC.Natural.$mNatJ#
>>>> :: forall {rep :: GHC.Types.RuntimeRep} {r :: TYPE rep}.
>>>> GHC.Num.Natural.Natural
>>>> -> (GHC.Num.BigNat.BigNat -> r) -> ((# #) -> r) -> r
>>>> = {} \r [scrut_sBE cont_sBF fail_sBG]
>>>> case scrut_sBE of {
>>>> GHC.Num.Natural.NS _ -> fail_sBG GHC.Prim.(##);
>>>> GHC.Num.Natural.NB ds_sBJ ->
>>>> let {
>>>> sat_sBK :: GHC.Num.BigNat.BigNat
>>>> = CCCS GHC.Num.BigNat.BN#! [ds_sBJ];
>>>> } in cont_sBF sat_sBK;
>>>> };
>>>>
>>>>Note how its result is representation-polymorphic! It only works
>>>>because our machine implementation allows tail-calls.
>>>>It's obvious in hindsight that we could never write `stgExprPrimRep`
>>>>in such a way that it will work on the expression `cont_sBF
>>>>sat_sBK`.
>>>>So the sequel approach seems infeasible.
>>>>
>>>># What we hope could work: A special stack frame
>>>>
>>>>The other alternative would be to insert a special continuation
>>>>frame on the stack when we enter the let-body (inspired by
>>>>stg_restore_cccs).
>>>>This continuation frame would simply push all registers (FP regs, GP
>>>>regs, Vec regs, ...) to the C stack, do its work (unsetting the
>>>>bit), then pop all registers again and jump to the topmost
>>>>continuation on the STG stack.
>>>>Example:
>>>>
>>>>f :: forall rep (r :: TYPE rep). Int# -> (Int# -> r) -> r
>>>>f = \x g ->
>>>> let {
>>>> h = [x] \a -> x + a;
>>>> } in
>>>> case h x of b {
>>>> __DEFAULT -> g b
>>>> }
>>>>
>>>>We are only interested in unsetting the bit for h here. Consider the
>>>>stack when entering the body of h.
>>>>
>>>>caller_of_f_cont_info <- Sp
>>>>
>>>>Now push our special continuation frame:
>>>>
>>>>caller_of_f_cont_info
>>>>seq_h
>>>>unset_bit_stk_info <- Sp
>>>>
>>>>E.g., the stack frame contains the info pointer and the sequence
>>>>number. (Btw., I hope I got the stack layout about right and this is
>>>>even possible)
>>>>Then, after we entered the continuation of the __DEFAULT alt, we do
>>>>a jump to g.
>>>>Plot twist: g returns an unboxed 8-tuple of `Int#`s (as
>>>>caller_of_f_cont_info knows, but f certainly doesn't!), so before it
>>>>returns it will push two args on the stack (correct?):
>>>>
>>>>caller_of_f_cont_info
>>>>seq_h
>>>>unset_bit_stk_info
>>>>unboxed tuple component 7
>>>>unboxed tuple component 8 <- Sp
>>>>
>>>>And then `g` jumps directly to the entry code for
>>>>`unset_bit_stk_info` (which does the register saving I mentioned),
>>>>which absolutely can't figure out from Sp alone where seq_h is.
>>>>Darn! I think Luite's recent work on the StgToByteCode had to work
>>>>around similar issues, I found this wiki page
>>>><https://gitlab.haskell.org/ghc/ghc-wiki-mirror/-/blob/master/Unboxed-Tuples-and-Sums-in-GHCi-bytecode.md#returning-a-tuple>.
>>>>But we aren't in a position where we know the representation of `r`
>>>>*at all*!
>>>>
>>>>So our idea was to scan the stack, beginning from `Sp`, until I find
>>>>`unset_bit_stk_info`, giving us the following situation:
>>>>
>>>>caller_of_f_cont_info
>>>>seq_h
>>>>unset_bit_stk_info <- Bp
>>>>unboxed tuple component 7
>>>>unboxed tuple component 8 <- Sp
>>>>
>>>>I suggestively named the register in which we store the result Bp in
>>>>analogy to the traditional base pointer. This information would be
>>>>enough to unset the bit at index seq_h
>>>>and then copy the unboxed tuple components 7 and 8 up by two words:
>>>>
>>>>caller_of_f_cont_info
>>>>unboxed tuple component 7
>>>>unboxed tuple component 8 <- Sp
>>>>
>>>>Then jump to caller_of_f_cont_info, which knows what to make of the
>>>>stack and the register state.
>>>>
>>>>The stack scanning is horrible and too brittle and slow for a
>>>>production setting, as any of the unboxed tuple components could
>>>>have the same bit pattern as `unset_bit_stk_info`.
>>>>We simply hope that is never the case and it's fine for the purposes
>>>>of a quick-and-dirty instrumentation.
>>>>
>>>>QUESTION 1: What do you think? Could this work? Can you anticipate
>>>>pit falls of this approach?
>>>>
>>>># What about Thunks and StgRhsCon?
>>>>
>>>>QUESTION 2: The instrumentation as described won't work for Thunks
>>>>(which are only entered once) and constructor applications (like
>>>>sat_sBK in the BigNat matcher above). Not sure how to change that
>>>>without incurring huge performance hits (by deactivating
>>>>memoisation). Ideas are welcome here.
>>>>
>>>>Thanks for reading this far. I hope you could follow and are now
>>>>fizzing with excitement because you have a much better idea of how
>>>>to do this and will let me know :)
>>>>
>>>>Sebastian
>>>>_______________________________________________
>>>>ghc-devs mailing list
>>>>ghc-devs at haskell.org
>>>>http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20211217/e68218f0/attachment-0001.html>
More information about the ghc-devs
mailing list