Feasibility of native RTS support for continuations?

Tue Jan 28 10:09:20 UTC 2020

Alexis

I've thought about this quite a bit in the past, but got stalled for lack of cycles to think about it more.  But there's a paper or two:

https://www.microsoft.com/en-us/research/publication/composable-scheduler-activations-haskell/

On that link you can also see a link to an earlier, shorter, conference version (rejected 😊).

Also this earlier (2007) work

https://www.microsoft.com/en-us/research/publication/lightweight-concurrency-primitives-for-ghc/

On the effects front I think Daan Leijen is doing interesting stuff, although I'm not very up to date:
https://www.microsoft.com/en-us/research/people/daan/publications/

One interesting dimension is whether or not the continuations you capture are one-shot.  If so, particularly efficient implementations are possible.

Also: much of the "capture stack chunk" stuff is *already* implemented, because it is (I think) what happens when a thread receives an asynchronous exception, and just abandon its evaluation of thunks that it has started work on.

Simon

|  -----Original Message-----
|  From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Alexis King
|  Sent: 28 January 2020 08:20
|  To: ghc-devs <ghc-devs at haskell.org>
|  Subject: Feasibility of native RTS support for continuations?
|  
|  Hi all,
|  
|  tl;dr: I want to try to implement native support for capturing slices of
|  the RTS stack as a personal experiment; please tell me the obstacles I
|  am likely to run into. Much more context follows.
|  
|  ---
|  
|  I have been working on an implementation of an algebraic effect system
|  that uses unsafe primops to be as performant as possible. However, the
|  unavoidable need to CPS the entire program balloons heap allocation.
|  Consider the following definition:
|  
|      f a b = g a >>= \c -> h (b + c)
|  
|  Assume `g` and `h` are not inlined. If the monad used is IO, this will
|  be compiled efficiently: the result of `g` is returned on the stack, and
|  no closure needs to be allocated for the lambda. However, if the monad
|  supports capturing the continuation, the above definition must be CPS’d.
|  After inlining, we end up with
|  
|      f a b = \k -> let lvl = \c -> h (b + c) k in g a lvl
|  
|  which must allocate a closure on the heap. This is frustrating, as it
|  must happen for every call to a non-inlined monadic operation, even if
|  that operation never captures the continuation. In an algebraic effect
|  system, there are many shortcuts that avoid the need to capture the
|  continuation, and under my implementation, large swaths of code never do
|  so. I’ve managed to exploit that to get some savings, but I can’t escape
|  the need to allocate all these closures.
|  
|  This motivates my question: how difficult would it be to allow capturing
|  a portion of the RTS call stack directly? My requirements are fairly
|  minimal, as continuations go:
|  
|    1. Capturing a continuation is only legal from within a strict state
|       thread (i.e. IO or strict ST).
|  
|    2. The continuation is captured up to a prompt, which would be a new
|       kind of RTS stack frame. Prompts are not tagged, so there is only
|       ever exactly one prompt active at any time (which may be the root
|       prompt).
|  
|    3. Capturing a continuation is unsafe. The behavior of capturing a
|       continuation is undefined if the current prompt was not created by
|       the current state thread (and it is never legal to capture up to
|       the root prompt).
|  
|    4. Applying a continuation is unsafe. Captured continuations return
|       `Any`, and type safety is the caller’s obligation.
|  
|    5. Continuations are “functional,” which is to say applying them does
|       not trigger any additional stack unwinding.
|  
|  This minimal support for primitive continuation capturing would be
|  enough to support my efficient, safe delimited control implementation.
|  In my ignorant mind, implementing this ought to be as simple as defining
|  two new primops,
|  
|      reset# :: (State# s -> (# State# s, a #))
|             -> State# s -> (# State# s, a #)
|  
|      shift# :: ((a -> State# s -> (# State# s, Any #))
|                 -> State# s -> (# State# s, Any #))
|             -> State# s -> (# State# s, a #)
|  
|  where reset# pushes a new prompt frame and shift# captures a slice of
|  the RTS stack up to that frame and copies it into the heap. Restoring a
|  continuation would copy all the captured frames onto the end of the
|  current stack. Sounds simple enough!
|  
|  I would like to experiment with implementing something like this myself,
|  just to see if it would really work, but somehow I doubt it is actually
|  as simple as it sounds. Minor complications are fine, but what worries
|  me are major obstacles I haven’t found in my limited attempts to learn
|  about the RTS.
|  
|  So far, I’ve read the old “The New GHC/Hugs Runtime System” paper, which
|  still seems mostly accurate from a high level, though I imagine many
|  details have changed since then. I’ve also read the
|  Commentary/RTS/Storage/Stack wiki page, and I’ve peeked at some of the
|  RTS source code. I’ve also skimmed a handful of other papers and wiki
|  pages, but it’s hard to know what I’m looking for. Can anyone point me
|  to better resources or give me some insight into what will be hard?
|  
|  Thanks in advance,
|  Alexis
|  _______________________________________________
|  ghc-devs mailing list
|  ghc-devs at haskell.org
|  https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.hask
|  ell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-
|  devs&data=02%7C01%7Csimonpj%40microsoft.com%7C4ee2fda49b5346e0b2b508d7
|  a3cadd55%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637157964004023983&a
|  mp;sdata=3iyrmA6Ie5yLSkZkBz3Wcj5LS7GvE3zwivDEcxv2Y%2B4%3D&reserved=0