Feasibility of native RTS support for continuations?

Tue Jan 28 08:19:40 UTC 2020

Hi all,

tl;dr: I want to try to implement native support for capturing slices of
the RTS stack as a personal experiment; please tell me the obstacles I
am likely to run into. Much more context follows.

---

I have been working on an implementation of an algebraic effect system
that uses unsafe primops to be as performant as possible. However, the
unavoidable need to CPS the entire program balloons heap allocation.
Consider the following definition:

    f a b = g a >>= \c -> h (b + c)

Assume `g` and `h` are not inlined. If the monad used is IO, this will
be compiled efficiently: the result of `g` is returned on the stack, and
no closure needs to be allocated for the lambda. However, if the monad
supports capturing the continuation, the above definition must be CPS’d.
After inlining, we end up with

    f a b = \k -> let lvl = \c -> h (b + c) k in g a lvl

which must allocate a closure on the heap. This is frustrating, as it
must happen for every call to a non-inlined monadic operation, even if
that operation never captures the continuation. In an algebraic effect
system, there are many shortcuts that avoid the need to capture the
continuation, and under my implementation, large swaths of code never do
so. I’ve managed to exploit that to get some savings, but I can’t escape
the need to allocate all these closures.

This motivates my question: how difficult would it be to allow capturing
a portion of the RTS call stack directly? My requirements are fairly
minimal, as continuations go:

  1. Capturing a continuation is only legal from within a strict state
     thread (i.e. IO or strict ST).

  2. The continuation is captured up to a prompt, which would be a new
     kind of RTS stack frame. Prompts are not tagged, so there is only
     ever exactly one prompt active at any time (which may be the root
     prompt).

  3. Capturing a continuation is unsafe. The behavior of capturing a
     continuation is undefined if the current prompt was not created by
     the current state thread (and it is never legal to capture up to
     the root prompt).

  4. Applying a continuation is unsafe. Captured continuations return
     `Any`, and type safety is the caller’s obligation.

  5. Continuations are “functional,” which is to say applying them does
     not trigger any additional stack unwinding.

This minimal support for primitive continuation capturing would be
enough to support my efficient, safe delimited control implementation.
In my ignorant mind, implementing this ought to be as simple as defining
two new primops,

    reset# :: (State# s -> (# State# s, a #))
           -> State# s -> (# State# s, a #)

    shift# :: ((a -> State# s -> (# State# s, Any #))
               -> State# s -> (# State# s, Any #))
           -> State# s -> (# State# s, a #)

where reset# pushes a new prompt frame and shift# captures a slice of
the RTS stack up to that frame and copies it into the heap. Restoring a
continuation would copy all the captured frames onto the end of the
current stack. Sounds simple enough!

I would like to experiment with implementing something like this myself,
just to see if it would really work, but somehow I doubt it is actually
as simple as it sounds. Minor complications are fine, but what worries
me are major obstacles I haven’t found in my limited attempts to learn
about the RTS.

So far, I’ve read the old “The New GHC/Hugs Runtime System” paper, which
still seems mostly accurate from a high level, though I imagine many
details have changed since then. I’ve also read the
Commentary/RTS/Storage/Stack wiki page, and I’ve peeked at some of the
RTS source code. I’ve also skimmed a handful of other papers and wiki
pages, but it’s hard to know what I’m looking for. Can anyone point me
to better resources or give me some insight into what will be hard?

Thanks in advance,
Alexis