Feasibility of native RTS support for continuations?

Sat Feb 1 00:23:19 UTC 2020

> On Jan 30, 2020, at 02:35, Simon Marlow <marlowsd at gmail.com> wrote:
> 
> My guess is you can almost do what you want with asynchronous exceptions but some changes to the RTS would be needed.
> 
> There's a bit of code in the IO library that literally looks like this (https://gitlab.haskell.org/ghc/ghc/blob/master/libraries%2Fbase%2FGHC%2FIO%2FHandle%2FInternals.hs#L175)

Thanks for the pointer, I definitely had not discovered that! That is an interesting trick. I think your explanation paired with the Note is enough for it to make sense to me, though I don’t yet understand all the implementation subtleties of raiseAsync itself.

> Also you might want to optimise the implementation so that it doesn't actually tear down the stack as it copies it into the heap, so that you could avoid the need to copy it back from the heap again in shift#.

I don’t fully understand this point — do you mean “avoid the need to copy it back on continuation application”? shift# just copies the stack slice to the heap, so it doesn’t need to copy it back.

If I was right, and you were referring to continuation application rather than shift#, I agree with you there. It seems as though it ought to be possible to represent the stack slice as a stack itself, so if the stack looks like

    ┌───────────┐
    │ RET_SMALL │
    ├───────────┤
    │ CATCH     │
    ├───────────┤
    │ RESET     │
    ├───────────┤

then the captured continuation could itself essentially be a stack like

    ┌───────────┐
    │ RET_SMALL │
    ├───────────┤
    │ CATCH     │
    ├───────────┤
    │ UNDERFLOW │
    └───────────┘

where restoring a continuation just copies the captured stack and updates its underflow frame to point at the top of the current stack. If the caller promises not to use the continuation again after applying it, the copying could be skipped entirely, and the captured stack could just become the new stack.

However, I don’t understand enough about the way the RTS currently works to know if this is viable. For example, what if I write this:

    reset (mask_ (shift f))

Now presumably I want to ensure the masking state is restored when I invoke the continuation. But it can’t just be unconditionally restored, since if I write

    mask_ (reset (shift f >>= g))

then the mask frame isn’t included on the stack, so applying the continuation shouldn’t affect the masking state. Presumably this means a continuation restore can’t be as simple as copying the captured stack frames onto the current stack, since restoring certain frames affects other parts of the RTS state.

(Tangentially, it seems like this particular example is not handled properly in the existing capture/restore code, as this comment in Exception.cmm notes:

     NB. there's a bug in here.  If a thread is inside an
     unsafePerformIO, and inside maskAsyncExceptions# (there is an
     unmaskAsyncExceptions_ret on the stack), and it is blocked in an
     interruptible operation, and it receives an exception, then the
     unsafePerformIO thunk will be updated with a stack object
     containing the unmaskAsyncExceptions_ret frame.  Later, when
     someone else evaluates this thunk, the original masking state is
     not restored.

I think getting this right probably matters more if continuations are added, so that’s something else to worry about.)

> So that's shift#. What about reset#? I expect it's something like `unsafeInterleaveIO`, that is it creates a thunk to name the continuation. You probably also want a `catch` in there, so that we don't tear down more of the stack than we need to.

It would be nice to be able to capture slices of the stack that include catch frames, though theoretically it isn’t necessary — it would be possible to arrange for a continuation that wants to capture through a catch to do something like

    reset (... (catch (reset ...) ...))

instead, then call shift twice and compose the two continuations by hand (inserting another call to catch in between). I don’t know enough yet to understand all the tradeoffs involved, but I’ll see if it’s possible to get away with the userland emulation, since I figure the less code in the RTS the better!

> Hope this is helpful.

Very much so, thank you!

> On Jan 30, 2020, at 10:31, Ben Gamari <ben at smart-cactus.org> wrote:
> 
> For the record, runtime system captures the stack state in an AP_STACK
> closure. This is done in rts/RaiseAsync.c:raiseAsync and some of this is
> described in the comment attached to that function.
> 
> As Simon PJ points out, this is all very tricky stuff, especially in a
> concurrent context. If you make any changes in this area do be sure to
> keep in mind the considerations described in Note [AP_STACKs must be
> eagerly blackholed], which arose out of the very-nast #13615.

Thank you for the pointers — I did manage to find raiseAsync, but I hadn’t seen that Note. I will try my best to be suitably wary, though I imagine I’m in far enough over my head that I don’t yet know half the things I need to be wary of. :)

Alexis