Feasibility of native RTS support for continuations?

Wed Jan 29 09:32:32 UTC 2020

|  Now that is very interesting, and certainly not something I would have
|  expected! Why would asynchronous exceptions need to capture any portion of
|  the stack? Exceptions obviously trigger stack unwinding, so I assumed the
|  “abort to the current prompt” part of my implementation would already
|  exist, but not the “capture a slice of the stack” part. Could you say a
|  little more about this, or point me to some relevant code?

Suppose a thread happens to be evaluating a pure thunk for (factorial 200). Then it gets an asynchronous exception from another thread.  That asynch exn is nothing to do with (factorial 200). So we could either 

A. revert the thunk to (factorial 200), abandoning all
   the work done so far, or
B. capture the stack and attach it to the thunk, so that ie any other
   thread enters that thunk, it'll just run that stack.

Now (A) means that every thunk has to be revertible, which means keeping its original free variables, which leads to space leaks.  And extra work to avoid losing any info you need for reversion.  Extra work is painful; we want to put all of the extra work on the asynch exn.

So we do (B).

See Section 8 of "Asynchronous exceptions in Haskell".
https://www.microsoft.com/en-us/research/publication/asynchronous-exceptions-haskell-3/

And "An implementation of resumable black holes" (Reid).
https://alastairreid.github.io/papers/IFL_98/

This stack-freezing stuff is definitely implemented.   I'm not quite sure where, but I'm cc'ing Simon Marlow who can point you at it.

------------

You need to be careful.  Suppose a thread pushes a prompt, then later evaluates a thunk T1, which in turn evaluates a thunk T2.  If you capture the stack down to the prompt, you MUST overwrite T1 and T2 with a resumable continuation capturing their portion of the stack, in case some other, unrelated thread needs their value.

But as I say, all this is implemented.

---------------

Keep us posted.  It'd be good to have a design that accommodated some of the applications in the 'composable scheduler activations' paper too.

Simon

|  -----Original Message-----
|  From: Alexis King <lexi.lambda at gmail.com>
|  Sent: 28 January 2020 22:19
|  To: Simon Peyton Jones <simonpj at microsoft.com>
|  Cc: ghc-devs <ghc-devs at haskell.org>
|  Subject: Re: Feasibility of native RTS support for continuations?
|  
|  > On Jan 28, 2020, at 04:09, Simon Peyton Jones <simonpj at microsoft.com>
|  wrote:
|  >
|  > I've thought about this quite a bit in the past, but got stalled for
|  lack of cycles to think about it more.  But there's a paper or two
|  
|  Many thanks! I’d stumbled upon the 2007 paper, but I hadn’t seen the 2016
|  one. In the case of the former, I had thought it probably wasn’t
|  enormously relevant, since the “continuations” appear to be fundamentally
|  one-shot. At first glance, that doesn’t seem to have changed in the JFP
|  article, but I haven’t really read it yet, so maybe I’m mistaken. I’ll
|  take a closer look.
|  
|  > On the effects front I think Daan Leijen is doing interesting stuff,
|  although I'm not very up to date:
|  >
|  https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.micr
|  osoft.com%2Fen-
|  us%2Fresearch%2Fpeople%2Fdaan%2Fpublications%2F&data=02%7C01%7Csimonpj
|  %40microsoft.com%7C1f6ac242e0334d662c8c08d7a4401d95%7C72f988bf86f141af91ab
|  2d7cd011db47%7C1%7C0%7C637158467589648051&sdata=%2BPgJblk6y%2BjXRc5bA0
|  gdzQEjrqgAQB6UYytdw7UtLQQ%3D&reserved=0
|  
|  Indeed, I’ve read a handful of his papers while working on this! I didn’t
|  mention it in the original email, but I’ve also talked a little with
|  Matthew Flatt about efficient implementation of delimited control, and he
|  pointed me to a few papers a couple of months ago. One of those was “Final
|  Shift for call/cc: a Direct Implementation of Shift and Reset” by
|  Gasbichler and Sperber, which describes an approach closest to what I
|  currently have in mind to try to implement in the RTS.
|  
|  > One interesting dimension is whether or not the continuations you
|  capture are one-shot.  If so, particularly efficient implementations are
|  possible.
|  
|  Quite so. One thing I’ve considered is that it’s possible to obtain much
|  of that efficiency even without requiring strict one-shot continuations if
|  you have a separate operation for restoring a continuation that guarantees
|  you won’t ever restore it again, sort of like the existing
|  unsafeThaw/unsafeFreeze operations. That is, you can essentially convert a
|  multi-shot continuation into a one-shot continuation and reap performance
|  benefits, even if you’ve already applied the continuation.
|  
|  This is a micro-optimization, though, so I’m not worrying too much about
|  it right now.
|  
|  > Also: much of the "capture stack chunk" stuff is *already* implemented,
|  because it is (I think) what happens when a thread receives an
|  asynchronous exception, and just abandon its evaluation of thunks that it
|  has started work on.
|  
|  Now that is very interesting, and certainly not something I would have
|  expected! Why would asynchronous exceptions need to capture any portion of
|  the stack? Exceptions obviously trigger stack unwinding, so I assumed the
|  “abort to the current prompt” part of my implementation would already
|  exist, but not the “capture a slice of the stack” part. Could you say a
|  little more about this, or point me to some relevant code?
|  
|  Thanks again,
|  Alexis