Proposal: Use uninterruptibleMask for cleanup actions in Control.Exception

Wed Sep 24 12:51:46 UTC 2014

Ok, sorry for the delay, we still need a resolution on this one.

So thanks to your persuasive comments I think I'm convinced.  What 
finally tipped me over the edge was this:

https://phabricator.haskell.org/diffusion/GHC/browse/master/libraries/base/Control/Concurrent/QSem.hs;165072b334ebb2ccbef38a963ac4d126f1e08c96$103-112

It turns out I've been a victim of this "bug" myself :-)  So let's fix it.

But what is the cost? Adding an uninterruptibleMask won't be free.

In the case of `catch`, since the mask is already built into the
primitive, we can just change it to be an uninterruptibleMask, and that
applies to `handle` and `onException` too.  For `finally` we can replace
the mask with an uninterruptibleMask, but for `bracket` we have to add a
new layer of uninterruptibleMask.
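
To make the `bracket` case concrete, here is a rough sketch of what the
extra layer might look like.  This is only an illustration, not the
actual patch: the names bracketUninterruptible and demo are made up for
this example, and the definition simply mirrors the usual shape of
`bracket` with the cleanup wrapped in uninterruptibleMask_.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (mask, onException, uninterruptibleMask_)
import Data.IORef (newIORef, readIORef, writeIORef)

-- Hypothetical name.  Same shape as bracket, except that the cleanup
-- action runs inside an extra uninterruptibleMask_ layer.
bracketUninterruptible :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
bracketUninterruptible before after thing =
  mask $ \restore -> do
    a <- before
    r <- restore (thing a) `onException` uninterruptibleMask_ (after a)
    _ <- uninterruptibleMask_ (after a)
    return r

-- Hypothetical demo: the cleanup contains an interruptible operation
-- (threadDelay) and a second asynchronous exception is thrown while
-- the cleanup is running.  The result says whether the cleanup still
-- ran to completion.
demo :: IO Bool
demo = do
  cleaned <- newIORef False
  started <- newEmptyMVar
  t <- forkIO $
    bracketUninterruptible
      (return ())
      (\_ -> threadDelay 200000 >> writeIORef cleaned True)
      (\_ -> putMVar started () >> threadDelay 10000000)
  takeMVar started
  killThread t  -- interrupts the body, so the cleanup starts
  killThread t  -- waits for the thread; cannot interrupt the cleanup
  readIORef cleaned
```

With the plain `bracket`, the second killThread would be expected to
interrupt the threadDelay in the cleanup, so the final write would never
happen.  With the extra layer, demo should return True, at the cost that
the second killThread blocks until the cleanup has finished.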

Lots of documentation probably needs to be updated.  Any chance you 
could make a patch and upload it to Phabricator?

Cheers,
Simon

On 05/09/2014 18:34, Eyal Lotem wrote:
> Hey Simon, thanks for the reply!
>
>
> On Fri, Sep 5, 2014 at 6:39 PM, Simon Marlow <marlowsd at gmail.com>
> wrote:
>
>     Eyal, thanks for bringing up this issue.  It's been at the back of
>     my mind for a while, but I've never really thought through the
>     issues and consequences of changes.  So this is a good opportunity
>     to do that.  You point out (in another email in the thread) that:
>
>     A) Cases that were not interruptible will remain the same.
>     B) Cases that were interruptible were bugs and will be fixed.
>
>     However,
>
>     C) Some bugs will turn into deadlocks (unkillable threads)
>
>     Being able to recover from bugs is an important property in large
>     long-running systems.  So this is a serious problem.  Hence why I
>     always treat uninterruptibleMask with the deepest suspicion.
>
>
> Recovering from various kinds of failures makes a lot of sense. But how
> can you recover from arbitrary invariants of the program being broken?
>
> For example, suppose you use a bracket on some semaphore monitoring a
> global resource. How do you recover from a bug that leaks semaphore
> tokens?
>
> Recovering from crashes of whole processes whose internal state can be
> recovered to a fresh, usable state, is a great feature.
> Recovering from thread crashes that share arbitrary mutable state with
> other threads is not practical, I believe.
>
>     Let's consider the case where we have an interruptible operation in
>     the handler, and divide it into two (er three):
>
>       1. It blocks for a short bounded amount of time.
>       2. It blocks for a long time.
>       3. It blocks indefinitely.
>
>     These are all buggy, but in different ways.  Only (1) is fixed by
>     adding uninterruptibleMask.  (2) is "fixed", but in exchange for an
>     unresponsive thread - also undesirable.  (3) was a bug in the
>     application code, and turns into a deadlock with
>     uninterruptibleMask, which is undesirable.
>
>
> I think that (1) is by far the most common case. I think the most
> common cleanup handlers are interruptible operations that take
> effectively zero time (they can block but almost never do).
>
> For (2) and (3), we need to choose the lesser evil:
>
> A) Deadlocks and/or unresponsiveness
> B) Arbitrary invariants being broken and leaks
>
> In my experience, A tends to manifest exactly where the bug is, and is
> therefore easy to debug and mostly a "performance bug".
> B tends to manifest as difficult-to-explain behavior far from
> where the bug actually is, and is usually a "correctness bug", which is
> almost always worse.
>
> Therefore, I think A is a far, far lesser evil than B, when (2) and (3)
> are involved.
>
> I'd like to reemphasize that this change will almost always fix the
> problem completely since the most common case is (1), and in rare cases,
> it will convert B to A, which is also, IMO, very desirable.
>
>
>     This is as far as I've got thinking through the issues so far.  I
>     wonder to what extent the programmer can and should mitigate these
>     cases, and how much we can help them.  I don't want unkillable
>     threads, even when caused by buggy code.
>
>
>     Cheers,
>     Simon
>
>
>     On 04/09/2014 16:46, Roman Cheplyaka wrote:
>
>         I find your arguments quite convincing. Count that as +1 from me.
>
>         Roman
>
>
>
>         _________________________________________________
>         Libraries mailing list
>         Libraries at haskell.org
>         http://www.haskell.org/mailman/listinfo/libraries
>
>
>
>
> --
> Eyal