Proposal: Use uninterruptibleMask for cleanup actions in Control.Exception

Thu Oct 2 20:25:27 UTC 2014

On 24/09/2014 17:58, Merijn Verstraaten wrote:

> Well, another issue that was raised (I forget by whom!) was the fact
> that stackoverflows are currently thrown externally from the
> executing thread and that this should really be changed into being
> thrown to the thread by itself, thereby avoiding the mask.

Yeah, a stack overflow is an asynchronous exception.  It is not thrown
by any particular thread, but it is treated as an async exception from 
the point of view of the receiving thread.  And this is the right thing 
to do, since a stack overflow can occur absolutely anywhere, so it has 
exactly the characteristics of an async exception.

So it is already the case that a stack overflow inside mask does not 
cause an exception.  Instead the exception is deferred until the thread 
exits the mask.  This proposal wouldn't change anything in that respect.
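
A minimal sketch of that deferral, under stated assumptions: it uses the ThreadKilled exception from killThread as a stand-in for a stack overflow (both arrive as asynchronous exceptions), and the names deferredUnderMask and spinUntil, as well as the delay constants, are hypothetical and chosen only for illustration.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay, yield)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (AsyncException (..), mask_, try)
import Control.Monad (unless)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Busy-wait without blocking, so there is no interruptible point
-- inside the masked section.
spinUntil :: IORef Bool -> IO ()
spinUntil flag = do
  go <- readIORef flag
  unless go (yield >> spinUntil flag)

-- Returns whether the masked section ran to completion, together with
-- what the worker observed once it left the mask.
deferredUnderMask :: IO (Bool, Either AsyncException ())
deferredUnderMask = do
  entered  <- newEmptyMVar
  release  <- newIORef False
  finished <- newIORef False
  outcome  <- newEmptyMVar
  worker <- forkIO $ do
    r <- try $ do
      mask_ $ do
        putMVar entered ()          -- does not block: the MVar is empty
        spinUntil release           -- non-blocking work under the mask
        writeIORef finished True
      -- The mask has been left; a pending exception is delivered here
      -- at the latest.
      threadDelay 1000000
    putMVar outcome r
  takeMVar entered
  -- killThread blocks until the exception is delivered, so it runs
  -- in a helper thread.
  _ <- forkIO (killThread worker)
  threadDelay 50000                 -- give the helper time to throw
  writeIORef release True           -- let the worker leave the mask
  result <- takeMVar outcome
  done   <- readIORef finished
  return (done, result)
```

Under these assumptions, deferredUnderMask should return (True, Left ThreadKilled): the masked section completes, and the exception surfaces only after mask_ returns.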

Cheers,
Simon

>
> Cheers, Merijn
>
> On 24 Sep 2014, at 05:51 , Simon Marlow <marlowsd at gmail.com> wrote:
>> Ok, sorry for the delay, we still need a resolution on this one.
>>
>> So thanks to your persuasive comments I think I'm convinced.  What
>> finally tipped me over the edge was this:
>>
>> https://phabricator.haskell.org/diffusion/GHC/browse/master/libraries/base/Control/Concurrent/QSem.hs;165072b334ebb2ccbef38a963ac4d126f1e08c96$103-112
>>
>>
>> It turns out I've been a victim of this "bug" myself :-)  So let's fix it.
>>
>> But what is the cost? Adding an uninterruptibleMask won't be free.
>>
>> In the case of `catch`, since the mask is already built in to the
>> primitive, we can just change it to be an uninterruptibleMask, and
>> that applies to handle and onException too.  For `finally` we can
>> replace the mask with an uninterruptibleMask, but for `bracket` we
>> have to add a new layer of uninterruptibleMask.
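
The change to `finally` and `bracket` described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual patch to base: the names finallyUninterruptible, bracketUninterruptible and observedCleanupState are hypothetical, and the structure simply mirrors the usual mask/restore/onException pattern.

```haskell
import Control.Exception
  ( MaskingState (..), getMaskingState, mask, onException
  , uninterruptibleMask, uninterruptibleMask_ )
import Data.IORef (newIORef, readIORef, writeIORef)

-- 'finally' with its mask replaced by uninterruptibleMask. 'restore'
-- returns the action to the caller's masking state, so only the
-- cleanup runs uninterruptibly.
finallyUninterruptible :: IO a -> IO b -> IO a
finallyUninterruptible action cleanup =
  uninterruptibleMask $ \restore -> do
    r <- restore action `onException` cleanup
    _ <- cleanup
    return r

-- 'bracket' keeps the ordinary mask, so acquire stays interruptible,
-- and adds a new uninterruptibleMask_ layer around release only.
bracketUninterruptible :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
bracketUninterruptible acquire release use =
  mask $ \restore -> do
    a <- acquire
    r <- restore (use a) `onException` uninterruptibleMask_ (release a)
    _ <- uninterruptibleMask_ (release a)
    return r

-- Reports the masking state that the release action observes.
observedCleanupState :: IO MaskingState
observedCleanupState = do
  seen <- newIORef Unmasked
  bracketUninterruptible
    (return ())
    (\_ -> getMaskingState >>= writeIORef seen)
    (\_ -> return ())
  readIORef seen
```

With this sketch, observedCleanupState should report MaskedUninterruptible, while the body passed as `use` still runs in the caller's masking state.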
>>
>> Lots of documentation probably needs to be updated.  Any chance
>> you could make a patch and upload it to Phabricator?
>>
>> Cheers, Simon
>>
>> On 05/09/2014 18:34, Eyal Lotem wrote:
>>> Hey Simon, thanks for the reply!
>>>
>>>
>>> On Fri, Sep 5, 2014 at 6:39 PM, Simon Marlow
>>> <marlowsd at gmail.com> wrote:
>>>
>>> Eyal, thanks for bringing up this issue.  It's been at the back
>>> of my mind for a while, but I've never really thought through the
>>> issues and consequences of changes.  So this is a good
>>> opportunity to do that.  You point out (in another email in the
>>> thread) that:
>>>
>>> A) Cases that were not interruptible will remain the same.
>>> B) Cases that were interruptible were bugs and will be fixed.
>>>
>>> However,
>>>
>>> C) Some bugs will turn into deadlocks (unkillable threads)
>>>
>>> Being able to recover from bugs is an important property in large
>>> long-running systems.  So this is a serious problem.  Hence why I
>>> always treat uninterruptibleMask with the deepest suspicion.
>>>
>>>
>>> Recovering from various kinds of failures makes a lot of sense.
>>> But how can you recover from arbitrary invariants of the program
>>> being broken?
>>>
>>> For example, if you use a bracket on some semaphore monitoring a
>>> global resource. How do you recover from a bug of leaking
>>> semaphore tokens?
>>>
>>> Recovering from crashes of whole processes whose internal state
>>> can be recovered to a fresh, usable state, is a great feature.
>>> Recovering from thread crashes that share arbitrary mutable
>>> state with other threads is not practical, I believe.
>>>
>>> Let's consider the case where we have an interruptible operation
>>> in the handler, and divide it into two (er three):
>>>
>>> 1. It blocks for a short bounded amount of time.
>>> 2. It blocks for a long time.
>>> 3. It blocks indefinitely.
>>>
>>> These are all buggy, but in different ways.  Only (1) is fixed by
>>> adding uninterruptibleMask.  (2) is "fixed", but in exchange for
>>> an unresponsive thread - also undesirable.  (3) was a bug in the
>>> application code, and turns into a deadlock with
>>> uninterruptibleMask, which is undesirable.
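
Case (1) can be sketched as follows, assuming a cleanup that must take an MVar which is only briefly unavailable. The name cleanupCompletes, its `protect` parameter and the delay constant are hypothetical and serve only to compare the two masking functions.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (AsyncException, mask_, try, uninterruptibleMask_)
import Data.IORef (newIORef, readIORef, writeIORef)

-- Runs a "cleanup" that blocks briefly under the given masking
-- function while another thread tries to kill the worker. Returns
-- whether the cleanup ran to completion.
cleanupCompletes :: (IO () -> IO ()) -> IO Bool
cleanupCompletes protect = do
  lock      <- newEmptyMVar      -- resource the cleanup needs
  started   <- newEmptyMVar
  completed <- newIORef False
  done      <- newEmptyMVar
  worker <- forkIO $ do
    r <- try $ protect $ do
      putMVar started ()
      takeMVar lock              -- interruptible: blocks for a short time
      writeIORef completed True
    putMVar done (r :: Either AsyncException ())
  takeMVar started
  -- killThread blocks until delivery, so it runs in a helper thread.
  _ <- forkIO (killThread worker)
  threadDelay 100000             -- the kill arrives while the cleanup blocks
  putMVar lock ()                -- the resource becomes available
  _ <- takeMVar done
  readIORef completed
```

Under these assumptions, cleanupCompletes mask_ should return False (the blocked takeMVar is interrupted and the cleanup is abandoned), while cleanupCompletes uninterruptibleMask_ should return True. In case (3) the lock would never be filled, and the uninterruptible variant would leave an unkillable thread.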
>>>
>>>
>>> I think that (1) is by far the most common case and is very
>>> prevalent. Interruptible operations that take essentially zero
>>> time (they can block but almost never do) are the most common
>>> cleanup handlers.
>>>
>>> For (2) and (3), we need to choose the lesser evil:
>>>
>>> A) Deadlocks and/or unresponsiveness
>>> B) Arbitrary invariants being broken and leaks
>>>
>>> In my experience, A tends to manifest exactly where the bug is,
>>> and is therefore easy to debug and mostly a "performance bug".
>>> B tends to manifest as difficult-to-explain behavior far from
>>> where the bug actually is, and is usually a "correctness bug",
>>> which is almost always worse.
>>>
>>> Therefore, I think A is a far, far lesser evil than B when (2)
>>> and (3) are involved.
>>>
>>> I'd like to reemphasize that this change will almost always fix
>>> the problem completely since the most common case is (1), and in
>>> rare cases, it will convert B to A, which is also, IMO, very
>>> desirable.
>>>
>>>
>>> This is as far as I've got thinking through the issues so far. I
>>> wonder to what extent the programmer can and should mitigate
>>> these cases, and how much we can help them.  I don't want
>>> unkillable threads, even when caused by buggy code.
>>>
>>>
>>> Cheers, Simon
>>>
>>>
>>> On 04/09/2014 16:46, Roman Cheplyaka wrote:
>>>
>>> I find your arguments quite convincing. Count that as +1 from
>>> me.
>>>
>>> Roman
>>>
>>>
>>>
>>> _______________________________________________
>>> Libraries mailing list
>>> Libraries at haskell.org
>>> http://www.haskell.org/mailman/listinfo/libraries
>>>
>>>
>>>
>>>
>>> -- Eyal
>