[GHC] #15136: High CPU when asynchronous exception and unblocking retry on TVar raced
GHC
ghc-devs at haskell.org
Mon Jun 11 14:09:38 UTC 2018
#15136: High CPU when asynchronous exception and unblocking retry on TVar raced
-------------------------------------+-------------------------------------
Reporter: nshimaza | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.6.1
Component: Runtime System | Version: 8.4.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by carlostome):
After some investigation I found that indeed there is a deadlock between
the mutexes of a TSO and a TVar. An excerpt of the debugging log is the
follwing:
{{{
Task 2: cap 1: running thread 7012 (ThreadRunGHC)
Task 2: STM: 0xad6d08 : stmStartTransaction with 316 tokens
Task 2: STM: 0xad6d08 : stmStartTransaction()=0x4200224000
Task 2: STM: 0x4200224000 : stmWriteTVar(0x4200225c68, 0xad5e72)
Task 2: STM: 0x4200224000 : get_entry_for TVar 0x4200225c68
Task 2: STM: 0x4200224000 : FOR_EACH_ENTRY, current_chunk=0x420027e4d0
limit=0
Task 2: STM: 0x4200224000 : read_current_value(0x4200225c68)=0xad5e69
Task 2: STM: 0x4200224000 : stmWriteTVar done
Task 2: STM: 0x4200224000 : stmCommitTransaction()
Task 2: STM: 0x4200224000 : lock_stm()
Task 2: STM: 0x4200224000 : FOR_EACH_ENTRY, current_chunk=0x420027e4d0
limit=1
Task 2: STM: 0x4200224000 : trying to acquire 0x4200225c68
Task 2: STM: 0x4200224000 : cond_lock_tvar(0x4200225c68, 0xad5e69)
-- (1)
Task 2: STM: 0x4200224000 : success
Task 2: STM: 0x4200224000 : doing read check
Task 2: STM: 0x4200224000 : FOR_EACH_ENTRY, current_chunk=0x420027e4d0
limit=1
Task 2: STM: 0x4200224000 : read-check succeeded
Task 2: STM: 0x4200224000 : FOR_EACH_ENTRY, current_chunk=0x420027e4d0
limit=1
Task 2: STM: 0x4200224000 : writing 0xad5e72 to 0x4200225c68, waking
waiters -- (2)
Task 2: STM: unpark_waiters_on tvar=0x4200225c68
Task 2: STM: unpark_waiters_on tvar=0x4200225c68, status=0x4200223b80
Task 2: STM: cap 1: unpark_waiters_on tvar=0x4200225c68,
status2=0x4200223b80 -- (3)
Task 2: STM: cap 1: unpark_tso id 6866
-- (4)
Task 1: BlockedOnSTM: lockTSO id 6866
Task 1: cap 0 BlockedOnSTM: raiseAsync id 6866
Task 1: cap 0: raising exception in thread 6866.
Task 1: cap 0: raiseAsync: freezing atomically frame
-- (5)
Task 1: STM: 0x4200225f58 : stmAbortTransaction
-- (6)
Task 1: STM: 0x4200225f58 : lock_stm()
Task 1: STM: 0x4200225f58 : aborting top-level transaction cap 0
Task 1: STM: 0x4200225f58 : aborting top-level transaction
Task 1: STM: 0x4200225f58 : stmAbortTransaction aborting waiting
transaction
Task 1: STM: 0x4200225f58 : remove_watch_queue_entries_for_trec()
-- (7)
Task 1: STM: 0x4200225f58 : FOR_EACH_ENTRY, current_chunk=0x4200223cc8
limit=1
Task 1: STM: 0x4200225f58 : lock_tvar (0x4200225c68)
-- (8)
}}}
What is happening here has to do with how async exceptions are
implemented.
The thread 6866 is waiting on the TVar 0x4200225c68. In the meantime the
thread 7012 is able to change the value of such TVar to True. To do so,
first it has to acquire the lock for the TVar (1), then actually change
its value (2), and finally wake any thread that was waiting on this TVar
(3). To wake up a concrete waiter it needs to hold its mutex through
lock_tso() (4).
The problem is that when an async exception is thrown to a thread whose
status is BlockedOnSTM, the first thing the thread throwing the exception
does is to grab the mutex of the target thread with lock_tso()
(https://github.com/ghc/ghc/blob/93220d46fceabf3afeae36f1fda94e1698c3639a/rts/RaiseAsync.c#L419),
then because the target thread was in the middle of an STM transaction
(5) that dependes on the same TVar, it has to abort it (6) for what it has
to be removed from the TVar watch queue (7) needing the mutex on the TVar
to do so (8).
Basically the thread completing the STM grabs first the TVar lock and then
the TSO lock while for the async exception the locks are taken in reverse
order.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15136#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list