[GHC] #15136: High CPU when asynchronous exception and unblocking retry on TVar raced

GHC ghc-devs at haskell.org
Thu Jul 5 11:59:56 UTC 2018


#15136: High CPU when asynchronous exception and unblocking retry on TVar raced
-------------------------------------+-------------------------------------
        Reporter:  nshimaza          |                Owner:  osa1
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.4.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  Runtime crash     |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

 Just to repeat comment:1 and comment:2 with my words:

 Thread 1 kills Thread 2, which is blocked on a TVar operation. To do this it
 calls raiseAsync(), which has to lock Thread 2 (lockTSO). Then, to abort the
 transaction, it needs to lock the TVar (lock_tvar).

 At the same time, Thread 3 successfully modifies the TVar. To unblock any
 threads blocked on this TVar it needs to lock the TVar (lock_tvar), and then,
 to actually unblock the thread, it needs to lock the TSO (lockTSO).

 When the order of locking goes like this:

 - Thread 1 locks the TSO (lockTSO)
 - Thread 3 locks the TVar (lock_tvar)

 We get a deadlock because Thread 1 now wants to lock the TVar and Thread 3
 wants to lock the TSO, both of which are already locked.
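
 The interleaving above can be replayed deterministically in a small model.
 This is only a sketch: `model_lock`, `model_try_lock`, `model_unlock` and
 `replay_inversion` are hypothetical names, and the locks are plain atomic
 flags standing in for lockTSO and lock_tvar, not the RTS implementation.
 Each "thread" makes a single attempt at its second lock, where a real lock
 would keep retrying.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an RTS lock; not the real lockTSO/lock_tvar. */
typedef struct { atomic_bool held; } model_lock;

/* One attempt to take the lock. A real lock would retry until it succeeds. */
static bool model_try_lock(model_lock *l) {
    bool expected = false;
    return atomic_compare_exchange_strong(&l->held, &expected, true);
}

static void model_unlock(model_lock *l) {
    assert(atomic_load(&l->held));
    atomic_store(&l->held, false);
}

/* Replays the interleaving from the comment in a single thread.
 * Returns true if both parties end up holding one lock and failing
 * to get the other, i.e. the lock-order inversion. */
bool replay_inversion(void) {
    model_lock tso = { false }, tvar = { false };

    bool t1_has_tso   = model_try_lock(&tso);   /* Thread 1: lockTSO   */
    bool t3_has_tvar  = model_try_lock(&tvar);  /* Thread 3: lock_tvar */
    bool t1_gets_tvar = model_try_lock(&tvar);  /* Thread 1 wants the TVar */
    bool t3_gets_tso  = model_try_lock(&tso);   /* Thread 3 wants the TSO  */

    /* Clean up the model state before reporting. */
    model_unlock(&tso);
    model_unlock(&tvar);

    return t1_has_tso && t3_has_tvar && !t1_gets_tvar && !t3_gets_tso;
}
```

 In this model neither second acquisition can succeed, because each party
 holds the lock the other one needs and never releases it.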

 > Perhaps we should switch to using an owner semantics for BlockedOnSTM too -
 > that is, if we see BlockedOnSTM in raiseAsync, we attempt to lock the TVar
 > pointed to by tso->block_info.

 I only get `END_TSO_QUEUE` in `tso->block_info`. I think the TVar is only
 reachable from the array list `tso->trec->current_chunk`. I guess we could do
 this:

 - Lock the TSO
 - If BlockedOnSTM then check tso->trec entries. Expect to see only one TVar
   there (can we have more than one TVar here?). Lock the TVar and release the
   TSO.
 - Continue with raiseAsync()
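
 The steps above could be sketched roughly as follows. Everything here is a
 simplified model with hypothetical names (`tso_model`, `tvar_model`,
 `prepare_raise_async`), not RTS code. It assumes exactly one TVar in the
 trec, which is the open question above. It also assumes that a failed
 attempt to lock the TVar releases the TSO and backs off so the caller can
 retry; the steps above do not specify that case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { NOT_BLOCKED, BLOCKED_ON_STM } why_blocked_model;

typedef struct { atomic_bool locked; } tvar_model;

typedef struct {
    atomic_bool locked;
    why_blocked_model why_blocked;
    tvar_model *trec_tvar;  /* assumption: the single TVar in the trec */
} tso_model;

typedef enum { HOLDS_TSO, HOLDS_TVAR, BACKED_OFF } prepare_result;

static bool try_lock_flag(atomic_bool *f) {
    bool expected = false;
    return atomic_compare_exchange_strong(f, &expected, true);
}

static void unlock_flag(atomic_bool *f) {
    assert(atomic_load(f));
    atomic_store(f, false);
}

/* Models the locking done before raiseAsync() continues.
 * HOLDS_TVAR:  target was BlockedOnSTM; TVar locked, TSO released.
 * HOLDS_TSO:   target was not BlockedOnSTM; TSO stays locked.
 * BACKED_OFF:  a lock was unavailable; nothing is held (assumed policy). */
prepare_result prepare_raise_async(tso_model *tso) {
    if (!try_lock_flag(&tso->locked))
        return BACKED_OFF;

    if (tso->why_blocked != BLOCKED_ON_STM)
        return HOLDS_TSO;

    assert(tso->trec_tvar != NULL);
    if (!try_lock_flag(&tso->trec_tvar->locked)) {
        /* Never wait for the TVar while holding the TSO. */
        unlock_flag(&tso->locked);
        return BACKED_OFF;
    }

    unlock_flag(&tso->locked);  /* "Lock the TVar and release the TSO" */
    return HOLDS_TVAR;
}
```

 With the back-off, this model never waits for the TVar while holding the
 TSO, so it cannot reproduce the inversion described earlier. Whether a
 retry loop like this is acceptable in the RTS is a separate question.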

 I don't know if we can see more than one TVar in tso->trec entries. Also, we
 need to modify stmAbortTransaction because we'll have the TVar locked already,
 but it still needs to lock it when it's called from other call sites (e.g.
 from `raise#`).

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15136#comment:9>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

