[GHC] #15544: Non-deterministic segmentation fault in cryptohash-sha256 testsuite

GHC ghc-devs at haskell.org
Fri Sep 7 17:00:04 UTC 2018


#15544: Non-deterministic segmentation fault in cryptohash-sha256 testsuite
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

 > @osa1 what makes you suspect the STM fix?

 I'm debugging the assertion failure in comment:12, which looked serious
 enough to me (a TSO list is getting corrupted). I realized that the list
 being corrupted is a run queue. It gets corrupted because in
 `stmCommitTransaction` we unpark a thread that is already in a run queue.
 So at some point the thread is in two lists: a run queue and a TRec's
 wait queue.

 This is the point where we corrupt the list:

 {{{
 We're unpark_tso()'ing a thread that is already in a run queue.

 352     if (tso->block_info.closure != &stg_STM_AWOKEN_closure) {
 353         // safe to do a non-atomic test-and-set here, because it's
 354         // fine if we do multiple tryWakeupThread()s.
 355         tso->block_info.closure = &stg_STM_AWOKEN_closure;
 356         tryWakeupThread(cap,tso);
 357     }

 Old value = (StgTSO *) 0x104df58
 New value = (StgTSO *) 0x42001d9000
 0x0000000000dcb2b3 in unpark_tso (cap=0x104f6c0 <MainCapability>, tso=0x42001d9078) at rts/STM.c:355
 355             tso->block_info.closure = &stg_STM_AWOKEN_closure;
 >>> bt
 #0  0x0000000000dcb2b3 in unpark_tso (cap=0x104f6c0 <MainCapability>, tso=0x42001d9078) at rts/STM.c:355
 #1  0x0000000000dcb35c in unpark_waiters_on (cap=0x104f6c0 <MainCapability>, s=0x42001c2070) at rts/STM.c:374
 #2  0x0000000000dcd2d2 in stmCommitTransaction (cap=0x104f6c0 <MainCapability>, trec=0x4200037c50) at rts/STM.c:1092
 #3  0x0000000000dee080 in stg_atomically_frame_info ()
 #4  0x0000000000000000 in ?? ()
 }}}

 (Note that this is reverse execution, so "Old value" is actually the new
 value and "New value" is the old one.)
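 The following is a minimal, self-contained C model of why this write
 corrupts the run queue. It assumes that the run-queue back link and the
 blocked-on closure share storage in the `block_info` union, as the
 `block_info.prev` check in this comment suggests. The types and
 functions (`Thread`, `BlockInfo`, `mark_awoken`, `STM_AWOKEN`) are
 hypothetical simplifications, not the RTS's real `StgTSO` code.

 {{{
 #include <assert.h>
 #include <stddef.h>
 #include <stdio.h>

 /* Hypothetical, simplified model; not the RTS's real types.
  * Assumption: the run-queue back link and the "blocked on" closure
  * pointer share storage in one union, like StgTSO's block_info. */
 typedef struct Closure { const char *name; } Closure;
 typedef struct Thread Thread;

 typedef union {
     Closure *closure; /* what the thread is blocked on */
     Thread  *prev;    /* back link while the thread is on a run queue */
 } BlockInfo;

 struct Thread {
     int id;
     Thread *link;     /* forward link, playing the role of _link */
     BlockInfo block_info;
 };

 /* Stand-in for stg_STM_AWOKEN_closure. */
 static Closure STM_AWOKEN = { "STM_AWOKEN" };

 /* Models the quoted snippet: mark the thread as awoken unless it
  * already is. Nothing here checks whether t is on a run queue. */
 static void mark_awoken(Thread *t) {
     if (t->block_info.closure != &STM_AWOKEN) {
         t->block_info.closure = &STM_AWOKEN;
         /* the real code would call tryWakeupThread() here */
     }
 }

 int main(void) {
     /* Run queue a -> b -> c, with back links b->a and c->b. */
     Thread a = { 1, NULL, { .prev = NULL } };
     Thread b = { 2, NULL, { .prev = &a } };
     Thread c = { 3, NULL, { .prev = &b } };
     a.link = &b;
     b.link = &c;

     assert(c.block_info.prev == &b); /* invariant holds initially */
     printf("before: c's back link intact? %d\n", c.block_info.prev == &b);
     mark_awoken(&c); /* c is already queued, yet it is "unparked" */
     printf("after:  c's back link intact? %d\n", c.block_info.prev == &b);
     return 0;
 }
 }}}

 Run as written, the first line reports 1 and the second reports 0.
 Setting `closure` on a thread that is still queued silently replaces its
 back link. The forward links still look fine, so the damage only shows up
 later.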

 The thread is already in a run queue:

 {{{
 >>> print tso
 $23 = (StgTSO *) 0x42001d9078

 >>> print MainCapability->run_queue_hd->_link->_link
 $25 = (struct StgTSO_ *) 0x42001d9078
 }}}

 At this point the TSO's run-queue back link (`block_info.prev`) is still
 fine:

 {{{
 >>> print MainCapability->run_queue_hd->_link->_link->block_info.prev == MainCapability->run_queue_hd->_link
 $29 = 1
 }}}
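 The gdb check above tests a single node. The sketch below applies the
 same invariant to a whole queue. It uses a hypothetical `Node` type whose
 forward `link` and back `prev` fields stand in for `_link` and
 `block_info.prev`. It is a debugging aid, not RTS code, and it assumes
 the queue is acyclic (a cycle would make the walk loop forever).

 {{{
 #include <assert.h>
 #include <stddef.h>
 #include <stdio.h>

 /* Hypothetical minimal run-queue node: a forward link plus a back link,
  * mirroring the _link / block_info.prev pair inspected in gdb. */
 typedef struct Node {
     int id;
     struct Node *link;
     struct Node *prev;
 } Node;

 /* Walk the queue from hd. Return the 0-based position of the first node
  * whose back link does not point at its predecessor, or -1 if the whole
  * queue is consistent. NULL stands in for the RTS's end-of-queue
  * sentinel (an assumption of this sketch). */
 static int first_bad_link(const Node *hd) {
     const Node *pred = NULL;
     int pos = 0;
     for (const Node *n = hd; n != NULL; n = n->link, pos++) {
         if (n->prev != pred) return pos;
         pred = n;
     }
     return -1;
 }

 int main(void) {
     Node a = {1, NULL, NULL}, b = {2, NULL, NULL}, c = {3, NULL, NULL};
     a.link = &b; b.prev = &a;
     b.link = &c; c.prev = &b;
     assert(first_bad_link(&a) == -1); /* consistent queue */
     printf("%d\n", first_bad_link(&a));

     c.prev = NULL; /* simulate a clobbered back link on the third node */
     printf("%d\n", first_bad_link(&a));
     return 0;
 }
 }}}

 Run as written, it prints -1 and then 2. Position 2 corresponds to the
 `run_queue_hd->_link->_link` node inspected above.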

 Because the STM fix changed `unpark_tso()`, I thought it might be
 related. I don't yet know how this thread ends up in two lists; I'll
 investigate further.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15544#comment:17>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list