[GHC] #15544: Non-deterministic segmentation fault in cryptohash-sha256 testsuite
GHC
ghc-devs at haskell.org
Fri Sep 7 17:00:04 UTC 2018
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.6.1
Component: Compiler | Version: 8.4.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by osa1):
> @osa1 what makes you suspect the STM fix?
I'm debugging the assertion failure in comment:12, which looked serious to
me because a TSO list is getting corrupted. I realized that the corrupted
list is a run queue, and that it gets corrupted because
`stmCommitTransaction` unparks a thread that is already in a run queue. So
at some point the thread is in two lists at once: a run queue and a TRec's
wait queue.
This is the point where we corrupt the list:
{{{
We're unpark_tso()'ing a thread that is already in a run queue.
352 if (tso->block_info.closure != &stg_STM_AWOKEN_closure) {
353 // safe to do a non-atomic test-and-set here, because it's
354 // fine if we do multiple tryWakeupThread()s.
355 tso->block_info.closure = &stg_STM_AWOKEN_closure;
356 tryWakeupThread(cap,tso);
357 }
Old value = (StgTSO *) 0x104df58
New value = (StgTSO *) 0x42001d9000
0x0000000000dcb2b3 in unpark_tso (cap=0x104f6c0 <MainCapability>, tso=0x42001d9078) at rts/STM.c:355
355 tso->block_info.closure = &stg_STM_AWOKEN_closure;
>>> bt
#0 0x0000000000dcb2b3 in unpark_tso (cap=0x104f6c0 <MainCapability>, tso=0x42001d9078) at rts/STM.c:355
#1 0x0000000000dcb35c in unpark_waiters_on (cap=0x104f6c0 <MainCapability>, s=0x42001c2070) at rts/STM.c:374
#2 0x0000000000dcd2d2 in stmCommitTransaction (cap=0x104f6c0 <MainCapability>, trec=0x4200037c50) at rts/STM.c:1092
#3 0x0000000000dee080 in stg_atomically_frame_info ()
#4 0x0000000000000000 in ?? ()
}}}
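To illustrate why this write is harmful, here is a minimal, hypothetical C
model (not the RTS's real definitions; `Tso`, `BlockInfo`, `RunQueue`,
`awoken_closure` and the helpers are illustrative names). It assumes, as
the gdb check later in this comment suggests, that `block_info.closure`
(written at line 355) and `block_info.prev` (the run queue's back-link)
are members of the same union, so writing one overwrites the other.

{{{#!c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified TSO model: NOT the real StgTSO layout. */
typedef struct Tso Tso;

typedef union {
    void *closure; /* what a blocked thread is blocked on */
    Tso  *prev;    /* back-link while the thread is on a run queue */
} BlockInfo;

struct Tso {
    Tso      *link;       /* forward link, modelling _link */
    BlockInfo block_info;
};

typedef struct {
    Tso *hd;
    Tso *tl;
} RunQueue;

/* Stand-in for stg_STM_AWOKEN_closure. */
static int awoken_closure;

/* Append loosely in the style of the RTS's appendToRunQueue():
 * set the predecessor's forward link and this node's back-link.
 * NULL stands in for END_TSO_QUEUE. */
static void append_to_run_queue(RunQueue *q, Tso *t)
{
    assert(t->link == NULL);
    if (q->hd == NULL) {
        q->hd = t;
        t->block_info.prev = NULL;
    } else {
        q->tl->link = t;
        t->block_info.prev = q->tl;
    }
    q->tl = t;
}

/* The guarded write from unpark_tso(), without tryWakeupThread(). */
static void mark_awoken(Tso *t)
{
    if (t->block_info.closure != (void *)&awoken_closure) {
        t->block_info.closure = (void *)&awoken_closure;
    }
}

/* 1 if t's back-link still points at its predecessor, else 0. */
static int back_link_ok(const Tso *pred, const Tso *t)
{
    return t->block_info.prev == pred;
}
}}}

In this model, appending three TSOs `a`, `b`, `c` gives consistent
back-links; calling `mark_awoken(&b)` while `b` is still queued makes
`back_link_ok(&a, &b)` return 0, which is the same kind of damage the
watchpoint caught.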
(Note that this is reverse execution, so "Old value" is actually the value
after the write and "New value" is the value before it.)
The thread is already in a run queue:
{{{
>>> print tso
$23 = (StgTSO *) 0x42001d9078
>>> print MainCapability->run_queue_hd->_link->_link
$25 = (struct StgTSO_ *) 0x42001d9078
}}}
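At this point, just before the write, the TSO's back-link
(`block_info.prev`, the same `block_info` field that line 355 overwrites
as `block_info.closure`) is still intact: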
{{{
>>> print MainCapability->run_queue_hd->_link->_link->block_info.prev == MainCapability->run_queue_hd->_link
$29 = 1
}}}
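The same comparison can be applied to every node of a run queue. Below is
a hypothetical checker over a simplified TSO model (the `Tso` type,
`first_bad_back_link`, and the use of `NULL` in place of `END_TSO_QUEUE`
are assumptions for illustration, not RTS code). It reports the first TSO
whose back-link no longer points at its predecessor, which is how a
clobbered `block_info.prev` would show up.

{{{#!c
#include <stddef.h>

/* Hypothetical, simplified TSO model: NOT the real StgTSO layout. */
typedef struct Tso Tso;

struct Tso {
    Tso *link;          /* forward link, modelling _link */
    union {
        void *closure;  /* blocking info while the thread is blocked */
        Tso  *prev;     /* back-link while the thread is on a run queue */
    } block_info;
};

/* Walk a run queue from its head and check, for every node t, that
 * t->block_info.prev points at t's predecessor (NULL for the head,
 * standing in for END_TSO_QUEUE). This generalises the single gdb
 * comparison to the whole queue. Returns the first node whose
 * back-link is wrong, or NULL if every back-link is consistent. */
static const Tso *first_bad_back_link(const Tso *hd)
{
    const Tso *pred = NULL;
    for (const Tso *t = hd; t != NULL; pred = t, t = t->link) {
        if (t->block_info.prev != pred) {
            return t;
        }
    }
    return NULL;
}
}}}

In this model, overwriting a queued node's `block_info` the way
`unpark_tso()` does makes the checker return exactly that node.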
Because the STM fix changed `unpark_tso()`, I thought it might be related.
I don't yet know how this thread ends up in two lists; I'll investigate
further.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15544#comment:17>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets mailing list