[GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
GHC
ghc-devs at haskell.org
Mon Aug 27 10:02:56 UTC 2018
-------------------------------------+-------------------------------------
           Reporter: osa1            |            Owner: (none)
               Type: bug             |           Status: new
           Priority: normal          |        Milestone: 8.6.1
          Component: Runtime System  |          Version: 8.5
           Keywords:                 | Operating System: Unknown/Multiple
       Architecture: Unknown/Multiple|  Type of failure: None/Unknown
          Test Case:                 |       Blocked By:
           Blocking:                 |  Related Tickets: #15508
Differential Rev(s):                 |        Wiki Page:
-------------------------------------+-------------------------------------
While debugging #15508 I found a case where eager blackholing of an AP_STACK
causes `closure_sizeW()` to return an incorrect size. This in turn causes
incorrect slop zeroing by `OVERWRITING_CLOSURE()`, which breaks sanity
checks.
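The dependency described above can be sketched with a toy heap. Everything in this C sketch is hypothetical: the numeric tags, the two-word BLACKHOLE and `[tag, n, payload...]` AP_STACK layouts, and the `toy_` functions are simplifications made up for illustration, not GHC's real closure layout or RTS code. It assumes only what the ticket implies: a sanity check steps from each closure to `closure + closure_sizeW(closure)`, skipping zeroed slop, and `OVERWRITING_CLOSURE()` uses the old size to decide which words to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy heap: word 0 of each closure is a tag; a zero word is slop. */
enum { SLOP = 0, BLACKHOLE = 1, AP_STACK = 2, STACK_FRAME = 3 };

/* Stand-in for closure_sizeW(): size in words, computed from the header only. */
static size_t toy_closure_sizeW(const size_t *heap, size_t p) {
    switch (heap[p]) {
    case SLOP:      return 1;                /* zeroed slop is skipped word by word */
    case BLACKHOLE: return 2;                /* [tag, indirectee] */
    case AP_STACK:  return 2 + heap[p + 1];  /* [tag, n, n payload words] */
    default:        return 0;                /* e.g. STACK_FRAME: not a heap closure */
    }
}

/* Stand-in for OVERWRITING_CLOSURE(): must run while the header still
   describes the old object, and zeroes the words the new object won't use. */
static void toy_overwrite_closure(size_t *heap, size_t p, size_t new_size) {
    size_t old_size = toy_closure_sizeW(heap, p);
    assert(new_size <= old_size);
    for (size_t i = p + new_size; i < p + old_size; i++)
        heap[i] = SLOP;
}

/* Stand-in for a heap sanity check: step from each closure to
   closure + size; any word that cannot be sized is an error. */
static bool toy_check_heap(const size_t *heap, size_t len) {
    size_t p = 0;
    while (p < len) {
        size_t sz = toy_closure_sizeW(heap, p);
        if (sz == 0)
            return false;
        p += sz;
    }
    return p == len;
}
```

In the normal (non-eager) case, `toy_overwrite_closure` runs while the header still says AP_STACK, so all three payload words of a five-word AP_STACK are zeroed before the BLACKHOLE header is written, and the walk then steps over them one word at a time.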
To reproduce, cd into `testsuite/tests/concurrent/prog001`, then:
{{{
$ ghc-stage2 Mult.hs -fforce-recomp -debug -rtsopts
$ ./Mult +RTS -DS
Mult: internal error: checkClosure: stack frame
(GHC version 8.7.20180825 for x86_64_unknown_linux)
Please report this as a GHC bug:
http://www.haskell.org/ghc/reportabug
zsh: abort (core dumped) ./Mult +RTS -DS
}}}
Here's how the problem occurs:
1. Allocate an AP_STACK in a generation during a GC.
2. Evaluate the AP_STACK. The entry code first WHITEHOLEs and then eagerly
   BLACKHOLEs it. At this point the size of the AP_STACK (as computed by
   `closure_sizeW()`) becomes 2, because that's the size of a BLACKHOLE
   (eager or not).
3. To start a GC, the thread calls `threadPaused`, which at line 342
   actually BLACKHOLEs the eager blackhole (is this part really correct?)
   and zeroes the slop. But because the eager blackhole has the same size
   as a BLACKHOLE, it doesn't actually zero the stack frames in the
   original AP_STACK's payload.
4. In the next GC, the pre-GC sanity check checks the whole heap. When
   checking the generation in which the BLACKHOLE (the AP_STACK that became
   a BLACKHOLE in step (2)) resides, we check the closure and then check
   `closure + 2` (2 being the size of a BLACKHOLE) instead of
   `closure + <size of the stack>`, and end up checking a stack frame from
   the original AP_STACK's payload.
This causes the sanity check to fail, because we don't expect to see a
stack frame outside of a stack.
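The four steps above can be replayed in a small C model. This is a hypothetical sketch, not RTS code: the tags, layouts, and `toy_` names are invented, `toy_eager_blackhole` stands in for the AP_STACK entry code, and `toy_thread_paused_blackhole` stands in for the blackholing and slop zeroing done by `threadPaused`. It illustrates the key point of step 3: the old size is read from a header that already says BLACKHOLE, so nothing is zeroed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy heap: word 0 of each closure is a tag; a zero word is slop. */
enum { SLOP = 0, BLACKHOLE = 1, AP_STACK = 2, STACK_FRAME = 3 };

/* Stand-in for closure_sizeW(): the size is derived from the current header only. */
static size_t toy_closure_sizeW(const size_t *heap, size_t p) {
    switch (heap[p]) {
    case SLOP:      return 1;                /* skip zeroed slop word by word */
    case BLACKHOLE: return 2;                /* [tag, indirectee] */
    case AP_STACK:  return 2 + heap[p + 1];  /* [tag, n, n payload words] */
    default:        return 0;                /* STACK_FRAME etc.: not a heap closure */
    }
}

/* Step 2: eager blackholing rewrites only the header words; the payload
   (the saved stack frames) is left in place because it is still needed. */
static void toy_eager_blackhole(size_t *heap, size_t p, size_t owner) {
    heap[p] = BLACKHOLE;
    heap[p + 1] = owner;
}

/* Step 3: threadPaused-style blackholing with OVERWRITING_CLOSURE-style
   slop zeroing. The old size is read from the header, which already says
   BLACKHOLE, so old_size == 2 and no payload word is zeroed. */
static void toy_thread_paused_blackhole(size_t *heap, size_t p, size_t owner) {
    size_t old_size = toy_closure_sizeW(heap, p);
    assert(old_size >= 2);
    for (size_t i = p + 2; i < p + old_size; i++)
        heap[i] = SLOP;
    heap[p] = BLACKHOLE;
    heap[p + 1] = owner;
}

/* Step 4: pre-GC sanity walk. Returns len if the walk ends exactly at the
   end of the heap, otherwise the offset where it stopped. */
static size_t toy_check_heap(const size_t *heap, size_t len) {
    size_t p = 0;
    while (p < len) {
        size_t sz = toy_closure_sizeW(heap, p);
        if (sz == 0)
            return p;            /* found something that is not a closure */
        p += sz;
    }
    return p;
}
```

Replaying the steps on a heap holding a five-word AP_STACK followed by a BLACKHOLE, the final walk stops at offset 2, i.e. `closure + 2`, where the first saved stack frame still sits.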
In summary, normally when we blackhole an object we zero the space after
the blackhole (i.e. the part of the original object's payload that the
blackhole no longer covers) so that sanity checks can skip over that space.
We can't do this when eagerly blackholing, because the payload of the
original object will still be used, and this causes sanity check failures.
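The dilemma in the summary can also be illustrated with a toy model. In this hypothetical C sketch (tags, layout, and names invented for illustration), `toy_enter_ap_stack` eagerly blackholes an AP_STACK and then copies its saved frames onto a stack, assuming, as the summary says, that the payload is still read after blackholing. The `zero_slop` flag models zeroing the payload at blackholing time, as would happen for an ordinary blackhole: that would leave nothing for a sanity check to trip over, but the evaluation would restore zeros instead of its frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy heap tags; a zero word is slop. */
enum { SLOP = 0, BLACKHOLE = 1, AP_STACK = 2 };

/* Toy AP_STACK entry, assuming the payload is still read after the closure
   has been eagerly blackholed. Layout here: [tag, n, n saved stack words]. */
static void toy_enter_ap_stack(size_t *heap, size_t p, size_t owner,
                               bool zero_slop, size_t *stack_out) {
    assert(heap[p] == AP_STACK);
    size_t n = heap[p + 1];              /* read the payload size before the header changes */

    heap[p] = BLACKHOLE;                 /* eager blackholing: header words only */
    heap[p + 1] = owner;

    if (zero_slop) {
        /* What slop zeroing would do for an ordinary blackhole. */
        for (size_t i = 0; i < n; i++)
            heap[p + 2 + i] = SLOP;
    }

    /* Restore the saved frames onto the stack from the (possibly zeroed) payload. */
    for (size_t i = 0; i < n; i++)
        stack_out[i] = heap[p + 2 + i];
}
```

With `zero_slop` set, every copied stack word is 0; without it, the original frames survive, but so does the stale payload that the pre-GC sanity check later walks into.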
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler