[GHC] #15449: Nondeterministic Failure on aarch64 with -jn, n > 1
GHC
ghc-devs at haskell.org
Wed Jan 16 19:47:40 UTC 2019
-------------------------------------+-------------------------------------
Reporter: tmobile | Owner: tmobile
Type: bug | Status: new
Priority: normal | Milestone: 8.10.1
Component: Compiler | Version: 8.4.3
Resolution: | Keywords:
Operating System: Linux | Architecture: aarch64
Type of failure: Compile-time | Test Case:
crash or panic |
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by bgamari):
* cc: simonmar (added)
Comment:
> I find a pointer to a closure, at the other end I might find:
>
> * a value, I'm done.
> * a closure, I evaluate that.
> * a blackhole, another HEC has beat me to this closure, so I'll wait for
>   them to finish.
That pretty much sums it up. You can find more on blackholing and its
consequences for multicore support in "Runtime Support for Multicore
Haskell."
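The three cases above can be illustrated with a minimal C11 toy model. It assumes hypothetical names (`struct closure`, `enter`, `answer`), none of which are RTS identifiers. Claiming a thunk with a compare-and-swap is a simplification: GHC's lazy blackholing tolerates two HECs occasionally evaluating the same thunk, and a thread that finds a blackhole is suspended rather than left spinning.

```c
#include <stdatomic.h>

/* Toy model only: hypothetical types, not GHC's heap objects. */
enum kind { VALUE, THUNK, BLACKHOLE };

struct closure {
    _Atomic int kind;      /* VALUE, THUNK or BLACKHOLE */
    int (*code)(void);     /* the suspended computation (while THUNK) */
    int value;             /* the result, valid once kind == VALUE */
};

/* Enter a closure following the three cases from the quoted comment. */
static int enter(struct closure *c)
{
    for (;;) {
        int k = atomic_load_explicit(&c->kind, memory_order_acquire);
        if (k == VALUE) {
            return c->value;               /* a value: we're done */
        }
        if (k == THUNK) {
            int expected = THUNK;
            /* Claim the thunk by blackholing it; only one thread wins. */
            if (atomic_compare_exchange_strong(&c->kind, &expected, BLACKHOLE)) {
                c->value = c->code();
                /* Release store: the write to c->value is visible to any
                   thread whose acquire load observes kind == VALUE. */
                atomic_store_explicit(&c->kind, VALUE, memory_order_release);
                return c->value;
            }
            continue;                      /* lost the race; look again */
        }
        /* BLACKHOLE: another thread owns the thunk, so wait for it.
           This toy version simply busy-waits by looping. */
    }
}

/* Example computation for a thunk (hypothetical). */
static int answer(void) { return 6 * 7; }
```

For example, entering `struct closure c = { .kind = THUNK, .code = answer };` evaluates `answer` once, overwrites the closure with its value, and returns that value on every later entry.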
> Perhaps there's simply a barrier missing from std_blackhole.
I think it's more likely that the missing barrier is elsewhere. The
`stg_BLACKHOLE` entry code contains the following loop:
{{{#!c
retry:
p = StgInd_indirectee(node);
if (GETTAG(p) != 0) {
return (p);
}
info = StgHeader_info(p);
if (info == stg_IND_info) {
// This could happen, if e.g. we got a BLOCKING_QUEUE that has
// just been replaced with an IND by another thread in
// wakeBlockingQueue().
// See Note [BLACKHOLE pointing to IND] in sm/Evac.c
goto retry;
}
}}}
Note that if the indirectee is tagged, we return it immediately.
Consequently, if a thunk update is missing a barrier, there is the
potential for a race: the thread entering the blackhole could see the new
pointer stored in `StgInd_indirectee(node)` before the contents of the
closure it points to become visible.
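The reader side of this race can be sketched as a C11 toy model. It assumes hypothetical names (`struct indirection`, `struct value_closure`, `read_if_evaluated`) and a 3-bit pointer tag, loosely mirroring GHC's pointer tagging on 64-bit platforms. The acquire load makes the tagged fast path safe, but only when the writer publishes the pointer with a matching release store or write barrier.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy model, hypothetical names. Objects are 8-byte aligned, so the low
   3 bits of a pointer to one are free to hold a tag; a nonzero tag here
   means "already evaluated". */
#define TAG_MASK ((uintptr_t)7)

struct value_closure {
    _Alignas(8) long payload;
};

struct indirection {
    _Atomic uintptr_t indirectee;   /* possibly tagged pointer */
};

/* Loosely follows the stg_BLACKHOLE fast path: if the indirectee is
   tagged, use it immediately. The acquire load pairs with a release on
   the writer side. Without that pairing, a weakly ordered CPU such as
   AArch64 may let this thread observe the tagged pointer before the
   payload write. Returns 1 and stores the payload in *out if tagged,
   otherwise returns 0. */
static int read_if_evaluated(struct indirection *ind, long *out)
{
    uintptr_t p = atomic_load_explicit(&ind->indirectee, memory_order_acquire);
    if ((p & TAG_MASK) == 0) {
        return 0;                   /* not evaluated yet */
    }
    const struct value_closure *v =
        (const struct value_closure *)(p & ~TAG_MASK);
    *out = v->payload;
    return 1;
}
```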
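`stg_upd_frame` relies on the `updateWithIndirection` macro to perform the
thunk update. Intriguingly, there doesn't appear to be any barrier between
the writes that initialize the result closure produced by the thunk's
computation and the write that updates the indirectee. Rather, there is
only a write barrier **after** the indirectee update. This seems wrong: on
a weakly ordered architecture such as AArch64, stores can become visible
to other cores in a different order from the one in which they were
issued, so without a barrier **before** the indirectee update another HEC
may observe the new indirectee before the result closure it points to.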
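The writer-side ordering argued for above can be sketched with the same kind of C11 toy model. The names are hypothetical, and `update_with_result` is not GHC's `updateWithIndirection`. The sketch initializes the result, issues a release fence, and only then publishes the tagged pointer. Paired with an acquire load on the reader side, this guarantees that any thread seeing the tag also sees the payload.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy model with hypothetical names; only the ordering matters here. */
#define TAG_MASK ((uintptr_t)7)

struct value_closure {
    _Alignas(8) long payload;   /* 8-byte alignment frees 3 tag bits */
};

struct indirection {
    _Atomic uintptr_t indirectee;
};

/* Initialize the result closure, then publish a tagged pointer to it.
   The release fence orders the payload write before the indirectee
   store, i.e. the barrier that the comment above finds missing. */
static void update_with_result(struct indirection *ind,
                               struct value_closure *result, long payload)
{
    result->payload = payload;                     /* 1. initialize */
    atomic_thread_fence(memory_order_release);     /* 2. barrier    */
    atomic_store_explicit(&ind->indirectee,        /* 3. publish    */
                          (uintptr_t)result | 1,
                          memory_order_relaxed);
}
```

A `memory_order_release` store of the indirectee would give the same guarantee without a separate fence. The fence form is closer in shape to "write barrier, then ordinary store".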
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15449#comment:21>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler