[GHC] #15449: Nondeterministic Failure on aarch64 with -jn, n > 1

GHC ghc-devs at haskell.org
Wed Jan 16 19:47:40 UTC 2019


-------------------------------------+-------------------------------------
        Reporter:  tmobile           |                Owner:  tmobile
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.10.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Linux             |         Architecture:  aarch64
 Type of failure:  Compile-time      |            Test Case:
  crash or panic                     |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * cc: simonmar (added)


Comment:

 > I find a pointer to a closure, at the other end I might find:
 >
 > * a value, I'm done.
 > * a closure, I evaluate that.
 > * a blackhole, another HEC has beaten me to this closure, so I'll wait for
 them to finish.

 That pretty much sums it up. You can find more on blackholing and its
 consequences for multicore support in "Runtime Support for Multicore
 Haskell."
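
 As a rough illustration of the three quoted cases, here is a single-threaded C toy model. The `closure` struct, its `kind` field, and `enter()` are hypothetical stand-ins, not GHC's actual closure layout or entry code. The model deliberately ignores the memory-ordering question discussed below.
 {{{#!c
 #include <assert.h>
 #include <stddef.h>

 /* Toy model only: NOT GHC's real closure layout or entry code. */
 typedef enum { VALUE, THUNK, BLACKHOLE } closure_kind;

 typedef struct closure {
     closure_kind kind;
     long value;            /* meaningful once kind == VALUE */
     long (*code)(void);    /* suspended computation of a THUNK */
 } closure;

 typedef enum { DONE, WAIT } enter_result;

 /* Follow a pointer to a closure and act on what we find there. */
 static enter_result enter(closure *c, long *out) {
     switch (c->kind) {
     case VALUE:                  /* a value: we're done */
         *out = c->value;
         return DONE;
     case THUNK:                  /* a closure: evaluate it */
         assert(c->code != NULL);
         c->kind = BLACKHOLE;     /* claim it so other HECs will wait */
         c->value = c->code();
         c->kind = VALUE;         /* the "update": overwrite with the result */
         *out = c->value;
         return DONE;
     case BLACKHOLE:              /* another HEC got here first */
     default:
         return WAIT;             /* caller must block until it finishes */
     }
 }

 static long forty_two(void) { return 42; }
 }}}
 In the real RTS these state changes happen concurrently on several HECs, which is exactly where the order in which the update's writes become visible starts to matter.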

 >  Perhaps there's simply a barrier missing from std_blackhole.

 I think it's more likely that the missing barrier is elsewhere. The
 `stg_BLACKHOLE` entry code contains the following loop:
 {{{#!c
 retry:
     p = StgInd_indirectee(node);
     if (GETTAG(p) != 0) {
         return (p);
     }

     info = StgHeader_info(p);
     if (info == stg_IND_info) {
         // This could happen, if e.g. we got a BLOCKING_QUEUE that has
         // just been replaced with an IND by another thread in
         // wakeBlockingQueue().
         // See Note [BLACKHOLE pointing to IND] in sm/Evac.c
         goto retry;
     }
 }}}
 Note how, if the indirectee is tagged, we return it immediately.
 Consequently, there is the potential for a race if a thunk update is
 missing a barrier: the thread entering the blackhole could see the
 pointer stored in `StgInd_indirectee(node)` before the closure it points
 to becomes visible.
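
 A minimal C11 sketch of the publication pattern at stake, assuming 8-byte-aligned closures so that the low three pointer bits are free for a tag. The names `blackhole`, `value_closure`, `publish()` and `enter_blackhole()` are hypothetical; only the `GETTAG(p) != 0` short-circuit mirrors the loop above. The writer's release store pairs with the reader's acquire load, so a reader that sees a tagged pointer is guaranteed to also see the fields it points to.
 {{{#!c
 #include <assert.h>
 #include <stdatomic.h>
 #include <stddef.h>
 #include <stdint.h>

 /* Hypothetical tagging scheme: low 3 bits of an 8-byte-aligned pointer. */
 #define TAG_MASK ((uintptr_t)7)

 typedef struct {
     _Alignas(8) long payload;      /* forces the tag bits to be zero */
 } value_closure;

 typedef struct {
     _Atomic uintptr_t indirectee;  /* tagged pointer, or 0 while unset */
 } blackhole;

 static uintptr_t get_tag(uintptr_t p) { return p & TAG_MASK; }
 static value_closure *untag(uintptr_t p) {
     return (value_closure *)(p & ~TAG_MASK);
 }

 /* Writer: fully initialise the value, then publish a tagged pointer.
  * The release store keeps the payload write ordered before it. */
 static void publish(blackhole *bh, value_closure *v, long payload) {
     assert(((uintptr_t)v & TAG_MASK) == 0);
     v->payload = payload;
     atomic_store_explicit(&bh->indirectee, (uintptr_t)v | 1u,
                           memory_order_release);
 }

 /* Reader: the acquire load pairs with the release store above. */
 static value_closure *enter_blackhole(blackhole *bh) {
     uintptr_t p = atomic_load_explicit(&bh->indirectee, memory_order_acquire);
     if (get_tag(p) != 0)
         return untag(p);   /* tagged: already evaluated, return at once */
     return NULL;           /* untagged: the real loop inspects the info pointer */
 }
 }}}
 If the writer's store were relaxed (the analogue of a thunk update missing its barrier), a weakly ordered CPU such as aarch64 could let the reader observe the tagged pointer before the payload write.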

 `stg_upd_frame` relies on the `updateWithIndirection` macro to perform the
 thunk update. Intriguingly, there doesn't appear to be any barrier between
 the writes that initialize the result closure produced by the thunk's
 computation and the update of the indirectee. Rather, there is only a write
 barrier **after** the indirectee update. This seems wrong: to prevent the
 race above, the barrier must separate the initializing writes from the
 write that publishes the indirectee.
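
 A hedged C11 sketch of the ordering argued for here, using a release fence where the RTS would use a write barrier. `ind_closure`, `result_closure`, `update_with_indirection()` and `read_indirectee()` are hypothetical stand-ins, not the real `updateWithIndirection` macro; the point is only where the fence sits relative to the initializing writes and the indirectee store.
 {{{#!c
 #include <assert.h>
 #include <stdatomic.h>
 #include <stddef.h>

 /* Hypothetical stand-ins, not GHC's StgInd or updateWithIndirection. */
 typedef struct { long payload; } result_closure;
 typedef struct { _Atomic(result_closure *) indirectee; } ind_closure;

 /* Writer: every write initialising the result must be ordered BEFORE
  * the store that makes the result reachable through the indirectee. */
 static void update_with_indirection(ind_closure *thunk, result_closure *res,
                                     long payload) {
     assert(res != NULL);
     res->payload = payload;                     /* 1. initialise result  */
     atomic_thread_fence(memory_order_release);  /* 2. barrier goes HERE  */
     atomic_store_explicit(&thunk->indirectee, res,
                           memory_order_relaxed); /* 3. publish pointer   */
     /* A barrier placed only here, after step 3, would not stop another
      * core from observing step 3 before step 1. */
 }

 /* Reader: acquire fence after loading the pointer, before using it. */
 static result_closure *read_indirectee(ind_closure *thunk) {
     result_closure *r =
         atomic_load_explicit(&thunk->indirectee, memory_order_relaxed);
     atomic_thread_fence(memory_order_acquire);
     return r;
 }
 }}}
 The release fence before the relaxed store synchronizes with the acquire fence after a relaxed load that reads that store; moving the fence after the store would leave the two writes free to become visible in either order.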

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15449#comment:21>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list