[GHC] #15449: Nondeterministic Failure on aarch64 with -jn, n > 1

GHC ghc-devs at haskell.org
Thu Feb 7 23:51:16 UTC 2019


#15449: Nondeterministic Failure on aarch64 with -jn, n > 1
-------------------------------------+-------------------------------------
        Reporter:  tmobile           |                Owner:  tmobile
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.10.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Linux             |         Architecture:  aarch64
 Type of failure:  Compile-time      |            Test Case:
  crash or panic                     |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tmobile):

 Sorry for my slowness on this; I've been busy with other things at work
 and we have yet to actually trigger this bug with our code on aarch64 for
 some reason, so I haven't had time to take a look.

 Now that I understand this a bit better, I agree that `stg_BLACKHOLE` is
 unlikely to be to blame. Ben, I'll give your patch a go, but it seems strange
 that we update the indirectee straight away in the macro; why not do
 something like:

 {{{
 #define updateWithIndirection(p1, p2, and_then) \
     W_ bd;                                                      \
                                                                 \
     OVERWRITING_CLOSURE(p1);                                    \
     SET_INFO(p1, stg_BLACKHOLE_info);                           \
     LDV_RECORD_CREATE(p1);                                      \
     prim_write_barrier;                                         \
     StgInd_indirectee(p1) = p2;                                 \
     bd = Bdescr(p1);                                            \
     if (bdescr_gen_no(bd) != 0 :: bits16) {                     \
       recordMutableCap(p1, TO_W_(bdescr_gen_no(bd)));           \
       TICK_UPD_OLD_IND();                                       \
       and_then;                                                 \
     } else {                                                    \
       TICK_UPD_NEW_IND();                                       \
       and_then;                                                 \
     }
 }}}

 Is it just that we don't care when other HECs see the side effects of
 SET_INFO? It seems to me that doing SET_INFO after the write barrier could
 cause you to race too.
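
 For reference, the ordering question can be sketched outside of Cmm with
 C11 atomics. This is only an illustration of the usual publish pattern, not
 the RTS code: the `closure` struct, the `THUNK_INFO`/`BLACKHOLE_INFO`
 constants, and both function names are hypothetical stand-ins. The
 assumption is that other threads read the header first and only then the
 payload, so the writer stores the payload and publishes the header with
 release ordering, and the reader loads the header with acquire ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for info pointers; not the RTS definitions. */
enum { THUNK_INFO = 0, BLACKHOLE_INFO = 1 };

typedef struct {
    _Atomic int info;   /* header word, inspected first by other threads */
    void *indirectee;   /* payload, meaningful once info == BLACKHOLE_INFO */
} closure;

/* Writer: store the payload, then publish the header with release
 * ordering so the payload store cannot be reordered after it. */
static void update_with_indirection(closure *p1, void *p2)
{
    assert(p1 != NULL);
    p1->indirectee = p2;
    atomic_store_explicit(&p1->info, BLACKHOLE_INFO, memory_order_release);
}

/* Reader: acquire-load the header. Returns NULL if the closure has not
 * been updated yet; otherwise the payload read is ordered after the
 * header load and observes the writer's store. */
static void *follow_indirection(closure *p)
{
    assert(p != NULL);
    if (atomic_load_explicit(&p->info, memory_order_acquire) != BLACKHOLE_INFO)
        return NULL;
    return p->indirectee;
}
```

 In this sketch, a reader that observes `BLACKHOLE_INFO` is guaranteed to
 see the indirectee. If the two stores in the writer were swapped, a reader
 on a weakly ordered machine could observe the new header while the payload
 is still stale.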

 And as far as SPARC goes, IIRC SPARC machines are actually in TSO mode by
 default, and programs must explicitly switch to RMO or PSO mode. SPARC TSO
 provides essentially the same guarantees as x86.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15449#comment:23>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

