[GHC] #15449: Nondeterministic Failure on aarch64 with -jn, n > 1

GHC ghc-devs at haskell.org
Tue Nov 6 03:59:22 UTC 2018


#15449: Nondeterministic Failure on aarch64 with -jn, n > 1
-------------------------------------+-------------------------------------
        Reporter:  tmobile           |                Owner:  tmobile
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Linux             |         Architecture:  aarch64
 Type of failure:  Compile-time      |            Test Case:
  crash or panic                     |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tmobile):

 I tried the absolute dumbest possible fix for this problem: inserting a
 fence before and after each atomic operation. Certainly many of these are
 superfluous, but I'm simply attempting an elephant gun solution to test
 the working hypothesis.

 The branch is up here: https://github.com/traviswhitaker/ghc/tree/ghc843-wip/T15449
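
 For illustration only, the "fence before and after" idea can be sketched
 in C roughly as below. This is not code from the branch: the name
 fenced_fetch_add is hypothetical, and the sketch assumes the GCC/Clang
 __atomic builtins, where a sequentially consistent thread fence is
 expected to become a full barrier such as dmb ish on aarch64.

```c
#include <stdint.h>

/* Hypothetical helper (not from GHC or the linked branch): an atomic
 * fetch-and-add bracketed by full memory barriers. The fences are
 * deliberately redundant, since the atomic itself is already seq_cst. */
uint64_t fenced_fetch_add(volatile uint64_t *p, uint64_t v)
{
    /* Full barrier before the atomic operation. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);

    /* The atomic read-modify-write; returns the value held before the add. */
    uint64_t old = __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);

    /* Full barrier after the atomic operation. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);

    return old;
}
```

 If the failure persists with every atomic wrapped this way, missing
 barriers around those particular operations become a less likely cause.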

 The extra fences seem to have no effect on the failure. When I inspected
 the machine code, I was surprised to find that very few dmb, dsb, and isb
 instructions were emitted. Only the RTS code (particularly for evacuation)
 and ghc-prim (particularly in the hs_atomic_* functions, no surprise
 there) seem to contain any of them. It seems as though none of the
 ldrex/strex-style exclusive load/store instructions (ldxr/stxr on
 aarch64) are emitted at all. I'm in over my head when it comes to how GHC
 works here, so perhaps this is to be expected? Are things like
 stg_takeMVarzh simply implemented with the handful of hs_atomic_*
 primitives? I'd be happy to attach a tarball of some or all of the
 completed build.
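
 One way to calibrate what to look for in the disassembly is to compile a
 tiny compare-and-swap on the same toolchain and inspect its output (for
 example with the compiler's -S flag). The sketch below is an assumption
 for that purpose, not GHC code: cas_word is a hypothetical name, and it
 uses the GCC/Clang __atomic builtins. Depending on compiler version and
 flags, the operation may show up as an exclusive load/store loop, as a
 single compare-and-swap instruction, or as a call to an out-of-line
 helper, so the absence of one particular mnemonic may not be conclusive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe function (not from GHC): a sequentially consistent
 * compare-and-swap on one machine word. Returns true if *p held
 * `expected` and was replaced by `desired`, false otherwise. */
bool cas_word(volatile uintptr_t *p, uintptr_t expected, uintptr_t desired)
{
    /* `expected` is a local copy, so the caller's value is not modified
     * when the builtin writes back the observed contents on failure. */
    return __atomic_compare_exchange_n(p, &expected, desired,
                                       false, /* strong CAS, no spurious failure */
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```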

 Another thought: perhaps some architecture-specific assumptions have snuck
 into the Stg-to-Cmm pass or one of the Cmm-to-Cmm optimizations. Perhaps
 it's just more trouble in StgCmmPrim, like
 https://ghc.haskell.org/trac/ghc/ticket/12469

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15449#comment:16>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

