[GHC] #15449: Nondeterministic Failure on aarch64 with -jn, n > 1

GHC ghc-devs at haskell.org
Wed Sep 5 21:12:37 UTC 2018


#15449: Nondeterministic Failure on aarch64 with -jn, n > 1
-------------------------------------+-------------------------------------
        Reporter:  tmobile           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Linux             |         Architecture:  aarch64
 Type of failure:  Compile-time      |            Test Case:
  crash or panic                     |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by Thra11):

 I have noticed something which might be relevant. I have two aarch64
 machines:
 1. Quad-core laptop: 4 x A53, 2GB RAM
 2. Hex-core SBC: 2 x A72 + 4 x A53, 4GB RAM

 Testing with trommler's test-package, GHC on the hex-core machine with the
 A72 cores fails often (segmentation fault, illegal hardware instruction or
 bus error), while the quad-core machine ''without'' A72 cores consistently
 succeeds.

 > As far as the difference between 32-bit and 64-bit ARM, the only thing I
 can guess is that perhaps the smaller ARM chips have much simpler
 instruction pipelines and don't necessarily perform the allowed
 reorderings in practice?

 Following this line of thinking, I'm wondering if the A53s fall into the
 'simpler instruction pipeline' bucket, while the A72s and Denver2s are
 more complex. The other possibility that springs to mind is that having
 faster cores simply changes timings so as to make certain race conditions
 more likely. However, if this were the case, I think I would expect to see
 at least ''some'' failures on the slower CPU.
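
 One experiment that might help narrow this down is to run the same build
 repeatedly on the hex-core machine while pinned to only the A53 cores, and
 then again pinned to only the A72 cores. This would not by itself tell
 reordering apart from timing, but it would separate the core type from the
 other differences between the two machines. The following is a minimal
 sketch, not something from this ticket: the names `describe` and
 `run_repeatedly` are made up for illustration, the command and CPU numbers
 are placeholders, and `os.sched_setaffinity` is Linux-only. Which logical
 CPU numbers belong to which core type is board-specific (the "CPU part"
 field in /proc/cpuinfo should show it).

```python
import os
import signal
import subprocess
import sys
from collections import Counter


def describe(returncode):
    """Turn an exit status into a label; negative means killed by a signal."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            return signal.Signals(-returncode).name  # e.g. SIGSEGV, SIGILL, SIGBUS
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def run_repeatedly(cmd, runs, cpus=None):
    """Run cmd `runs` times, optionally pinned to the CPU set `cpus`,
    and count how each run ended."""
    def pin():
        # Executed in the child just before exec, so only the child is pinned.
        os.sched_setaffinity(0, cpus)

    tally = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            preexec_fn=pin if cpus else None,
        )
        tally[describe(proc.returncode)] += 1
    return tally


# Hypothetical usage; the command and the CPU set are placeholders:
#   print(run_repeatedly(["ghc", "-j4", "Main.hs"], runs=20, cpus={0, 1, 2, 3}))
```

 Signal names only show up in the tally when the process that crashes is
 the one being run directly; a build tool that wraps GHC will most likely
 report a plain non-zero exit instead.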

 trommler mentions that he was seeing the failures on an NVIDIA Jetson TX2,
 which appears to be 2 x Denver2 + 4 x A57. I'm not familiar with these
 cores, but I assume that at least the Denver2 is fairly complex.

 I have found that the laptop's (quad-core A53) success isn't limited to
 this little test case. Before I got the SBC (2 x A72 + 4 x A53, which I use
 as a nix build server), I successfully built GHC and a range of Haskell
 packages on the laptop (slowly: with 2GB of RAM it ends up swapping quite a
 bit while building GHC). However, using the SBC, I haven't been able to
 build GHC itself, and package building is inconsistent (some packages
 sometimes succeed, others always fail).

 Apologies if this is all rather speculative and anecdotal, but I'm hoping
 it might give someone more familiar with GHC, LLVM and CPUs some ideas.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15449#comment:13>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

