<div dir="ltr">Thanks so much for the pointers, Ben.<div><br></div><div>I opened a ticket here <a href="https://ghc.haskell.org/trac/ghc/ticket/15449">https://ghc.haskell.org/trac/ghc/ticket/15449</a></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 27, 2018 at 6:51 AM, Ben Gamari <span dir="ltr"><<a href="mailto:ben@smart-cactus.org" target="_blank">ben@smart-cactus.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Travis Whitaker <<a href="mailto:pi.boy.travis@gmail.com">pi.boy.travis@gmail.com</a>> writes:<br>
<br>
> Hello GHC Devs,<br>
><br>
> It seems to me that GHC is rather broken on aarch64, at least since 8.2.1<br>
> (and at least on the machines I have access to). I first noticed this issue<br>
> with Nixpkgs (<a href="https://github.com/NixOS/nixpkgs/issues/40301" rel="noreferrer" target="_blank">https://github.com/NixOS/<wbr>nixpkgs/issues/40301</a>), so to check<br>
> that this isn't some Nixpkgs idiosyncrasy I went ahead and built my own GHC<br>
> 8.4.3 for aarch64 (there's no binary release at<br>
> <a href="https://www.haskell.org/ghc/download_ghc_8_4_3.html" rel="noreferrer" target="_blank">https://www.haskell.org/ghc/<wbr>download_ghc_8_4_3.html</a> to try, but perhaps<br>
> I've missed something).<br>
><br>
> It seems the only Nix idiosyncrasy was passing "--ghc-option=-j${cores}" to<br>
> "./Setup.hs configure". The issue is triggered by using '-jn' for any n<br>
> greater than one when building any non-trivial package, but I've found<br>
> hscolour 1.24.4 reproduces it very reliably (perhaps because there are<br>
> opportunities for parallelism early in its module dependency graph?). GHC<br>
> very often (although not always) will fail with one of:<br>
><br>
> - Segmentation fault.<br>
> - Bus fault<br>
> - &lt;no location info&gt;: error:<br>
> ghc: panic! (the 'impossible' happened)<br>
> (GHC version 8.4.3 for aarch64-unknown-linux):<br>
> Binary.UserData: no put_binding_name<br>
><br>
> - ghc: internal error: MUT_VAR_CLEAN object entered!<br>
> (GHC version 8.4.3 for aarch64_unknown_linux)<br>
> Please report this as a GHC bug: <a href="http://www.haskell.org/ghc/reportabug" rel="noreferrer" target="_blank">http://www.haskell.org/ghc/<wbr>reportabug</a><br>
> Aborted (core dumped)<br>
><br>
</div></div>Ugh, that is awful.<br>
<span class=""><br>
> The fix, excruciating as it may be on already slow arm machines, is to use<br>
> '-j1'. This issue seems present on each GHC release since 8.2.1 (although I<br>
> haven't tried HEAD yet). I haven't noticed any issues with any other<br>
> concurrent Haskell programs on aarch64.<br>
><br>
> There are some umbrella bugs for aarch64 in Trac, so I wanted to ask here<br>
> before filing a ticket. Has anyone else noticed this behavior on aarch64?<br>
> What's more, are there any tips for using GDB to hunt down synchronization<br>
> issues in GHC?<br>
><br>
</span>Definitely open a new ticket.<br>
<br>
The methodology for tracking down issues like this is quite<br>
case-specific but I do have some general recommendations: On x86-64 I<br>
use rr [1], which is an invaluable tool. Sadly this isn't an option on<br>
AArch64 AFAIK. I also have some gdb extensions to take much of the<br>
monotony away from inspecting GHC's heap and internal data structures<br>
[2]. I've not used them on AArch64 so there may be a few compatibility<br>
issues but I suspect they wouldn't be hard to fix.<br>
<br>
I know it may be hard in this case but I would at least try to reduce<br>
the size of the failing program to something that fits in less than a<br>
few hundred lines. Low-level debugging is hard enough when you can keep<br>
the program in your head; debugging all of GHC this way is possible but<br>
much harder. Given that this appears to be threading-specific, I would<br>
also pay particular attention to GHC's and base's use of barriers and<br>
atomics. It's possible that we are just missing a barrier somewhere.<br>
<br>
Finally, you might quickly try building 8.0 to see whether bisection is<br>
a possibility. It would be a slow process, given the speed of the<br>
hardware involved, but ultimately it can be much more time efficient<br>
once you have it set up since you can replace human debugging time (a<br>
very finite commodity) with computation.<br>
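<br>
Because the failure is intermittent, a single passing build does not prove<br>
that a commit is good. As a rough sketch (not something from this thread),<br>
a wrapper like the one below could drive "git bisect run". The name<br>
bisect_verdict and the default of 10 attempts are assumptions made for<br>
illustration. It also maps crashes to exit code 1, since "git bisect run"<br>
treats only codes 1 to 127 (except 125) as "bad" and aborts on anything<br>
else, which is what a segfault reported by a shell (139) would trigger.<br>
<pre>
```python
import subprocess
import sys


def bisect_verdict(cmd, attempts=10):
    """Return 0 if cmd succeeds on every attempt, 1 on the first failure.

    A process killed by a signal (e.g. SIGSEGV) has a negative return code
    in Python, so it is reported as 1 ("bad") like any other failure.
    """
    for _ in range(attempts):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return 1  # any failure marks this commit as bad
    return 0  # every attempt passed: treat the commit as good


if __name__ == "__main__":
    args = sys.argv[1:]
    if not args:
        print("usage: bisect_flaky.py COMMAND [ARG...]", file=sys.stderr)
        sys.exit(255)  # codes above 127 make git bisect abort instead of guessing
    sys.exit(bisect_verdict(args))
```
</pre>
Hypothetical usage, assuming the script is saved as bisect_flaky.py and<br>
that build-and-test.sh (a placeholder name) rebuilds GHC at the current<br>
commit and then builds the test package with -j greater than one:<br>
git bisect run python3 bisect_flaky.py ./build-and-test.sh<br>
More attempts lower the chance of marking a bad commit as good, at the<br>
cost of longer bisection steps.<br>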
<br>
Good luck and let us know if you get stuck,<br>
<br>
- Ben<br>
<br>
<br>
[1] <a href="http://rr-project.org/" rel="noreferrer" target="_blank">http://rr-project.org/</a><br>
[2] <a href="https://github.com/bgamari/ghc-utils/tree/master/gdb" rel="noreferrer" target="_blank">https://github.com/bgamari/<wbr>ghc-utils/tree/master/gdb</a><br>
</blockquote></div><br></div>