<div dir="ltr"><div dir="ltr"><div dir="ltr">FWIW, my main reference at the time when this stuff was implemented was this page by Doug Lea: <a href="http://gee.cs.oswego.edu/dl/jmm/cookbook.html">http://gee.cs.oswego.edu/dl/jmm/cookbook.html</a></div><div dir="ltr"><br></div><div>As Ben says, things have evolved a lot since then. I'm not an expert at all, but I know from experience that getting this stuff right is really hard. Even on x86 we had a tough time figuring out where to put this barrier: <a href="https://phabricator.haskell.org/diffusion/GHC/browse/master/rts%2FWSDeque.c$135-137">https://phabricator.haskell.org/diffusion/GHC/browse/master/rts%2FWSDeque.c$135-137</a></div><div><br></div><div>My understanding of the current memory model is this:</div><div>- we give no guarantees about ordering between non-atomic IORef operations, except that: "doing things in parallel shouldn't segfault". So if a processor can see a pointer in an IORef, it can safely follow the pointer and find the memory it points to correctly initialized. This may require barriers on some architectures, but not x86(_64) as I understand it.</div><div>- MVar operations and atomicModifyIORef are full barriers. Or something.</div><div><br></div><div>Cheers</div><div>Simon</div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, 29 Nov 2018 at 04:44, Travis Whitaker &lt;<a href="mailto:pi.boy.travis@gmail.com">pi.boy.travis@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hello GHC Devs,<div><br></div><div>I'm trying to get my head around ticket #15449 (<a href="https://ghc.haskell.org/trac/ghc/ticket/15449" target="_blank">https://ghc.haskell.org/trac/ghc/ticket/15449</a>). 
The gist of things is that GHC generates incorrect aarch64 code that causes memory corruption in multithreaded programs run on out-of-order machines. User trommler discovered that similar issues are present on PowerPC, and indeed ARMv7 and PowerPC support the same types of load/store reorderings. The LLVM code emitted by GHC may be incorrect with respect to LLVM's memory model, but this isn't a problem on architectures with minimal reordering like x86.</div><div><br></div><div>I had initially thought that GHC simply wasn't emitting the appropriate LLVM fences; there's an elephant-gun-approach here (<a href="https://github.com/TravisWhitaker/ghc/commits/ghc843-wip/T15449" target="_blank">https://github.com/TravisWhitaker/ghc/commits/ghc843-wip/T15449</a>) that guards each atomic operation with a full barrier. I still believe that GHC is omitting necessary LLVM fences, but this change is insufficient to fix the behavior of the test case (which is simply GHC itself compiling a test package with '-jN', N > 1).</div><div><br></div><div>It seems there's a long and foggy history of the Cmm memory model. Edward Yang discusses this a bit in his post here (<a href="http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitive-to-ghc/" target="_blank">http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitive-to-ghc/</a>) and issues similar to #15449 have plagued GHC in the past, like #12469 (<a href="https://ghc.haskell.org/trac/ghc/ticket/12469" target="_blank">https://ghc.haskell.org/trac/ghc/ticket/12469</a>). Worryingly, GHC only has MO_WriteBarrier, whereas PowerPC and ARMv7 really need read, write, and full memory barriers. 
On ARM an instruction memory barrier might be required as well, but I don't know enough about STG/Cmm to say for sure, and it'd likely be LLVM's responsibility to emit that anyway.</div><div><br></div><div>I'm hoping that someone with more tribal knowledge than I might be able to give me some pointers with regard to the following areas:</div><div><br></div><div><ul><li>Does STG itself have anything like a memory model? My intuition says 'definitely not', but given that STG expressions may contain Cmm operations (via StgCmmPrim), there may be STG-to-STG transformations that need to care about the target machine's memory model.</li><li>With respect to Cmm, what reorderings does GHC perform? What are the relevant parts of the compiler to begin studying?</li><li>Are the LLVM atomics that GHC emits correct with respect to the LLVM memory model? As it stands now LLVM fences are only emitted for MO_WriteBarrier. Without fences accompanying the atomics, it seems the LLVM compiler could float dependent loads/stores past atomic operations.</li><li>Why is MO_WriteBarrier the only provided memory barrier? My hunch is that it's because this is the only sort of barrier required on x86, which only allows loads to be reordered with older stores, but perhaps I'm missing something? Is it plausible that Cmm simply needs additional barrier primitives to target these weaker memory models? Conversely, is there some property of Cmm that lets us get away without read barriers at all?</li></ul><div><br></div></div><div>Naturally, if I've got any of this wrong or am otherwise barking up the wrong tree, please let me know.</div><div><br></div><div>Thanks for all your efforts!</div><div><br></div><div>Travis Whitaker</div></div></div></div></div></div></div>
</blockquote></div>
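<div dir="ltr">For readers following the thread, the IORef guarantee described in the reply (a thread that can see a pointer can safely follow it and find initialized memory) is the classic publication pattern. Below is a minimal sketch of that pattern in C11 atomics. It is not GHC's actual code: the names <code>Obj</code>, <code>Ref</code>, <code>publish</code> and <code>read_payload</code> are hypothetical, and release/acquire ordering is one assumed way of expressing the guarantee. On x86 these orderings typically compile to plain loads and stores, while weaker architectures such as ARM and PowerPC need barriers or ordered load/store instructions.</div>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap object, standing in for whatever an IORef points to. */
typedef struct {
    int header;
    int payload;
} Obj;

/* Hypothetical mutable cell, standing in for the IORef's contents field. */
typedef struct {
    _Atomic(Obj *) contents;
} Ref;

/* Writer: fully initialise the object, then publish the pointer.
 * The release store keeps the two field writes from being reordered
 * after the pointer becomes visible to other threads. */
static void publish(Ref *ref, int payload) {
    assert(ref != NULL);
    Obj *o = malloc(sizeof *o);
    if (o == NULL) {
        return; /* allocation failed: nothing is published */
    }
    o->header = 1;
    o->payload = payload;
    atomic_store_explicit(&ref->contents, o, memory_order_release);
}

/* Reader: load the pointer, then follow it.
 * The acquire load pairs with the release store above, so a non-NULL
 * pointer implies the object's fields are visible as initialised.
 * Returns -1 if nothing has been published yet. */
static int read_payload(Ref *ref) {
    assert(ref != NULL);
    Obj *o = atomic_load_explicit(&ref->contents, memory_order_acquire);
    return o == NULL ? -1 : o->payload;
}
```

<div dir="ltr">In this sketch a writer thread would call <code>publish</code> and any number of reader threads would call <code>read_payload</code>. If the release store were replaced by a relaxed one, a reader on a weakly ordered machine could observe the pointer before the field writes, which is the kind of failure the ticket describes.</div>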