memory ordering

Wed Jan 1 19:38:00 UTC 2014

One point I'm getting from this discussion is that perhaps not much time
has been spent considering these issues in ghc backends. If so, it's
probably a good thing to work through it now.

For myself, I guess the only option I have now is to measure using
loadLoadBarrier and see if it's better or worse than calling
atomicModifyIORef.
On Dec 31, 2013 6:42 AM, "Edward Z. Yang" <ezyang at mit.edu> wrote:

> I was thinking about my response, and realized there was one major
> misleading thing in my description.  The load reordering I described
> applies to load instructions in C-- proper, i.e. things that show up
> in the C-- dup as:
>
>     W_ x = I64[...addr...]
>
> Reads to IORefs and reads to vectors get compiled inline (as they
> eventually translate into inline primops), so my admonitions are
> applicable.
>
> However, the story with *foreign primops* (which is how loadLoadBarrier
> in atomic-primops is defined, how you might imagine defining a custom
> read function as a primop) is a little different.  First, what does a
> call to an foreign primop compile into? It is *not* inlined, so it will
> eventually get compiled into a jump (this could be a problem if you're
> really trying to squeeze out performance!)  Second, the optimizer is a
> bit more conservative when it comes to primop calls (internally referred
> to as "unsafe foreign calls"); at the moment, the optimizer assumes
> these foreign calls clobber heap memory, so we *automatically* will not
> push loads/stores beyond this boundary. (NB: We reserve the right to
> change this in the future!)
>
> This is probably why atomic-primops, as it is written today, seems to
> work OK, even in the presence of the optimizer.  But I also have a hard
> time believing it gives the speedups you want, due to the current
> design. (CC'd Ryan Newton, because I would love to be wrong here, and
> maybe he can correct me on this note.)
>
> Cheers,
> Edward
>
> P.S. loadLoadBarrier compiles to a no-op on x86 architectures, but
> because it's not inlined I think you will still end up with a jump (LLVM
> might be able to eliminate it).
>
> Excerpts from John Lato's message of 2013-12-31 03:01:58 +0800:
> > Hi Edward,
> >
> > Thanks very much for this reply, it answers a lot of questions I'd had.
> >  I'd hoped that ordering would be preserved through C--, but c'est la
> vie.
> >  Optimizing compilers are ever the bane of concurrent algorithms!
> >
> > stg/SMP.h does define a loadLoadBarrier, which is exposed in Ryan
> Newton's
> > atomic-primops package.  From the docs, I think that's a general read
> > barrier, and should do what I want.  Assuming it works properly, of
> course.
> >  If I'm lucky it might even be optimized out.
> >
> > Thanks,
> > John
> >
> > On Mon, Dec 30, 2013 at 6:04 AM, Edward Z. Yang <ezyang at mit.edu> wrote:
> >
> > > Hello John,
> > >
> > > Here are some prior discussions (which I will attempt to summarize
> > > below):
> > >
> > >     http://www.haskell.org/pipermail/haskell-cafe/2011-May/091878.html
> > >
> http://www.haskell.org/pipermail/haskell-prime/2006-April/001237.html
> > >
> http://www.haskell.org/pipermail/haskell-prime/2006-March/001079.html
> > >
> > > The guarantees that Haskell and GHC give in this area are hand-wavy at
> > > best; at the moment, I don't think Haskell or GHC have a formal memory
> > > model—this seems to be an open research problem. (Unfortunately, AFAICT
> > > all the researchers working on relaxed memory models have their hands
> > > full with things like C++ :-)
> > >
> > > If you want to go ahead and build something that /just/ works for a
> > > /specific version/ of GHC, you will need to answer this question
> > > separately for every phase of the compiler.  For Core and STG, monads
> > > will preserve ordering, so there is no trouble.  However, for C--, we
> > > will almost certainly apply optimizations which reorder reads (look at
> > > CmmSink.hs).  To properly support your algorithm, you will have to add
> > > some new read barrier mach-ops, and teach the optimizer to respect
> them.
> > > (This could be fiendishly subtle; it might be better to give C-- a
> > > memory model first.)  These mach-ops would then translate into
> > > appropriate arch-specific assembly or LLVM instructions, preserving
> > > the guarantees further.
> > >
> > > This is not related to your original question, but the situation is a
> > > bit better with regards to reordering stores: we have a WriteBarrier
> > > MachOp, which in principle, prevents store reordering.  In practice, we
> > > don't seem to actually have any C-- optimizations that reorder stores.
> > > So, at least you can assume these will work OK!
> > >
> > > Hope this helps (and is not too inaccurate),
> > > Edward
> > >
> > > Excerpts from John Lato's message of 2013-12-20 09:36:11 +0800:
> > > > Hello,
> > > >
> > > > I'm working on a lock-free algorithm that's meant to be used in a
> > > > concurrent setting, and I've run into a possible issue.
> > > >
> > > > The crux of the matter is that a particular function needs to
> perform the
> > > > following:
> > > >
> > > > > x <- MVector.read vec ix
> > > > > position <- readIORef posRef
> > > >
> > > > and the algorithm is only safe if these two reads are not reordered
> (both
> > > > the vector and IORef are written to by other threads).
> > > >
> > > > My concern is, according to standard Haskell semantics this should be
> > > safe,
> > > > as IO sequencing should guarantee that the reads happen in-order.  Of
> > > > course this also relies upon the architecture's memory model, but x86
> > > also
> > > > guarantees that reads happen in order.  However doubts remain; I do
> not
> > > > have confidence that the code generator will handle this properly.
>  In
> > > > particular, LLVM may freely re-order loads of NotAtomic and Unordered
> > > > values.
> > > >
> > > > The one hope I have is that ghc will preserve IO semantics through
> the
> > > > entire pipeline.  This seems like it would be necessary for proper
> > > handling
> > > > of exceptions, for example.  So, can anyone tell me if my worries are
> > > > unfounded, or if there's any way to ensure the behavior I want?  I
> could
> > > > change the readIORef to an atomicModifyIORef, which should issue an
> > > mfence,
> > > > but that seems a bit heavy-handed as just a read fence would be
> > > sufficient
> > > > (although even that seems more than necessary).
> > > >
> > > > Thanks,
> > > > John L.
> > >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20140101/85c1cae6/attachment.html>