Bad news: apparent bug in casMutVar going back to 7.2

PHO pho at cielonegro.org
Thu Feb 6 04:57:41 UTC 2014


How about using libatomic_ops? [*1]

GCC atomic intrinsics are only provided by GCC >= 4.2. Clang seems to
have an equivalent set of intrinsics but in different names
[*2]. Basically we should avoid using any non-standard language
extensions where possible.


*1: https://github.com/ivmai/libatomic_ops/
*2: http://clang.llvm.org/docs/LanguageExtensions.html#langext-c11-atomic

Thanks,
PHO


From: Simon Marlow <marlowsd at gmail.com>
Subject: Re: Bad news: apparent bug in casMutVar going back to 7.2
Date: Sat, 01 Feb 2014 21:00:13 +0000

> So there's no problem in casMutVar#?
>
> There's probably no good reason not to use the gcc intrinsics.  Either
> they didn't exist when I wrote the inline asm, or they had been added
> to gcc since I last read the docs.
>
> Cheers,
> Simon
>
> On 01/02/2014 19:35, Ryan Newton wrote:
>> Ok, my bad, sorry all.
>>
>> This is NOT a problem that will crop up in 7.8.   Rather, it's just a
>> problem with the duplicated bits of GHC RTS functionality that were
>> stuck into the atomic-primops library.  It was a C preprocessor
>> problem
>> that was causing the inline asm we were discussing in this thread to
>> not
>> actually be called.
>>
>> Still, I'd like to be reminded of the rational for all this
>> conditional
>> inline asm rather than using the C compiler intrinsics!  Anyone?
>>
>> Best,
>>    -Ryan
>>
>>
>>
>> On Sat, Feb 1, 2014 at 2:17 PM, Carter Schonwald
>> <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>>
>> wrote:
>>
>>     I got the test suite running on my (2 core) machine mac book air,
>>     with 7.8
>>     i've run it several times, not seeing any failures
>>
>>
>>     On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald
>>     <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>>
>>     wrote:
>>
>>         hrmmmm I have a crazy idea
>>
>>         "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded
>>         into r/m64. Else, clear ZF
>>         and load r/m64 into RAX." is what the docs say for the cmpxchng
>>         instruction
>>
>>         so RAX is the old values,  (EAX in the  32bit case). And it
>>         looks like we dont' set that explicitly when calling the asm ..
>>         CMPXCHG r/m64, r64
>>
>>           hrmmm
>>
>>
>>
>>
>>         On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com
>>         <mailto:rrnewton at gmail.com>> wrote:
>>
>>             Ok, here's another experiment, on this commit:
>>
>>             https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>>
>>             Here, if I use GCC's __sync_val_compare_and_swap instead of
>>             GHC's version of cas(), the problem also goes away.  I think
>>             these two implementations should behave identically, and
>>             that they don't perhaps indicates that there is something
>>             off about the inline asm, as Carter was suggesting:
>>
>>             *#if i386_HOST_ARCH || x86_64_HOST_ARCH*
>>             *    __asm__ __volatile__ (*
>>             * "lock\ncmpxchg %3,%1"*
>>             *          :"=a"(o), "=m" (*(volatile unsigned int *)p) *
>>             *          :"0" (o), "r" (n));*
>>             *    return o;*
>>
>>             The x86 CAS instruction must put the "return value" in the
>>             accumulator register, and indeed this constrains "o" to be
>>             allocated to the accumulator register, while the new value
>>             "n" can be in any register.
>>
>>             So if there's a problem, I don't know what it is.  Except
>>             I'm not sure what the ramifications of "o" being a function
>>             parameter AND having an "=a" constraint on it are...
>>
>>                 -Ryan
>>
>>
>>
>>             On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald
>>             <carter.schonwald at gmail.com
>>             <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                 Hey Ryan, i've made the leap to using 7.8 on my machine,
>>                 so i'll first have to get some pull requests in on
>>                 atomic-primops before I can test it locally :), expect
>>                 those patches later today!
>>
>>                 looks like gcc's inline ASM logic is pretty correct,
>>                 after testing it a bit locally, pardon my speculative
>>                 jumping the gun.
>>
>>
>>                 On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton
>>                 <rrnewton at gmail.com <mailto:rrnewton at gmail.com>> wrote:
>>
>>                     Hi Carter & others,
>>
>>                     Carter, yes, this is CAS on pointers and in my next
>>                     mail I'll try to come up with some hypotheses as to
>>                     why we may have (remaining) problems there.
>>
>>                     But first, I have been assured that on x86 there is
>>                     no failure mode in which doing a comparison on the
>>                     value read by CAS should not correctly diagnose
>>                     success or failure (same as directly reading the
>>                     Zero Flag) [1].
>>
>>                     And yet, there's this discrepancy, where the
>>                     modified casMutVar that I linked to does not have
>>                     the failure.  As for reproducing the failure, either
>>                     of the two following tests will currently show problems:
>>
>>                       * Two threads try to casIORef False->True, both
>>                         succeed
>>                       * 120 threads try to read, increment, CAS until
>>                         they succeed.  The total is often not 120
>>                         because multiple threads think the successfully
>>                         incremented, say, 33->34.
>>
>>                     Here's a specific recipe for the latter test on GHC
>>                     7.6.3 Mac or Linux:
>>
>>                     *git clone
>>                     git at github.com:rrnewton/haskell-lockfree-queue.git
>>                     *
>>                     *cd haskell-lockfree-queue/AtomicPrimops/*
>>                     *git checkout 1a1e7e55f6706f9e5754
>>                     *
>>                     *cabal sandbox init
>>                     *
>>                     *cabal install -f-withTH -fforeign ./ ./testing
>>                     --enable-tests
>>                     *
>>                     *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
>>                     -t n_threads
>>                     *
>>
>>                     You may have to run the last line several times to
>>                     see the failure.
>>
>>                     Best,
>>                        -Ryan
>>
>>                     [1] I guess the __sync_bool_compare_and_swap
>>                     intrinsic which reads ZF is there just to avoid the
>>                     extra comparison.
>>
>>                     [2] P.S. I'd like to try this on GHC head, but the
>>                     RHEL 6 machine I usually use to build it is
>>                     currently not validating (below error, commit
>>                     65d05d7334).  After I debug this gmp problem I'll
>>                     confirm that the bug under discussion applies on the
>>                     7.8 branch.
>>
>>                          ./sync-all checkout ghc-7.8
>>                          sh validate
>>                     ...
>>                     /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o:
>>                     relocation R_X86_64_32 against `__gmpz_sub' can not
>>                     be used when making a shared object; recompile with
>>                     -fPIC
>>                     libraries/integer-gmp/gmp/objs/aors.o: could not
>>                     read symbols: Bad value
>>
>>
>>
>>
>>                     On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald
>>                     <carter.schonwald at gmail.com
>>                     <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                         Ryan, is your benchmark using CAS on pointers,
>>                         or immediate words? trying to get atomic primops
>>                         to build on my 7.8 build on my mac
>>
>>
>>                         On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald
>>                         <carter.schonwald at gmail.com
>>                         <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                             https://ghc.haskell.org/trac/ghc/ticket/8724#ticket
>>                             is the ticket
>>
>>                             when i'm more awake i'll experiment some more
>>
>>
>>                             On Sat, Feb 1, 2014 at 2:33 AM, Carter
>>                             Schonwald <carter.schonwald at gmail.com
>>                             <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                                 i have a ticket for tracking this,
>>                                 though i'm thinking my initial attempt
>>                                 at a patch generates the same object
>>                                 code as it did before.
>>
>>                                 @ryan, what CPU variant are you testing
>>                                 this on? is this on a NUMA machine or
>>                                 something?
>>
>>
>>                                 On Sat, Feb 1, 2014 at 1:58 AM, Carter
>>                                 Schonwald <carter.schonwald at gmail.com
>>                                 <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                                     woops, i mean cmpxchgq
>>
>>
>>                                     On Sat, Feb 1, 2014 at 1:36 AM,
>>                                     Carter Schonwald
>>                                     <carter.schonwald at gmail.com
>>                                     <mailto:carter.schonwald at gmail.com>>
>>                                     wrote:
>>
>>                                         ok, i can confirm that on my
>>                                         64bit mac, both clang and gcc
>>                                         use cmpxchgl rather than cmpxchg
>>                                         i'll whip up a strawman patch on
>>                                         head that can be cherrypicked /
>>                                         tested out by ryan et al
>>
>>
>>                                         On Sat, Feb 1, 2014 at 1:12 AM,
>>                                         Carter Schonwald
>>                                         <carter.schonwald at gmail.com
>>                                         <mailto:carter.schonwald at gmail.com>>
>>                                         wrote:
>>
>>                                             Hey Ryan,
>>                                             looking at this closely
>>                                             Why isn't CAS using
>>                                             CMPXCHG8B on 64bit
>>                                             architectures?  Could that
>>                                             be the culprit?
>>
>>                                             Could the issue be that
>>                                             we've not had a good stress
>>                                             test that would create
>>                                             values that are equal on the
>>                                             32bit range, but differ on
>>                                             the 64bit range, and you're
>>                                             hitting that?
>>
>>                                             Could you try seeing if
>>                                             doing that change fixes
>>                                             things up?
>>                                             (I may be completely wrong,
>>                                             but just throwing this out
>>                                             as a naive "obvious" guess)
>>
>>
>>                                             On Sat, Feb 1, 2014 at 12:58
>>                                             AM, Ryan Newton
>>                                             <rrnewton at gmail.com
>>                                             <mailto:rrnewton at gmail.com>>
>>                                             wrote:
>>
>>                                                 Then again... I'm having
>>                                                 trouble seeing how the
>>                                                 spec on page 3-149 of
>>                                                 the Intel manual would
>>                                                 allow the behavior I'm
>>                                                 seeing:
>>
>>                                                 http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>
>>                                                 Nevertheless, this is
>>                                                 exactly the behavior
>>                                                 we're seeing with the
>>                                                 current Haskell primops.
>>                                                   Two threads
>>                                                 simultaneously
>>                                                 performing the same
>>                                                 CAS(p,a,b) can both
>>                                                 think that they succeeded.
>>
>>
>>
>>
>>
>>                                                 On Sat, Feb 1, 2014 at
>>                                                 12:33 AM, Ryan Newton
>>                                                 <rrnewton at gmail.com
>>                                                 <mailto:rrnewton at gmail.com>>
>>                                                 wrote:
>>
>>                                                     I commented on the
>>                                                     commit here:
>>
>>                                                     https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>
>>                                                     The problem is that
>>                                                     our "cas" routine in
>>                                                     SMP.h is similar to
>>                                                     the C compiler
>>                                                     intrinsic
>>                                                     __sync_val_compare_and_swap,
>>                                                     in that it returns
>>                                                     the old value.  But
>>                                                     it seems we cannot
>>                                                     use a comparison
>>                                                     against that old
>>                                                     value to determine
>>                                                     whether or not the
>>                                                     CAS succeeded.  (I
>>                                                     believe the CAS may
>>                                                     fail due to
>>                                                     contention, but the
>>                                                     old value may happen
>>                                                     to look like our old
>>                                                     value.)
>>
>>                                                     Unfortunately, this
>>                                                     didn't occur to me
>>                                                     until it started
>>                                                     causing bugs [1]
>>                                                     [2].  Fixing
>>                                                     casMutVar# fixes
>>                                                     these bugs.
>>                                                       However, the way
>>                                                     I'm currently fixing
>>                                                     CAS in the
>>                                                     "atomic-primops"
>>                                                     package is by using
>>                                                     __sync_bool_compare_and_swap:
>>
>>                                                     https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>
>>                                                     What is the best fix
>>                                                     for GHC itself?
>>                                                     Would it be ok for
>>                                                     GHC to include a C
>>                                                     compiler intrinsic
>>                                                     like
>>                                                     __sync_val_compare_and_swap?
>>                                                       Otherwise we need
>>                                                     another big ifdbef'd
>>                                                     function like "cas"
>>                                                     in SMP.h that has
>>                                                     the
>>                                                     architecture-specific inline
>>                                                     asm across all
>>                                                     architectures.  I
>>                                                     can write the x86
>>                                                     one, but I'm not
>>                                                     eager to try the others.
>>
>>                                                     Best,
>>                                                         -Ryan
>>
>>                                                     [1]
>>                                                     https://github.com/iu-parfunc/lvars/issues/70
>>                                                     [2]
>>                                                     https://github.com/rrnewton/haskell-lockfree/issues/15
>>
>>
>>
>>                                                 _______________________________________________
>>                                                 ghc-devs mailing list
>>                                                 ghc-devs at haskell.org
>>                                                 <mailto:ghc-devs at haskell.org>
>>                                                 http://www.haskell.org/mailman/listinfo/ghc-devs
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140206/90ef3815/attachment.sig>


More information about the ghc-devs mailing list