Bad news: apparent bug in casMutVar going back to 7.2

Simon Marlow marlowsd at gmail.com
Sat Feb 1 21:00:13 UTC 2014


So there's no problem in casMutVar#?

There's probably no good reason not to use the gcc intrinsics.  Either 
they didn't exist when I wrote the inline asm, or they had been added to 
gcc since I last read the docs.
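
If someone wants to try it, the x86 case could presumably just become a call
to the intrinsic.  Something like this untested sketch (from memory -- the
exact names and types in SMP.h may differ slightly):

    EXTERN_INLINE StgWord
    cas(StgVolatilePtr p, StgWord o, StgWord n)
    {
        /* __sync_val_compare_and_swap returns the value that was in *p
           before the operation; it equals o exactly when the swap
           happened. */
        return __sync_val_compare_and_swap(p, o, n);
    }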

Cheers,
Simon

On 01/02/2014 19:35, Ryan Newton wrote:
> Ok, my bad, sorry all.
>
> This is NOT a problem that will crop up in 7.8.   Rather, it's just a
> problem with the duplicated bits of GHC RTS functionality that were
> stuck into the atomic-primops library.  It was a C preprocessor problem
> that was causing the inline asm we were discussing in this thread to not
> actually be called.
>
> Still, I'd like to be reminded of the rationale for all this conditional
> inline asm rather than using the C compiler intrinsics!  Anyone?
>
> Best,
>    -Ryan
>
>
>
> On Sat, Feb 1, 2014 at 2:17 PM, Carter Schonwald
>     <carter.schonwald at gmail.com> wrote:
>
>     I got the test suite running on my (2-core) MacBook Air with 7.8.
>     I've run it several times and am not seeing any failures.
>
>
>     On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald
>     <carter.schonwald at gmail.com> wrote:
>
>         hrmmmm I have a crazy idea
>
>         "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded
>         into r/m64. Else, clear ZF
>         and load r/m64 into RAX." is what the docs say for the cmpxchg
>         instruction.
>
>         so RAX is the old value (EAX in the 32-bit case). And it
>         looks like we don't set that explicitly when calling the asm ..
>         CMPXCHG r/m64, r64
>
>           hrmmm
>
>
>
>
>         On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>
>             Ok, here's another experiment, on this commit:
>
>             https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>
>             Here, if I use GCC's __sync_val_compare_and_swap instead of
>             GHC's version of cas(), the problem also goes away.  I think
>             these two implementations should behave identically, and the
>             fact that they don't perhaps indicates that there is something
>             off about the inline asm, as Carter was suggesting:
>
>             #if i386_HOST_ARCH || x86_64_HOST_ARCH
>                 __asm__ __volatile__ (
>                     "lock\ncmpxchg %3,%1"
>                     :"=a"(o), "=m" (*(volatile unsigned int *)p)
>                     :"0" (o), "r" (n));
>                 return o;
>
>             The x86 CAS instruction must put the "return value" in the
>             accumulator register, and indeed this constrains "o" to be
>             allocated to the accumulator register, while the new value
>             "n" can be in any register.
>
>             So if there's a problem, I don't know what it is.  Except
>             I'm not sure what the ramifications of "o" being a function
>             parameter AND having an "=a" constraint on it are...
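>
>             Spelling the constraints out as I read them (my annotations,
>             not comments from the actual SMP.h source):
>
>             #if i386_HOST_ARCH || x86_64_HOST_ARCH
>                 __asm__ __volatile__ (
>                     "lock\ncmpxchg %3,%1"
>                     : "=a" (o),                          /* %0: old value comes back in eax/rax */
>                       "=m" (*(volatile unsigned int *)p) /* %1: the memory operand              */
>                     : "0" (o),                           /* o is placed in the same register as */
>                                                          /* %0, i.e. eax/rax, before the asm    */
>                       "r" (n));                          /* %3: the new value, any register     */
>                 return o;
>
>             If I'm reading the constraint docs right, that "0" (o) input is
>             what loads the accumulator from o before the instruction runs.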
>
>                 -Ryan
>
>
>
>             On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald
>             <carter.schonwald at gmail.com> wrote:
>
>                 Hey Ryan, i've made the leap to using 7.8 on my machine,
>                 so i'll first have to get some pull requests in on
>                 atomic-primops before I can test it locally :), expect
>                 those patches later today!
>
>                 looks like gcc's inline ASM logic is pretty correct,
>                 after testing it a bit locally; pardon my speculatively
>                 jumping the gun.
>
>
>                 On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton
>                 <rrnewton at gmail.com> wrote:
>
>                     Hi Carter & others,
>
>                     Carter, yes, this is CAS on pointers and in my next
>                     mail I'll try to come up with some hypotheses as to
>                     why we may have (remaining) problems there.
>
>                     But first, I have been assured that on x86 there is
>                     no failure mode in which comparing the value read by
>                     CAS fails to correctly diagnose success or failure
>                     (it is equivalent to reading the Zero Flag directly) [1].
>
>                     And yet, there's this discrepancy, where the
>                     modified casMutVar that I linked to does not have
>                     the failure.  As for reproducing the failure, either
>                     of the two following tests will currently show problems:
>
>                       * Two threads try to casIORef False->True, both
>                         succeed
>                       * 120 threads try to read, increment, CAS until
>                         they succeed.  The total is often not 120
>                         because multiple threads think they successfully
>                         incremented, say, 33->34.
>
>                     Here's a specific recipe for the latter test on GHC
>                     7.6.3 Mac or Linux:
>
>                     git clone git at github.com:rrnewton/haskell-lockfree-queue.git
>                     cd haskell-lockfree-queue/AtomicPrimops/
>                     git checkout 1a1e7e55f6706f9e5754
>                     cabal sandbox init
>                     cabal install -f-withTH -fforeign ./ ./testing --enable-tests
>                     ./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops -t n_threads
>
>                     You may have to run the last line several times to
>                     see the failure.
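>
>                     (Concretely, the counter test boils down to something
>                     like the sketch below -- paraphrased from memory, not
>                     the exact code in the test suite; it assumes the
>                     Data.Atomics API from atomic-primops and a -threaded
>                     build run with +RTS -N.)
>
>                     import Control.Concurrent (forkIO)
>                     import Control.Concurrent.MVar
>                     import Control.Monad (forM_)
>                     import Data.Atomics (readForCAS, casIORef, peekTicket)
>                     import Data.IORef
>
>                     -- Read a ticket, try to CAS in value+1, and retry
>                     -- until the CAS reports success.
>                     incr :: IORef Int -> IO ()
>                     incr ref = loop
>                       where
>                         loop = do
>                           tick    <- readForCAS ref
>                           (ok, _) <- casIORef ref tick (peekTicket tick + 1)
>                           if ok then return () else loop
>
>                     main :: IO ()
>                     main = do
>                       ref   <- newIORef 0
>                       dones <- mapM (const newEmptyMVar) [1..120 :: Int]
>                       forM_ dones $ \d -> forkIO (incr ref >> putMVar d ())
>                       mapM_ takeMVar dones
>                       readIORef ref >>= print  -- should always print 120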
>
>                     Best,
>                        -Ryan
>
>                     [1] I guess the __sync_bool_compare_and_swap
>                     intrinsic, which reads ZF, is there just to avoid the
>                     extra comparison.
>
>                     [2] P.S. I'd like to try this on GHC head, but the
>                     RHEL 6 machine I usually use to build it is
>                     currently not validating (below error, commit
>                     65d05d7334).  After I debug this gmp problem I'll
>                     confirm that the bug under discussion applies on the
>                     7.8 branch.
>
>                          ./sync-all checkout ghc-7.8
>                          sh validate
>                     ...
>                     /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o:
>                     relocation R_X86_64_32 against `__gmpz_sub' can not
>                     be used when making a shared object; recompile with
>                     -fPIC
>                     libraries/integer-gmp/gmp/objs/aors.o: could not
>                     read symbols: Bad value
>
>
>
>
>                     On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald
>                     <carter.schonwald at gmail.com> wrote:
>
>                         Ryan, is your benchmark using CAS on pointers,
>                         or immediate words? Trying to get atomic-primops
>                         to build with my 7.8 build on my mac.
>
>
>                         On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald
>                         <carter.schonwald at gmail.com> wrote:
>
>                             https://ghc.haskell.org/trac/ghc/ticket/8724#ticket
>                             is the ticket
>
>                             when i'm more awake i'll experiment some more
>
>
>                             On Sat, Feb 1, 2014 at 2:33 AM, Carter
>                             Schonwald <carter.schonwald at gmail.com> wrote:
>
>                                 i have a ticket for tracking this,
>                                 though i'm thinking my initial attempt
>                                 at a patch generates the same object
>                                 code as it did before.
>
>                                 @ryan, what CPU variant are you testing
>                                 this on? is this on a NUMA machine or
>                                 something?
>
>
>                                 On Sat, Feb 1, 2014 at 1:58 AM, Carter
>                                 Schonwald <carter.schonwald at gmail.com> wrote:
>
>                                     woops, i mean cmpxchgq
>
>
>                                     On Sat, Feb 1, 2014 at 1:36 AM,
>                                     Carter Schonwald
>                                     <carter.schonwald at gmail.com> wrote:
>
>                                         ok, i can confirm that on my
>                                         64bit mac, both clang and gcc
>                                         use cmpxchgl rather than cmpxchg
>                                         i'll whip up a strawman patch on
>                                         head that can be cherrypicked /
>                                         tested out by ryan et al
>
>
>                                         On Sat, Feb 1, 2014 at 1:12 AM,
>                                         Carter Schonwald
>                                         <carter.schonwald at gmail.com> wrote:
>
>                                             Hey Ryan,
>                                             looking at this closely
>                                             Why isn't CAS using
>                                             CMPXCHG8B on 64bit
>                                             architectures?  Could that
>                                             be the culprit?
>
>                                             Could the issue be that
>                                             we've not had a good stress
>                                             test that would create
>                                             values that are equal in the
>                                             32-bit range, but differ in
>                                             the 64-bit range, and you're
>                                             hitting that?
>
>                                             Could you try seeing if
>                                             doing that change fixes
>                                             things up?
>                                             (I may be completely wrong,
>                                             but just throwing this out
>                                             as a naive "obvious" guess)
>
>
>                                             On Sat, Feb 1, 2014 at 12:58
>                                             AM, Ryan Newton
>                                             <rrnewton at gmail.com> wrote:
>
>                                                 Then again... I'm having
>                                                 trouble seeing how the
>                                                 spec on page 3-149 of
>                                                 the Intel manual would
>                                                 allow the behavior I'm
>                                                 seeing:
>
>                                                 http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>
>                                                 Nevertheless, this is
>                                                 exactly the behavior
>                                                 we're seeing with the
>                                                 current Haskell primops.
>                                                   Two threads
>                                                 simultaneously
>                                                 performing the same
>                                                 CAS(p,a,b) can both
>                                                 think that they succeeded.
>
>
>
>
>
>                                                 On Sat, Feb 1, 2014 at
>                                                 12:33 AM, Ryan Newton
>                                                 <rrnewton at gmail.com> wrote:
>
>                                                     I commented on the
>                                                     commit here:
>
>                                                     https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>
>                                                     The problem is that
>                                                     our "cas" routine in
>                                                     SMP.h is similar to
>                                                     the C compiler
>                                                     intrinsic
>                                                     __sync_val_compare_and_swap,
>                                                     in that it returns
>                                                     the old value.  But
>                                                     it seems we cannot
>                                                     use a comparison
>                                                     against that old
>                                                     value to determine
>                                                     whether or not the
>                                                     CAS succeeded.  (I
>                                                     believe the CAS may
>                                                     fail due to
>                                                     contention, but the
>                                                     old value may happen
>                                                     to look like our old
>                                                     value.)
>
>                                                     Unfortunately, this
>                                                     didn't occur to me
>                                                     until it started
>                                                     causing bugs [1]
>                                                     [2].  Fixing
>                                                     casMutVar# fixes
>                                                     these bugs.
>                                                       However, the way
>                                                     I'm currently fixing
>                                                     CAS in the
>                                                     "atomic-primops"
>                                                     package is by using
>                                                     __sync_bool_compare_and_swap:
>
>                                                     https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
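>
>                                                     (Roughly this shape, i.e. a
>                                                     sketch with a made-up name,
>                                                     not the literal code in that
>                                                     commit:)
>
>                                                     #include <stdint.h>
>
>                                                     /* Report success straight
>                                                        from the boolean intrinsic
>                                                        (i.e. ZF) rather than
>                                                        comparing the returned
>                                                        old value. */
>                                                     static inline int
>                                                     cas_succ(volatile uintptr_t *p,
>                                                              uintptr_t o,
>                                                              uintptr_t n)
>                                                     {
>                                                       return
>                                                         __sync_bool_compare_and_swap(
>                                                           p, o, n);
>                                                     }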
>
>                                                     What is the best fix
>                                                     for GHC itself?
>                                                     Would it be ok for
>                                                     GHC to include a C
>                                                     compiler intrinsic
>                                                     like
>                                                     __sync_val_compare_and_swap?
>                                                       Otherwise we need
>                                                     another big ifdef'd
>                                                     function like "cas"
>                                                     in SMP.h that has
>                                                     the
>                                                     architecture-specific inline
>                                                     asm across all
>                                                     architectures.  I
>                                                     can write the x86
>                                                     one, but I'm not
>                                                     eager to try the others.
>
>                                                     Best,
>                                                         -Ryan
>
>                                                     [1]
>                                                     https://github.com/iu-parfunc/lvars/issues/70
>                                                     [2]
>                                                     https://github.com/rrnewton/haskell-lockfree/issues/15
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

