Bad news: apparent bug in casMutVar going back to 7.2
Simon Marlow
marlowsd at gmail.com
Sat Feb 1 21:00:13 UTC 2014
So there's no problem in casMutVar#?
There's probably no good reason not to use the gcc intrinsics. Either
they didn't exist when I wrote the inline asm, or they had been added to
gcc since I last read the docs.
Cheers,
Simon
On 01/02/2014 19:35, Ryan Newton wrote:
> Ok, my bad, sorry all.
>
> This is NOT a problem that will crop up in 7.8. Rather, it's just a
> problem with the duplicated bits of GHC RTS functionality that were
> stuck into the atomic-primops library. It was a C preprocessor problem
> that was causing the inline asm we were discussing in this thread to not
> actually be called.
>
> Still, I'd like to be reminded of the rationale for all this conditional
> inline asm rather than using the C compiler intrinsics! Anyone?
>
> Best,
> -Ryan
>
>
>
> On Sat, Feb 1, 2014 at 2:17 PM, Carter Schonwald
> <carter.schonwald at gmail.com> wrote:
>
> I got the test suite running on my (2-core) MacBook Air,
> with 7.8
> i've run it several times, not seeing any failures
>
>
> On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald
> <carter.schonwald at gmail.com> wrote:
>
> hrmmmm I have a crazy idea
>
> "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded
> into r/m64. Else, clear ZF
> and load r/m64 into RAX." is what the docs say for the cmpxchg
> instruction
>
> so RAX is the old value (EAX in the 32-bit case). And it
> looks like we don't set that explicitly when calling the asm ..
> CMPXCHG r/m64, r64
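The semantics Carter quotes from the manual can be written out as a plain-C model (purely illustrative: `cmpxchg_model` is a name invented here, and the real instruction does all of this atomically in hardware when prefixed with LOCK):

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain-C model of "CMPXCHG r/m64, r64" with RAX as the implicit
 * accumulator. Returns the Zero Flag: true iff the store happened.
 * This only illustrates the data flow, not the atomicity. */
static bool cmpxchg_model(uint64_t *mem, uint64_t *rax, uint64_t reg)
{
    if (*mem == *rax) {
        *mem = reg;      /* equal: ZF set, r64 stored into r/m64 */
        return true;
    } else {
        *rax = *mem;     /* not equal: ZF cleared, r/m64 loaded into RAX */
        return false;
    }
}
```

Note that on the failure path the caller's expected value in RAX is overwritten with the current memory contents, which is exactly why the accumulator must be wired up correctly in the inline asm.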
>
> hrmmm
>
>
>
>
> On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>
> Ok, here's another experiment, on this commit:
>
> https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>
> Here, if I use GCC's __sync_val_compare_and_swap instead of
> GHC's version of cas(), the problem also goes away. I think
> these two implementations should behave identically, and
> the fact that they don't perhaps indicates that something is
> off in the inline asm, as Carter was suggesting:
>
>     #if i386_HOST_ARCH || x86_64_HOST_ARCH
>     __asm__ __volatile__ (
>         "lock\ncmpxchg %3,%1"
>         :"=a"(o), "=m" (*(volatile unsigned int *)p)
>         :"0" (o), "r" (n));
>     return o;
>
> The x86 CAS instruction must put the "return value" in the
> accumulator register, and indeed this constrains "o" to be
> allocated to the accumulator register, while the new value
> "n" can be in any register.
>
> So if there's a problem, I don't know what it is. Except
> I'm not sure what the ramifications of "o" being a function
> parameter AND having an "=a" constraint on it are...
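For comparison, the intrinsic-based version Ryan swapped in can be sketched like this (`cas_intrinsic` is an illustrative name, and the `StgWord` typedef is a stand-in for the RTS's; success is detected by comparing the returned value against `o`):

```c
#include <stdint.h>

typedef uintptr_t StgWord;   /* stand-in for the RTS typedef */

/* Same contract as the RTS cas() in SMP.h: atomically replace *p with n
 * if *p == o, returning the value *p held before the operation.
 * The caller treats the CAS as successful iff the return value == o. */
static StgWord cas_intrinsic(volatile StgWord *p, StgWord o, StgWord n)
{
    return __sync_val_compare_and_swap(p, o, n);
}
```

The intrinsic leaves register allocation entirely to the compiler, sidestepping the question of how the `"=a"`/`"0"` constraints interact with `o` being a function parameter.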
>
> -Ryan
>
>
>
> On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald
> <carter.schonwald at gmail.com> wrote:
>
> Hey Ryan, i've made the leap to using 7.8 on my machine,
> so i'll first have to get some pull requests in on
> atomic-primops before I can test it locally :), expect
> those patches later today!
>
> looks like gcc's inline ASM logic is pretty correct,
> after testing it a bit locally, pardon my speculative
> jumping the gun.
>
>
> On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton
> <rrnewton at gmail.com> wrote:
>
> Hi Carter & others,
>
> Carter, yes, this is CAS on pointers and in my next
> mail I'll try to come up with some hypotheses as to
> why we may have (remaining) problems there.
>
> But first, I have been assured that on x86 there is
> no failure mode in which doing a comparison on the
> value read by CAS should not correctly diagnose
> success or failure (same as directly reading the
> Zero Flag) [1].
>
> And yet, there's this discrepancy, where the
> modified casMutVar that I linked to does not have
> the failure. As for reproducing the failure, either
> of the two following tests will currently show problems:
>
> * Two threads try to casIORef False->True, both
> succeed
> * 120 threads try to read, increment, CAS until
> they succeed. The total is often not 120
> because multiple threads think they successfully
> incremented, say, 33->34.
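A C analogue of the second test would look roughly as follows (illustrative only: `run_increment_test` mirrors the shape of the Haskell test, it is not its actual code; with a correct CAS the final count is always 120):

```c
#include <pthread.h>
#include <stdint.h>

#define N_THREADS 120

static volatile uintptr_t counter;

/* Each thread: read, increment, CAS until the CAS succeeds. */
static void *incr(void *arg)
{
    (void)arg;
    for (;;) {
        uintptr_t old = counter;
        /* Success iff the returned value equals the value we read. */
        if (__sync_val_compare_and_swap(&counter, old, old + 1) == old)
            return NULL;
    }
}

/* Runs the race and returns the final count. */
static uintptr_t run_increment_test(void)
{
    pthread_t ts[N_THREADS];
    counter = 0;
    for (int i = 0; i < N_THREADS; i++)
        pthread_create(&ts[i], NULL, incr, NULL);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(ts[i], NULL);
    return counter;
}
```

A buggy CAS that spuriously reports success lets two threads both "win" the same transition (say 33->34), so the final count comes up short of 120.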
>
> Here's a specific recipe for the latter test on GHC
> 7.6.3 Mac or Linux:
>
>     git clone git at github.com:rrnewton/haskell-lockfree-queue.git
>     cd haskell-lockfree-queue/AtomicPrimops/
>     git checkout 1a1e7e55f6706f9e5754
>     cabal sandbox init
>     cabal install -f-withTH -fforeign ./ ./testing --enable-tests
>     ./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops -t n_threads
>
> You may have to run the last line several times to
> see the failure.
>
> Best,
> -Ryan
>
> [1] I guess the __sync_bool_compare_and_swap
> intrinsic which reads ZF is there just to avoid the
> extra comparison.
>
> [2] P.S. I'd like to try this on GHC head, but the
> RHEL 6 machine I usually use to build it is
> currently not validating (below error, commit
> 65d05d7334). After I debug this gmp problem I'll
> confirm that the bug under discussion applies on the
> 7.8 branch.
>
> ./sync-all checkout ghc-7.8
> sh validate
> ...
> /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o:
> relocation R_X86_64_32 against `__gmpz_sub' can not
> be used when making a shared object; recompile with
> -fPIC
> libraries/integer-gmp/gmp/objs/aors.o: could not
> read symbols: Bad value
>
>
>
>
> On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald
> <carter.schonwald at gmail.com> wrote:
>
> Ryan, is your benchmark using CAS on pointers,
> or immediate words? trying to get atomic primops
> to build on my 7.8 build on my mac
>
>
> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald
> <carter.schonwald at gmail.com> wrote:
>
> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket
> is the ticket
>
> when i'm more awake i'll experiment some more
>
>
> On Sat, Feb 1, 2014 at 2:33 AM, Carter
> Schonwald <carter.schonwald at gmail.com> wrote:
>
> i have a ticket for tracking this,
> though i'm thinking my initial attempt
> at a patch generates the same object
> code as it did before.
>
> @ryan, what CPU variant are you testing
> this on? is this on a NUMA machine or
> something?
>
>
> On Sat, Feb 1, 2014 at 1:58 AM, Carter
> Schonwald <carter.schonwald at gmail.com> wrote:
>
> woops, i mean cmpxchgq
>
>
> On Sat, Feb 1, 2014 at 1:36 AM,
> Carter Schonwald
> <carter.schonwald at gmail.com> wrote:
>
> ok, i can confirm that on my
> 64bit mac, both clang and gcc
> use cmpxchgl rather than cmpxchg
> i'll whip up a strawman patch on
> head that can be cherrypicked /
> tested out by ryan et al
>
>
> On Sat, Feb 1, 2014 at 1:12 AM,
> Carter Schonwald
> <carter.schonwald at gmail.com> wrote:
>
> Hey Ryan,
> looking at this closely
> Why isn't CAS using
> CMPXCHG8B on 64bit
> architectures? Could that
> be the culprit?
>
> Could the issue be that
> we've not had a good stress
> test that would create
> values that are equal on the
> 32bit range, but differ on
> the 64bit range, and you're
> hitting that?
>
> Could you try seeing if
> doing that change fixes
> things up?
> (I may be completely wrong,
> but just throwing this out
> as a naive "obvious" guess)
>
>
> On Sat, Feb 1, 2014 at 12:58
> AM, Ryan Newton
> <rrnewton at gmail.com> wrote:
>
> Then again... I'm having
> trouble seeing how the
> spec on page 3-149 of
> the Intel manual would
> allow the behavior I'm
> seeing:
>
> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>
> Nevertheless, this is
> exactly the behavior
> we're seeing with the
> current Haskell primops.
> Two threads
> simultaneously
> performing the same
> CAS(p,a,b) can both
> think that they succeeded.
>
>
>
>
>
> On Sat, Feb 1, 2014 at
> 12:33 AM, Ryan Newton
> <rrnewton at gmail.com> wrote:
>
> I commented on the
> commit here:
>
> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>
> The problem is that
> our "cas" routine in
> SMP.h is similar to
> the C compiler
> intrinsic
> __sync_val_compare_and_swap,
> in that it returns
> the old value. But
> it seems we cannot
> use a comparison
> against that old
> value to determine
> whether or not the
> CAS succeeded. (I
> believe the CAS may
> fail due to
> contention, but the
> old value may happen
> to look like our old
> value.)
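The `__sync_bool_compare_and_swap`-based fix boils down to reporting success from the instruction's own Zero Flag instead of comparing old values (a sketch: `cas_bool` is an illustrative name, not the patch's actual one, and `StgWord` stands in for the RTS typedef):

```c
#include <stdint.h>

typedef uintptr_t StgWord;   /* stand-in for the RTS typedef */

/* Returns nonzero iff the swap actually happened, taken from the
 * CMPXCHG Zero Flag via the intrinsic, rather than inferred by
 * comparing the returned old value against our expected one. */
static int cas_bool(volatile StgWord *p, StgWord o, StgWord n)
{
    return __sync_bool_compare_and_swap(p, o, n);
}
```

Callers then branch on the boolean directly, so no separate comparison that could misdiagnose the outcome is needed.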
>
> Unfortunately, this
> didn't occur to me
> until it started
> causing bugs [1]
> [2]. Fixing
> casMutVar# fixes
> these bugs.
> However, the way
> I'm currently fixing
> CAS in the
> "atomic-primops"
> package is by using
> __sync_bool_compare_and_swap:
>
> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>
> What is the best fix
> for GHC itself?
> Would it be ok for
> GHC to include a C
> compiler intrinsic
> like
> __sync_val_compare_and_swap?
> Otherwise we need
> another big ifdef'd
> function like "cas"
> in SMP.h that has
> the
> architecture-specific inline
> asm across all
> architectures. I
> can write the x86
> one, but I'm not
> eager to try the others.
>
> Best,
> -Ryan
>
> [1]
> https://github.com/iu-parfunc/lvars/issues/70
> [2]
> https://github.com/rrnewton/haskell-lockfree/issues/15
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>
>
>
>
>
>
>
>
>
>
>
>
>