Bad news: apparent bug in casMutVar going back to 7.2
PHO
pho at cielonegro.org
Thu Feb 6 04:57:41 UTC 2014
How about using libatomic_ops? [*1]
GCC atomic intrinsics are only provided by GCC >= 4.2. Clang seems to
have an equivalent set of intrinsics but in different names
[*2]. Basically we should avoid using any non-standard language
extensions where possible.
*1: https://github.com/ivmai/libatomic_ops/
*2: http://clang.llvm.org/docs/LanguageExtensions.html#langext-c11-atomic
Thanks,
PHO
From: Simon Marlow <marlowsd at gmail.com>
Subject: Re: Bad news: apparent bug in casMutVar going back to 7.2
Date: Sat, 01 Feb 2014 21:00:13 +0000
> So there's no problem in casMutVar#?
>
> There's probably no good reason not to use the gcc intrinsics. Either
> they didn't exist when I wrote the inline asm, or they had been added
> to gcc since I last read the docs.
>
> Cheers,
> Simon
>
> On 01/02/2014 19:35, Ryan Newton wrote:
>> Ok, my bad, sorry all.
>>
>> This is NOT a problem that will crop up in 7.8. Rather, it's just a
>> problem with the duplicated bits of GHC RTS functionality that were
>> stuck into the atomic-primops library. It was a C preprocessor
>> problem
>> that was causing the inline asm we were discussing in this thread to
>> not
>> actually be called.
>>
>> Still, I'd like to be reminded of the rational for all this
>> conditional
>> inline asm rather than using the C compiler intrinsics! Anyone?
>>
>> Best,
>> -Ryan
>>
>>
>>
>> On Sat, Feb 1, 2014 at 2:17 PM, Carter Schonwald
>> <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>>
>> wrote:
>>
>> I got the test suite running on my (2 core) machine mac book air,
>> with 7.8
>> i've run it several times, not seeing any failures
>>
>>
>> On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald
>> <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>>
>> wrote:
>>
>> hrmmmm I have a crazy idea
>>
>> "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded
>> into r/m64. Else, clear ZF
>> and load r/m64 into RAX." is what the docs say for the cmpxchng
>> instruction
>>
>> so RAX is the old values, (EAX in the 32bit case). And it
>> looks like we dont' set that explicitly when calling the asm ..
>> CMPXCHG r/m64, r64
>>
>> hrmmm
>>
>>
>>
>>
>> On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com
>> <mailto:rrnewton at gmail.com>> wrote:
>>
>> Ok, here's another experiment, on this commit:
>>
>> https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>>
>> Here, if I use GCC's __sync_val_compare_and_swap instead of
>> GHC's version of cas(), the problem also goes away. I think
>> these two implementations should behave identically, and
>> that they don't perhaps indicates that there is something
>> off about the inline asm, as Carter was suggesting:
>>
>> *#if i386_HOST_ARCH || x86_64_HOST_ARCH*
>> * __asm__ __volatile__ (*
>> * "lock\ncmpxchg %3,%1"*
>> * :"=a"(o), "=m" (*(volatile unsigned int *)p) *
>> * :"0" (o), "r" (n));*
>> * return o;*
>>
>> The x86 CAS instruction must put the "return value" in the
>> accumulator register, and indeed this constrains "o" to be
>> allocated to the accumulator register, while the new value
>> "n" can be in any register.
>>
>> So if there's a problem, I don't know what it is. Except
>> I'm not sure what the ramifications of "o" being a function
>> parameter AND having an "=a" constraint on it are...
>>
>> -Ryan
>>
>>
>>
>> On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald
>> <carter.schonwald at gmail.com
>> <mailto:carter.schonwald at gmail.com>> wrote:
>>
>> Hey Ryan, i've made the leap to using 7.8 on my machine,
>> so i'll first have to get some pull requests in on
>> atomic-primops before I can test it locally :), expect
>> those patches later today!
>>
>> looks like gcc's inline ASM logic is pretty correct,
>> after testing it a bit locally, pardon my speculative
>> jumping the gun.
>>
>>
>> On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton
>> <rrnewton at gmail.com <mailto:rrnewton at gmail.com>> wrote:
>>
>> Hi Carter & others,
>>
>> Carter, yes, this is CAS on pointers and in my next
>> mail I'll try to come up with some hypotheses as to
>> why we may have (remaining) problems there.
>>
>> But first, I have been assured that on x86 there is
>> no failure mode in which doing a comparison on the
>> value read by CAS should not correctly diagnose
>> success or failure (same as directly reading the
>> Zero Flag) [1].
>>
>> And yet, there's this discrepancy, where the
>> modified casMutVar that I linked to does not have
>> the failure. As for reproducing the failure, either
>> of the two following tests will currently show problems:
>>
>> * Two threads try to casIORef False->True, both
>> succeed
>> * 120 threads try to read, increment, CAS until
>> they succeed. The total is often not 120
>> because multiple threads think the successfully
>> incremented, say, 33->34.
>>
>> Here's a specific recipe for the latter test on GHC
>> 7.6.3 Mac or Linux:
>>
>> *git clone
>> git at github.com:rrnewton/haskell-lockfree-queue.git
>> *
>> *cd haskell-lockfree-queue/AtomicPrimops/*
>> *git checkout 1a1e7e55f6706f9e5754
>> *
>> *cabal sandbox init
>> *
>> *cabal install -f-withTH -fforeign ./ ./testing
>> --enable-tests
>> *
>> *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
>> -t n_threads
>> *
>>
>> You may have to run the last line several times to
>> see the failure.
>>
>> Best,
>> -Ryan
>>
>> [1] I guess the __sync_bool_compare_and_swap
>> intrinsic which reads ZF is there just to avoid the
>> extra comparison.
>>
>> [2] P.S. I'd like to try this on GHC head, but the
>> RHEL 6 machine I usually use to build it is
>> currently not validating (below error, commit
>> 65d05d7334). After I debug this gmp problem I'll
>> confirm that the bug under discussion applies on the
>> 7.8 branch.
>>
>> ./sync-all checkout ghc-7.8
>> sh validate
>> ...
>> /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o:
>> relocation R_X86_64_32 against `__gmpz_sub' can not
>> be used when making a shared object; recompile with
>> -fPIC
>> libraries/integer-gmp/gmp/objs/aors.o: could not
>> read symbols: Bad value
>>
>>
>>
>>
>> On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald
>> <carter.schonwald at gmail.com
>> <mailto:carter.schonwald at gmail.com>> wrote:
>>
>> Ryan, is your benchmark using CAS on pointers,
>> or immediate words? trying to get atomic primops
>> to build on my 7.8 build on my mac
>>
>>
>> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald
>> <carter.schonwald at gmail.com
>> <mailto:carter.schonwald at gmail.com>> wrote:
>>
>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket
>> is the ticket
>>
>> when i'm more awake i'll experiment some more
>>
>>
>> On Sat, Feb 1, 2014 at 2:33 AM, Carter
>> Schonwald <carter.schonwald at gmail.com
>> <mailto:carter.schonwald at gmail.com>> wrote:
>>
>> i have a ticket for tracking this,
>> though i'm thinking my initial attempt
>> at a patch generates the same object
>> code as it did before.
>>
>> @ryan, what CPU variant are you testing
>> this on? is this on a NUMA machine or
>> something?
>>
>>
>> On Sat, Feb 1, 2014 at 1:58 AM, Carter
>> Schonwald <carter.schonwald at gmail.com
>> <mailto:carter.schonwald at gmail.com>> wrote:
>>
>> woops, i mean cmpxchgq
>>
>>
>> On Sat, Feb 1, 2014 at 1:36 AM,
>> Carter Schonwald
>> <carter.schonwald at gmail.com
>> <mailto:carter.schonwald at gmail.com>>
>> wrote:
>>
>> ok, i can confirm that on my
>> 64bit mac, both clang and gcc
>> use cmpxchgl rather than cmpxchg
>> i'll whip up a strawman patch on
>> head that can be cherrypicked /
>> tested out by ryan et al
>>
>>
>> On Sat, Feb 1, 2014 at 1:12 AM,
>> Carter Schonwald
>> <carter.schonwald at gmail.com
>> <mailto:carter.schonwald at gmail.com>>
>> wrote:
>>
>> Hey Ryan,
>> looking at this closely
>> Why isn't CAS using
>> CMPXCHG8B on 64bit
>> architectures? Could that
>> be the culprit?
>>
>> Could the issue be that
>> we've not had a good stress
>> test that would create
>> values that are equal on the
>> 32bit range, but differ on
>> the 64bit range, and you're
>> hitting that?
>>
>> Could you try seeing if
>> doing that change fixes
>> things up?
>> (I may be completely wrong,
>> but just throwing this out
>> as a naive "obvious" guess)
>>
>>
>> On Sat, Feb 1, 2014 at 12:58
>> AM, Ryan Newton
>> <rrnewton at gmail.com
>> <mailto:rrnewton at gmail.com>>
>> wrote:
>>
>> Then again... I'm having
>> trouble seeing how the
>> spec on page 3-149 of
>> the Intel manual would
>> allow the behavior I'm
>> seeing:
>>
>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>
>> Nevertheless, this is
>> exactly the behavior
>> we're seeing with the
>> current Haskell primops.
>> Two threads
>> simultaneously
>> performing the same
>> CAS(p,a,b) can both
>> think that they succeeded.
>>
>>
>>
>>
>>
>> On Sat, Feb 1, 2014 at
>> 12:33 AM, Ryan Newton
>> <rrnewton at gmail.com
>> <mailto:rrnewton at gmail.com>>
>> wrote:
>>
>> I commented on the
>> commit here:
>>
>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>
>> The problem is that
>> our "cas" routine in
>> SMP.h is similar to
>> the C compiler
>> intrinsic
>> __sync_val_compare_and_swap,
>> in that it returns
>> the old value. But
>> it seems we cannot
>> use a comparison
>> against that old
>> value to determine
>> whether or not the
>> CAS succeeded. (I
>> believe the CAS may
>> fail due to
>> contention, but the
>> old value may happen
>> to look like our old
>> value.)
>>
>> Unfortunately, this
>> didn't occur to me
>> until it started
>> causing bugs [1]
>> [2]. Fixing
>> casMutVar# fixes
>> these bugs.
>> However, the way
>> I'm currently fixing
>> CAS in the
>> "atomic-primops"
>> package is by using
>> __sync_bool_compare_and_swap:
>>
>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>
>> What is the best fix
>> for GHC itself?
>> Would it be ok for
>> GHC to include a C
>> compiler intrinsic
>> like
>> __sync_val_compare_and_swap?
>> Otherwise we need
>> another big ifdbef'd
>> function like "cas"
>> in SMP.h that has
>> the
>> architecture-specific inline
>> asm across all
>> architectures. I
>> can write the x86
>> one, but I'm not
>> eager to try the others.
>>
>> Best,
>> -Ryan
>>
>> [1]
>> https://github.com/iu-parfunc/lvars/issues/70
>> [2]
>> https://github.com/rrnewton/haskell-lockfree/issues/15
>>
>>
>>
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> <mailto:ghc-devs at haskell.org>
>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140206/90ef3815/attachment.sig>
More information about the ghc-devs
mailing list