Bad news: apparent bug in casMutVar going back to 7.2
Carter Schonwald
carter.schonwald at gmail.com
Sat Feb 1 19:17:13 UTC 2014
I got the test suite running on my (2 core) machine mac book air, with 7.8
i've run it several times, not seeing any failures
On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald <carter.schonwald at gmail.com
> wrote:
> hrmmmm I have a crazy idea
>
> "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded into r/m64.
> Else, clear ZF
> and load r/m64 into RAX." is what the docs say for the cmpxchng
> instruction
>
> so RAX is the old values, (EAX in the 32bit case). And it looks like we
> dont' set that explicitly when calling the asm .. CMPXCHG r/m64, r64
>
> hrmmm
>
>
>
> On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>
>> Ok, here's another experiment, on this commit:
>>
>>
>> https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>>
>> Here, if I use GCC's __sync_val_compare_and_swap instead of GHC's version
>> of cas(), the problem also goes away. I think these two implementations
>> should behave identically, and that they don't perhaps indicates that there
>> is something off about the inline asm, as Carter was suggesting:
>>
>> *#if i386_HOST_ARCH || x86_64_HOST_ARCH*
>> * __asm__ __volatile__ (*
>> * "lock\ncmpxchg %3,%1"*
>> * :"=a"(o), "=m" (*(volatile unsigned int *)p) *
>> * :"0" (o), "r" (n));*
>> * return o;*
>>
>> The x86 CAS instruction must put the "return value" in the accumulator
>> register, and indeed this constrains "o" to be allocated to the accumulator
>> register, while the new value "n" can be in any register.
>>
>> So if there's a problem, I don't know what it is. Except I'm not sure
>> what the ramifications of "o" being a function parameter AND having an "=a"
>> constraint on it are...
>>
>> -Ryan
>>
>>
>>
>> On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> Hey Ryan, i've made the leap to using 7.8 on my machine, so i'll first
>>> have to get some pull requests in on atomic-primops before I can test it
>>> locally :), expect those patches later today!
>>>
>>> looks like gcc's inline ASM logic is pretty correct, after testing it a
>>> bit locally, pardon my speculative jumping the gun.
>>>
>>>
>>> On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>>
>>>> Hi Carter & others,
>>>>
>>>> Carter, yes, this is CAS on pointers and in my next mail I'll try to
>>>> come up with some hypotheses as to why we may have (remaining) problems
>>>> there.
>>>>
>>>> But first, I have been assured that on x86 there is no failure mode in
>>>> which doing a comparison on the value read by CAS should not correctly
>>>> diagnose success or failure (same as directly reading the Zero Flag) [1].
>>>>
>>>> And yet, there's this discrepancy, where the modified casMutVar that I
>>>> linked to does not have the failure. As for reproducing the failure,
>>>> either of the two following tests will currently show problems:
>>>>
>>>> - Two threads try to casIORef False->True, both succeed
>>>> - 120 threads try to read, increment, CAS until they succeed. The
>>>> total is often not 120 because multiple threads think the successfully
>>>> incremented, say, 33->34.
>>>>
>>>> Here's a specific recipe for the latter test on GHC 7.6.3 Mac or Linux:
>>>>
>>>>
>>>> *git clone git at github.com:rrnewton/haskell-lockfree-queue.git *
>>>> *cd haskell-lockfree-queue/AtomicPrimops/*
>>>>
>>>> *git checkout 1a1e7e55f6706f9e5754*
>>>>
>>>> *cabal sandbox init*
>>>>
>>>> *cabal install -f-withTH -fforeign ./ ./testing --enable-tests*
>>>>
>>>> *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
>>>> -t n_threads*
>>>>
>>>> You may have to run the last line several times to see the failure.
>>>>
>>>> Best,
>>>> -Ryan
>>>>
>>>> [1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF
>>>> is there just to avoid the extra comparison.
>>>>
>>>> [2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I
>>>> usually use to build it is currently not validating (below error, commit
>>>> 65d05d7334). After I debug this gmp problem I'll confirm that the bug
>>>> under discussion applies on the 7.8 branch.
>>>>
>>>> ./sync-all checkout ghc-7.8
>>>> sh validate
>>>> ...
>>>> /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation
>>>> R_X86_64_32 against `__gmpz_sub' can not be used when making a shared
>>>> object; recompile with -fPIC
>>>> libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad value
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald <
>>>> carter.schonwald at gmail.com> wrote:
>>>>
>>>>> Ryan, is your benchmark using CAS on pointers, or immediate words?
>>>>> trying to get atomic primops to build on my 7.8 build on my mac
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald <
>>>>> carter.schonwald at gmail.com> wrote:
>>>>>
>>>>>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket
>>>>>>
>>>>>> when i'm more awake i'll experiment some more
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <
>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>
>>>>>>> i have a ticket for tracking this, though i'm thinking my initial
>>>>>>> attempt at a patch generates the same object code as it did before.
>>>>>>>
>>>>>>> @ryan, what CPU variant are you testing this on? is this on a NUMA
>>>>>>> machine or something?
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>
>>>>>>>> woops, i mean cmpxchgq
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> ok, i can confirm that on my 64bit mac, both clang and gcc
>>>>>>>>> use cmpxchgl rather than cmpxchg
>>>>>>>>> i'll whip up a strawman patch on head that can be cherrypicked /
>>>>>>>>> tested out by ryan et al
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Ryan,
>>>>>>>>>> looking at this closely
>>>>>>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures? Could that
>>>>>>>>>> be the culprit?
>>>>>>>>>>
>>>>>>>>>> Could the issue be that we've not had a good stress test that
>>>>>>>>>> would create values that are equal on the 32bit range, but differ on the
>>>>>>>>>> 64bit range, and you're hitting that?
>>>>>>>>>>
>>>>>>>>>> Could you try seeing if doing that change fixes things up?
>>>>>>>>>> (I may be completely wrong, but just throwing this out as a naive
>>>>>>>>>> "obvious" guess)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>>>>>
>>>>>>>>>>> Then again... I'm having trouble seeing how the spec on page
>>>>>>>>>>> 3-149 of the Intel manual would allow the behavior I'm seeing:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>>>>>>>
>>>>>>>>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>>>>>>>>> current Haskell primops. Two threads simultaneously performing the same
>>>>>>>>>>> CAS(p,a,b) can both think that they succeeded.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I commented on the commit here:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>>>>>>>
>>>>>>>>>>>> The problem is that our "cas" routine in SMP.h is similar to
>>>>>>>>>>>> the C compiler intrinsic __sync_val_compare_and_swap, in that it returns
>>>>>>>>>>>> the old value. But it seems we cannot use a comparison against that old
>>>>>>>>>>>> value to determine whether or not the CAS succeeded. (I believe the CAS
>>>>>>>>>>>> may fail due to contention, but the old value may happen to look like our
>>>>>>>>>>>> old value.)
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, this didn't occur to me until it started causing
>>>>>>>>>>>> bugs [1] [2]. Fixing casMutVar# fixes these bugs. However, the way I'm
>>>>>>>>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>>>>>>>>> __sync_bool_compare_and_swap:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>>>>>>>
>>>>>>>>>>>> What is the best fix for GHC itself? Would it be ok for GHC
>>>>>>>>>>>> to include a C compiler intrinsic like __sync_val_compare_and_swap?
>>>>>>>>>>>> Otherwise we need another big ifdbef'd function like "cas" in SMP.h that
>>>>>>>>>>>> has the architecture-specific inline asm across all architectures. I can
>>>>>>>>>>>> write the x86 one, but I'm not eager to try the others.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> -Ryan
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> ghc-devs mailing list
>>>>>>>>>>> ghc-devs at haskell.org
>>>>>>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/f3a0af14/attachment.html>
More information about the ghc-devs
mailing list