Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald carter.schonwald at gmail.com
Sat Feb 1 07:55:09 UTC 2014


Ryan, is your benchmark using CAS on pointers or on immediate words? I'm trying
to get atomic-primops to build with my 7.8 build on my Mac.


On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald <carter.schonwald at gmail.com> wrote:

> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket
>
> when i'm more awake i'll experiment some more
>
>
> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> i have a ticket for tracking this, though i'm thinking my initial attempt
>> at a patch generates the same object code as it did before.
>>
>> @ryan, what CPU variant are you testing this on? is this on a NUMA
>> machine or something?
>>
>>
>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> whoops, i mean cmpxchgq
>>>
>>>
>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> ok, i can confirm that on my 64bit mac, both clang and gcc use cmpxchgl
>>>> rather than cmpxchg
>>>> i'll whip up a strawman patch on head that can be cherrypicked / tested
>>>> out by ryan et al
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>>> carter.schonwald at gmail.com> wrote:
>>>>
>>>>> Hey Ryan,
>>>>> looking at this closely
>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that be
>>>>> the culprit?
>>>>>
>>>>> Could the issue be that we've not had a good stress test that would
>>>>> create values that are equal on the 32bit range, but differ on the 64bit
>>>>> range, and you're hitting that?
>>>>>
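To make the "equal on the 32bit range, but differ on the 64bit range" guess concrete, here is a minimal C sketch. It is only an illustration of the hypothesis, not GHC's code: the helper names (same_low32, same_full64, cas64) are made up for this example, and it assumes a GCC or clang toolchain that provides the __sync builtins on a 64-bit target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compares only the low 32 bits, as a 32-bit-wide
 * compare would. */
static bool same_low32(uint64_t a, uint64_t b)
{
    return (uint32_t)a == (uint32_t)b;
}

/* Hypothetical helper: compares all 64 bits. */
static bool same_full64(uint64_t a, uint64_t b)
{
    return a == b;
}

/* Full-width CAS through the compiler builtin; the operand width follows
 * the pointee type, so this compares and swaps all 8 bytes. */
static bool cas64(volatile uint64_t *p, uint64_t expected, uint64_t desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}
```

Two words such as 0x1DEADBEEF and 0x2DEADBEEF agree under same_low32 but not under same_full64, so a full-width CAS expecting the first must fail when the cell holds the second. A stress test that never produces such pairs could not tell a 32-bit compare from a 64-bit one.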
>>>>> Could you try seeing if doing that change fixes things up?
>>>>> (I may be completely wrong, but just throwing this out as a naive
>>>>> "obvious" guess)
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>>>>
>>>>>> Then again... I'm having trouble seeing how the spec on page 3-149 of
>>>>>> the Intel manual would allow the behavior I'm seeing:
>>>>>>
>>>>>>
>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>>
>>>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>>>> current Haskell primops.  Two threads simultaneously performing the same
>>>>>> CAS(p,a,b) can both think that they succeeded.
>>>>>>
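The property in question (when several threads race on the same CAS(p,a,b), at most one may succeed) can be checked with a small stress harness. The following is a sketch under assumptions, not the benchmark from this thread: it uses pthreads and the GCC/clang __sync builtins directly rather than the Haskell primops, and the names racer and count_cas_winners are invented for the example.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 64

static volatile uintptr_t shared_cell; /* the location every thread races on */
static volatile int winners;           /* threads whose CAS reported success */

static void *racer(void *arg)
{
    (void)arg;
    /* Every thread attempts the same CAS(p, 0, 1). */
    if (__sync_bool_compare_and_swap(&shared_cell, 0, 1))
        __sync_fetch_and_add(&winners, 1);
    return NULL;
}

/* Runs one round with up to nthreads racers and returns how many of them
 * believed their CAS succeeded. */
static int count_cas_winners(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int created = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    shared_cell = 0;
    winners = 0;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[created], NULL, racer, NULL) == 0)
            created++;
    }
    /* Only join the threads that were actually started. */
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    return winners;
}
```

With a correct CAS, count_cas_winners should report exactly one winner per round as long as at least one thread starts. A round reporting two or more would reproduce the symptom described above.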
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>>>>>
>>>>>>> I commented on the commit here:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>>
>>>>>>> The problem is that our "cas" routine in SMP.h is similar to the C
>>>>>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>>>>>>> value.  But it seems we cannot use a comparison against that old value to
>>>>>>> determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>>>>>> due to contention, but the old value may happen to look like our old value.)
>>>>>>>
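For reference, the two compiler intrinsics discussed here differ only in what they hand back to the caller. The sketch below puts them side by side; the wrapper names cas_val and cas_bool are hypothetical, and the code assumes GCC or clang. It says nothing about where the actual bug lies, only what each interface returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value-returning CAS: yields whatever was stored in *p before the
 * operation. The caller has to compare that result against `expected`
 * to decide whether the swap took place. */
static uintptr_t cas_val(volatile uintptr_t *p, uintptr_t expected,
                         uintptr_t desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Boolean CAS: yields true if the swap took place, so the caller does
 * no comparison of its own. */
static bool cas_bool(volatile uintptr_t *p, uintptr_t expected,
                     uintptr_t desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}
```

A caller of cas_val typically writes `cas_val(p, old, new) == old` to test for success, which is the comparison being questioned above, whereas cas_bool leaves that decision to the intrinsic.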
>>>>>>> Unfortunately, this didn't occur to me until it started causing bugs
>>>>>>> [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>>>> __sync_bool_compare_and_swap:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>>
>>>>>>> What is the best fix for GHC itself?   Would it be ok for GHC to
>>>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise
>>>>>>> we need another big ifdef'd function like "cas" in SMP.h that has the
>>>>>>> architecture-specific inline asm across all architectures.  I can write the
>>>>>>> x86 one, but I'm not eager to try the others.
>>>>>>>
>>>>>>> Best,
>>>>>>>    -Ryan
>>>>>>>
>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>