setnumcapabilities001 failure

Simon Marlow marlowsd at gmail.com
Fri Oct 28 13:10:02 UTC 2016


I see, but the compiler has no business caching things across
requestSync(), which can in principle change anything: even if the compiler
could see all the code, it would find a pthread_cond_wait() in there.
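
To make the loop shape under discussion concrete, here is a minimal single-threaded sketch in C. It is not the RTS code: n_caps, try_sync() and sync_then_read() are hypothetical stand-ins for n_capabilities, requestSync() and the retry loop in scheduleDoGC, and the "concurrent" change is simulated inside the call. The point it illustrates is only that a read placed after the call has to observe whatever the call changed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a global such as n_capabilities. */
static unsigned int n_caps = 2;

/* Hypothetical stand-in for requestSync(): it fails while attempts remain,
   and each failed attempt simulates another party changing the global
   while we were waiting. */
static bool try_sync(int *failures_left) {
    if (*failures_left > 0) {
        (*failures_left)--;
        n_caps++;              /* simulated change made during the wait */
        return false;
    }
    return true;
}

/* Retry until the sync succeeds, then read the global afterwards.
   The read happens after the last call, so it sees every change made
   by the failed attempts. */
static unsigned int sync_then_read(int failures) {
    while (!try_sync(&failures)) {
        /* a real loop would yield or do other work here */
    }
    return n_caps;
}
```

With the initial value of 2, sync_then_read(3) returns 5, because each of the three failed attempts bumped the global before the final read.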

Anyway I've found the problem - it was caused by a subsequent GC
overwriting the values of gc_threads[].idle before the previous GC had
finished releaseGCThreads() which reads those values.  Diff on the way...
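
A deterministic sketch of that ordering problem, in C. This is neither the RTS code nor the fix in the diff: idle_flags, begin_gc(), count_released() and the snapshot approach are all assumptions made for illustration, and the two GCs are interleaved by hand rather than by real threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N_THREADS 4  /* hypothetical number of GC worker threads */

/* Hypothetical stand-in for the shared gc_threads[].idle flags. */
static bool idle_flags[N_THREADS];

/* A GC records which workers sit this collection out. */
static void begin_gc(const bool *idle_for_this_gc) {
    memcpy(idle_flags, idle_for_this_gc, sizeof idle_flags);
}

/* Count the workers a release step would wake when reading these flags:
   every worker that is not marked idle. */
static int count_released(const bool *flags) {
    assert(flags != NULL);
    int released = 0;
    for (int i = 0; i < N_THREADS; i++) {
        if (!flags[i]) {
            released++;
        }
    }
    return released;
}

/* Problem ordering: the second GC rewrites the shared flags before the
   first GC's release step has read them. */
static int release_after_overwrite(const bool *gc1, const bool *gc2) {
    begin_gc(gc1);
    begin_gc(gc2);                      /* subsequent GC starts too early */
    return count_released(idle_flags);  /* first GC's release sees gc2's flags */
}

/* One possible remedy (an assumption, not the actual fix): the release
   step works from a private copy taken before the next GC can start. */
static int release_from_snapshot(const bool *gc1, const bool *gc2) {
    bool snapshot[N_THREADS];
    begin_gc(gc1);
    memcpy(snapshot, idle_flags, sizeof snapshot);
    begin_gc(gc2);
    return count_released(snapshot);
}
```

If the first GC marks two of the four workers idle and the second marks none idle, release_after_overwrite() counts four workers to wake where the first GC only started two, while release_from_snapshot() counts two.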

Cheers
Simon

On 28 October 2016 at 11:58, Ryan Yates <fryguybob at gmail.com> wrote:

> Right, it is compiler effects at this boundary that I'm worried about,
> values that are not read from memory after the changes have been made, not
> memory effects or data races.
>
> On Fri, Oct 28, 2016 at 3:02 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>
>> Hi Ryan, I don't think that's the issue.  Those variables can only be
>> modified in setNumCapabilities, which acquires *all* the capabilities
>> before it makes any changes.  There should be no other threads running RTS
>> code(*) while we change the number of capabilities.  In particular we
>> shouldn't be in releaseGCThreads while enabled_capabilities is being
>> changed.
>>
>> (*) well except for the parts at the boundary with the external world
>> which run without a capability, such as rts_lock() which acquires a
>> capability.
>>
>> Cheers
>> Simon
>>
>> On 27 Oct 2016 17:10, "Ryan Yates" <fryguybob at gmail.com> wrote:
>>
>>> Briefly looking at the code it seems like several global variables
>>> involved should be volatile: n_capabilities, enabled_capabilities, and
>>> capabilities.  Perhaps in a loop like in scheduleDoGC the compiler moves
>>> the reads of n_capabilities or capabilities outside the loop.  A failed
>>> requestSync in that loop would not get updated values for those global
>>> pointers.  That particular loop isn't doing that optimization for me, but I
>>> think it could happen without volatile.
>>>
>>> Ryan
>>>
>>> On Thu, Oct 27, 2016 at 9:18 AM, Ben Gamari <ben at smart-cactus.org>
>>> wrote:
>>>
>>>> Simon Marlow <marlowsd at gmail.com> writes:
>>>>
>>>> > I haven't been able to reproduce the failure yet. :(
>>>> >
>>>> Indeed, I've also not seen it in my own local builds. It's quite a
>>>> fragile failure.
>>>>
>>>> Cheers,
>>>>
>>>> - Ben