setnumcapabilities001 failure

Simon Marlow marlowsd at
Fri Oct 28 13:10:02 UTC 2016

I see, but the compiler has no business caching things across
requestSync(), which can in principle change anything: even if the compiler
could see all the code, it would find a pthread_cond_wait() in there.

Anyway, I've found the problem: it was caused by a subsequent GC
overwriting the values of gc_threads[].idle before the previous GC had
finished releaseGCThreads(), which reads those values.  Diff on the way...


On 28 October 2016 at 11:58, Ryan Yates <fryguybob at> wrote:

> Right, it is compiler effects at this boundary that I'm worried about,
> values that are not read from memory after the changes have been made, not
> memory effects or data races.
> On Fri, Oct 28, 2016 at 3:02 AM, Simon Marlow <marlowsd at> wrote:
>> Hi Ryan, I don't think that's the issue.  Those variables can only be
>> modified in setNumCapabilities, which acquires *all* the capabilities
>> before it makes any changes.  There should be no other threads running RTS
>> code (*) while we change the number of capabilities.  In particular, we
>> shouldn't be in releaseGCThreads while enabled_capabilities is being
>> changed.
>> (*) Well, except for the parts at the boundary with the external world,
>> which run without a capability, such as rts_lock(), which acquires a
>> capability.
>> Cheers
>> Simon
>> On 27 Oct 2016 17:10, "Ryan Yates" <fryguybob at> wrote:
>>> Briefly looking at the code, it seems like several of the global variables
>>> involved should be volatile: n_capabilities, enabled_capabilities, and
>>> capabilities.  Perhaps in a loop like the one in scheduleDoGC the compiler
>>> moves the reads of n_capabilities or capabilities outside the loop.  A
>>> failed requestSync in that loop would then not see updated values for those
>>> global pointers.  That particular loop isn't doing that optimization for
>>> me, but I think it could happen without volatile.
>>> Ryan
>>> On Thu, Oct 27, 2016 at 9:18 AM, Ben Gamari <ben at>
>>> wrote:
>>>> Simon Marlow <marlowsd at> writes:
>>>> > I haven't been able to reproduce the failure yet. :(
>>>> >
>>>> Indeed, I've also not seen it in my own local builds.  It's quite a
>>>> fragile failure.
>>>> Cheers,
>>>> - Ben

More information about the ghc-devs mailing list