[GHC] #15834: genSym is not thread safe with respect to setNumCapabilities
GHC
ghc-devs at haskell.org
Tue Oct 30 10:35:03 UTC 2018
#15834: genSym is not thread safe with respect to setNumCapabilities
----------------------------------------+---------------------------------
Reporter: NeilMitchell | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.6.1
Keywords: | Operating System: Linux
Architecture: Unknown/Multiple | Type of failure: None/Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
----------------------------------------+---------------------------------
In a large proprietary application using the GHC API, we observe really
weird errors (e.g. overlapping instances for {{{Eq Foo}}} and {{{Eq
Bar}}}, where {{{Foo}}} and {{{Bar}}} are completely unrelated, and come
from different modules). The pattern we follow is:
 * Running with the threaded RTS, 1 initial thread
 * Create a new unique supply with {{{mkSplitUniqSupply}}} and put it in an
   {{{MVar}}}.
 * Repeating many times:
   * Set the thread count higher (e.g. 8) using {{{setNumCapabilities}}}
   * On many threads in parallel:
     * Split the supply held in the {{{MVar}}} with {{{splitUniqSupply}}},
       keep one half as the new unique supply, and write the other half
       back to the {{{MVar}}}
     * Use that unique supply to interact with the GHC API
   * Set the thread count back to 1
Our observations of the errors are best explained by the unique names not
being nearly as unique as they might be expected to be. Reading the code
for {{{genSym}}}:
{{{#!c
if (n_capabilities == 1)
{
    GenSymCounter = (GenSymCounter + GenSymInc) & UNIQUE_MASK;
    checkUniqueRange(GenSymCounter);
    return GenSymCounter;
}
else
{
    HsInt n = atomic_inc((StgWord *)&GenSymCounter, GenSymInc) &
              UNIQUE_MASK;
    checkUniqueRange(n);
    return n;
}
}}}
It only does an {{{atomic_inc}}} if {{{n_capabilities != 1}}}, but it
doesn't read {{{n_capabilities}}} atomically, so is it suffering a race?
Our workaround was to set the thread count once, up front, before any
interactions with the GHC API, which seems to solve the problem. Alas, we
don't have a reproducible test case, and in fact were unable to reproduce
it anywhere but our Linux CI, and even then non-deterministically. The
problem does not currently impact us (the workaround is robust), but it
seemed worth sharing.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15834>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler