[Git][ghc/ghc][wip/tsan/fix-races] 30 commits: rts: Introduce more principled fence operations

Ben Gamari (@bgamari) gitlab at gitlab.haskell.org
Mon Jul 24 18:30:21 UTC 2023



Ben Gamari pushed to branch wip/tsan/fix-races at Glasgow Haskell Compiler / GHC


Commits:
b431aa81 by Ben Gamari at 2023-07-24T14:23:58-04:00
rts: Introduce more principled fence operations

- - - - -
8a64b390 by Ben Gamari at 2023-07-24T14:23:59-04:00
rts: Introduce SET_INFO_RELAXED

- - - - -
e6f15532 by Ben Gamari at 2023-07-24T14:24:28-04:00
codeGen/tsan: Rework handling of spilling

- - - - -
30d33118 by Ben Gamari at 2023-07-24T14:24:28-04:00
hadrian: More debug information

- - - - -
83b508dc by Ben Gamari at 2023-07-24T14:24:28-04:00
Improve TSAN documentation

- - - - -
97e29767 by Ben Gamari at 2023-07-24T14:24:28-04:00
hadrian: More selective TSAN instrumentation

- - - - -
0c1b925a by Ben Gamari at 2023-07-24T14:24:58-04:00
rts: Fix data race in threadPaused

This only affects an assertion in the debug RTS, but it's a data race
nevertheless.

- - - - -
e9398995 by Ben Gamari at 2023-07-24T14:24:58-04:00
cmm: Introduce MO_RelaxedRead

In hand-written Cmm it can sometimes be necessary to atomically load
from memory deep within an expression (e.g. see the `CHECK_GC` macro).
This MachOp provides a convenient way to do so without breaking the
expression into multiple statements.
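
As a rough C11 analogue of the idea (not the Cmm syntax this commit
adds), a relaxed atomic load can be wrapped so that it is usable in the
middle of a larger expression. The names `heap_limit`, `RELAXED_READ`
and `needs_gc` are hypothetical and exist only for this sketch:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a limit word that another thread may update. */
static _Atomic size_t heap_limit = 4096;

/* Relaxed atomic load that can appear inside a larger expression. */
#define RELAXED_READ(p) atomic_load_explicit((p), memory_order_relaxed)

/* True when an allocation of `request` bytes at `heap_ptr` would pass the
 * limit. The limit is read atomically without splitting the expression. */
static bool needs_gc(size_t heap_ptr, size_t request)
{
    return heap_ptr + request > RELAXED_READ(&heap_limit);
}
```

The relaxed ordering only makes the load itself atomic; it imposes no
ordering on surrounding memory accesses.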

- - - - -
f1d15c88 by Ben Gamari at 2023-07-24T14:24:58-04:00
rts: Silence spurious data races in ticky counters

Previously we would use non-atomic accesses when bumping ticky counters,
which would result in spurious data race reports from ThreadSanitizer
when the threaded RTS was in use.
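
A minimal C11 sketch of bumping a counter with relaxed atomic accesses
follows. The commit message does not say whether the bump is a relaxed
load/store pair or an atomic read-modify-write; this sketch assumes the
former, and `example_ticky_ctr`, `bump_ticky` and `read_ticky` are
hypothetical names:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter, standing in for a real ticky counter. */
static _Atomic uint64_t example_ticky_ctr = 0;

/* Bump with a relaxed load followed by a relaxed store. This is not an
 * atomic read-modify-write, so concurrent bumps may be lost, but every
 * individual access is atomic and therefore not a data race. */
static void bump_ticky(_Atomic uint64_t *ctr, uint64_t n)
{
    uint64_t old = atomic_load_explicit(ctr, memory_order_relaxed);
    atomic_store_explicit(ctr, old + n, memory_order_relaxed);
}

/* Read the current counter value with a relaxed load. */
static uint64_t read_ticky(_Atomic uint64_t *ctr)
{
    return atomic_load_explicit(ctr, memory_order_relaxed);
}
```

For statistics counters an occasional lost increment is usually an
acceptable price for avoiding a locked instruction on every bump.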

- - - - -
565f2c96 by Ben Gamari at 2023-07-24T14:29:49-04:00
codeGen: Use relaxed accesses in ticky bumping

- - - - -
cfa412f9 by Ben Gamari at 2023-07-24T14:29:49-04:00
rts: Fix data race in Interpreter's preemption check

- - - - -
6b7d605f by Ben Gamari at 2023-07-24T14:29:49-04:00
rts: Fix data race in threadStatus#

- - - - -
a11ca8ee by Ben Gamari at 2023-07-24T14:29:49-04:00
rts: Fix data race in CHECK_GC

- - - - -
4fac43a6 by Ben Gamari at 2023-07-24T14:29:49-04:00
base: use atomic write when updating timer manager

- - - - -
5e22b039 by Ben Gamari at 2023-07-24T14:29:49-04:00
Use relaxed atomics to manipulate TSO status fields

- - - - -
c105d725 by Ben Gamari at 2023-07-24T14:29:49-04:00
rts: Add necessary barriers when manipulating TSO owner

- - - - -
1a3a0576 by Ben Gamari at 2023-07-24T14:29:49-04:00
rts: Fix synchronization on thread blocking state

- - - - -
1b5951a0 by Ben Gamari at 2023-07-24T14:29:49-04:00
rts: Relaxed load MutVar info table

- - - - -
19cbd620 by Ben Gamari at 2023-07-24T14:29:50-04:00
Wordsmith TSAN Note

- - - - -
3ae8b8fa by Ben Gamari at 2023-07-24T14:29:50-04:00
codeGen: Use relaxed-read in closureInfoPtr

- - - - -
2db2c8d9 by Ben Gamari at 2023-07-24T14:29:50-04:00
Fix thunk update ordering

Previously we attempted to ensure soundness of concurrent thunk update
by synchronizing on the access of the thunk's info table pointer field.
This was believed to be sufficient since the indirectee (which may
expose a closure allocated by another core) would not be examined
until the info table pointer update is complete.

However, it turns out that this can result in data races in the presence
of multiple threads racing to update a single thunk. For instance,
consider this interleaving under the old scheme:

            Thread A                             Thread B
            ---------                            ---------
    t=0     Enter t
      1     Push update frame
      2     Begin evaluation

      4     Pause thread
      5     t.indirectee=tso
      6     Release t.info=BLACKHOLE

      7     ... (e.g. GC)

      8     Resume thread
      9     Finish evaluation
      10    Relaxed t.indirectee=x

      11                                         Load t.info
      12                                         Acquire fence
      13                                         Inspect t.indirectee

      14    Release t.info=BLACKHOLE

Here Thread A enters thunk `t` but is soon paused, resulting in `t`
being lazily blackholed at t=6. Then, at t=10 Thread A finishes
evaluation and updates `t.indirectee` with a relaxed store.

Meanwhile, Thread B enters the blackhole at t=11. Under the old scheme
this would introduce an acquire-fence, but since the second release
store (t=14) has not yet happened, the fence can only synchronize with
Thread A's release store at t=6. Consequently, the result of the
evaluation, `x`, written at t=10, is not guaranteed to be visible to
Thread B, introducing a data race.

We fix this by treating the `indirectee` field as we do all other
mutable fields. This means we must always access this field with
acquire-loads and release-stores.

See #23185.
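
The fixed scheme can be illustrated with a small C11 model. This is a
sketch under stated assumptions, not the RTS code: the `Thunk` struct,
the integer `info` tags and all function names are hypothetical, and
the relaxed load of `info` in the reader is a choice made for the
sketch rather than something the commit message specifies.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical tags standing in for info table pointers. */
enum { INFO_THUNK = 0, INFO_BLACKHOLE = 1 };

/* Simplified model of a thunk; not the RTS closure layout. */
typedef struct {
    _Atomic int    info;
    _Atomic(void *) indirectee;
} Thunk;

static void init_thunk(Thunk *t)
{
    atomic_init(&t->info, INFO_THUNK);
    atomic_init(&t->indirectee, NULL);
}

/* Updater: publish the result with a release store to indirectee, so
 * that writes made while building `result` happen before any acquire
 * load that observes this store. */
static void update_thunk(Thunk *t, void *result)
{
    atomic_store_explicit(&t->indirectee, result, memory_order_release);
    atomic_store_explicit(&t->info, INFO_BLACKHOLE, memory_order_release);
}

/* Reader: returns NULL if the thunk is not blackholed. Otherwise it
 * acquire-loads indirectee, synchronizing with whichever release store
 * wrote the value it observes. */
static void *read_indirectee(Thunk *t)
{
    if (atomic_load_explicit(&t->info, memory_order_relaxed) != INFO_BLACKHOLE)
        return NULL;
    return atomic_load_explicit(&t->indirectee, memory_order_acquire);
}
```

Because synchronization now happens on `indirectee` itself, a reader
that observes `x` is guaranteed to also observe the writes that
constructed it, regardless of which store to `info` it saw.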

- - - - -
4be16b7b by Ben Gamari at 2023-07-24T14:29:50-04:00
STM: Use acquire loads when possible

Full sequential consistency is not needed here.

- - - - -
20982159 by Ben Gamari at 2023-07-24T14:29:50-04:00
rts/Interpreter: Fix data race

- - - - -
30ab9251 by Ben Gamari at 2023-07-24T14:29:50-04:00
rts/Messages: Fix data race

- - - - -
be9e5ae6 by Ben Gamari at 2023-07-24T14:29:50-04:00
rts/Prof: Fix data race

- - - - -
9698a4c6 by Ben Gamari at 2023-07-24T14:29:50-04:00
rts: Fix various data races

- - - - -
c14da691 by Ben Gamari at 2023-07-24T14:29:50-04:00
rts: Use fence rather than redundant load

- - - - -
c3b7e920 by Ben Gamari at 2023-07-24T14:29:50-04:00
Tighten up thunk update barriers

- - - - -
60876ace by Ben Gamari at 2023-07-24T14:29:50-04:00
rts/RaiseAsync: Drop redundant release fence

- - - - -
baafa60e by Ben Gamari at 2023-07-24T14:29:50-04:00
rts: Fixes profiling timer races

- - - - -


30 changed files:

- compiler/GHC/Cmm/Expr.hs
- compiler/GHC/Cmm/Info.hs
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/Cmm/ThreadSanitizer.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/PPC/CodeGen.hs
- compiler/GHC/CmmToAsm/Wasm/FromCmm.hs
- compiler/GHC/CmmToAsm/X86/CodeGen.hs
- compiler/GHC/CmmToC.hs
- compiler/GHC/CmmToLlvm/CodeGen.hs
- compiler/GHC/StgToCmm/Bind.hs
- compiler/GHC/StgToCmm/Ticky.hs
- compiler/GHC/StgToCmm/Utils.hs
- hadrian/src/Flavour.hs
- libraries/base/GHC/Event/Thread.hs
- rts/Apply.cmm
- rts/Compact.cmm
- rts/Exception.cmm
- rts/Heap.c
- rts/HeapStackCheck.cmm
- rts/Interpreter.c
- rts/Messages.c
- rts/PrimOps.cmm
- rts/Proftimer.c
- rts/RaiseAsync.c
- rts/STM.c
- rts/Schedule.c
- rts/StableName.c
- rts/StgMiscClosures.cmm


The diff was not included because it is too large.


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/690c0791fd85de6049dc845a10f2231de4a71661...baafa60e240b05a2a225ec0b730447db8837a1a4
