[Git][ghc/ghc][wip/tsan-ghc-8.10] 81 commits: 8.10 - dirty MVAR after mutating TSO queue head

Ben Gamari gitlab at gitlab.haskell.org
Tue Dec 1 00:18:03 UTC 2020



Ben Gamari pushed to branch wip/tsan-ghc-8.10 at Glasgow Haskell Compiler / GHC


Commits:
d0a18f89 by Viktor Dukhovni at 2020-11-29T21:53:30-05:00
8.10 - dirty MVAR after mutating TSO queue head

While the original head and tail of the TSO queue may be in the same
generation as the MVAR, interior elements of the queue could be younger
after a GC run and may then be exposed by putMVar operation that updates
the queue head.

Resolves #18919

(cherry picked from commit 699facec0bc8dd7d5b82cc537fbf131b74f5bd2c)

- - - - -
92c9bed7 by Ömer Sinan Ağacan at 2020-11-29T21:53:30-05:00
Fix and enable object unloading in GHCi

Fixes #16525 by tracking dependencies between object file symbols and
marking symbol liveness during garbage collection

See Note [Object unloading] in CheckUnload.c for details.

(cherry picked from commit c34a4b98b1f09ea3096d39a839a86f2d7185c796)

- - - - -
2d72607e by Ben Gamari at 2020-11-29T21:53:30-05:00
gitlab-ci: Add VERBOSE environment variable

And change the make build system's default behavior to V=0, greatly
reducing build log sizes.

(cherry picked from commit 802e9180dd9a9a88c4e8869f0de1048e1edd6343)

- - - - -
b7a1e016 by Ben Gamari at 2020-11-30T19:16:58-05:00
SMP.h: Add C11-style atomic operations

- - - - -
023e414e by Ben Gamari at 2020-11-30T19:16:58-05:00
rts: Infrastructure for testing with ThreadSanitizer

- - - - -
631e1b83 by Ben Gamari at 2020-11-30T19:16:58-05:00
rts/CNF: Initialize all bdescrs in group

It seems wise and cheap to ensure that the whole bdescr of all blocks of
a compact group is valid, even if most cases only look at the flags
field.

- - - - -
111afe5a by Ben Gamari at 2020-11-30T19:16:58-05:00
rts/Capability: Intialize interrupt field

Previously this was left uninitialized.

Also clarify some comments.

- - - - -
109affa5 by Ben Gamari at 2020-11-30T19:16:58-05:00
rts/Task: Make comments proper Notes

- - - - -
b67647ba by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/SpinLock: Move to proper atomics

This is fairly straightforward; we just needed to use relaxed operations
for the PROF_SPIN counters and a release store instead of a write
barrier.

- - - - -
b69a1ebb by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/OSThreads: Fix data race

Previously we would race on the cached processor count. Avoiding this is
straightforward; just use relaxed operations.

- - - - -
a20a219e by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/ClosureMaros: Use relaxed atomics

- - - - -
73d31171 by Ben Gamari at 2020-11-30T19:16:59-05:00
testsuite: Fix thread leak in hs_try_putmvar00[13]

- - - - -
efec08d1 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Introduce SET_HDR_RELEASE

Also ensure that we also store the info table pointer last to ensure
that the synchronization covers all stores.

- - - - -
1a16a637 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Factor out logic to identify a good capability for running a task

Not only does this make the control flow a bit clearer but it also
allows us to add a TSAN suppression on this logic, which requires
(harmless) data races.

- - - - -
34e8fc56 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Annotate benign race in waitForCapability

- - - - -
7f59fb10 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Clarify locking behavior of releaseCapability_

- - - - -
2497d63e by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Add assertions for task ownership of capabilities

- - - - -
32b7a3f7 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Use relaxed atomics on n_returning_tasks

This mitigates the warning of a benign race on n_returning_tasks in
shouldYieldCapability.

See #17261.

- - - - -
18c60ee8 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Mitigate races in capability interruption logic

- - - - -
b87ffbe7 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/Capability: Use relaxed operations for last_free_capability

- - - - -
a5e0c4e7 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Use relaxed operations for cap->running_task (TODO)

This shouldn't be necessary since only the owning thread of the capability
should be touching this.

- - - - -
d462b091 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/Schedule: Use relaxed operations for sched_state

- - - - -
be4440e6 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Accept data race in work-stealing implementation

This race is okay since the task is owned by the capability pushing it.
By Note [Ownership of Task] this means that the capability is free to
write to `task->cap` without taking `task->lock`.

Fixes #17276.

- - - - -
7a4a2215 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Eliminate data races on pending_sync

- - - - -
6ca008dd by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/Schedule: Eliminate data races on recent_activity

We cannot safely use relaxed atomics here.

- - - - -
39634509 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Avoid data races in message handling

- - - - -
70f51d2b by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/Messages: Drop incredibly fishy write barrier

executeMessage previously had a write barrier at the beginning of its
loop apparently in an attempt to synchronize with another thread's
writes to the Message. I would guess that the author had intended to use
a load barrier here given that there are no globally-visible writes done
in executeMessage.

I've removed the redundant barrier since the necessary load barrier is
now provided by the ACQUIRE_LOAD.

- - - - -
03301a27 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/ThreadPaused: Avoid data races

- - - - -
f3a3abf8 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/Schedule: Eliminate data races in run queue management

- - - - -
3d7a9776 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts: Eliminate shutdown data race on task counters

- - - - -
14a7df42 by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/Threads: Avoid data races (TODO)

Replace barriers with appropriate ordering. Drop redundant barrier in
tryWakeupThread (the RELEASE barrier will be provided by sendMessage's
mutex release).

We use relaxed operations on why_blocked and the stack although it's not
clear to me why this is necessary.

- - - - -
064625cb by Ben Gamari at 2020-11-30T19:16:59-05:00
rts/Messages: Annotate benign race

- - - - -
4149887d by Ben Gamari at 2020-11-30T19:17:00-05:00
rts/RaiseAsync: Synchronize what_next read

- - - - -
4b187707 by Ben Gamari at 2020-11-30T19:17:00-05:00
rts/Task: Move debugTrace to avoid data race

Specifically, we need to hold all_tasks_mutex to read taskCount.

- - - - -
cf52442a by Ben Gamari at 2020-11-30T19:17:00-05:00
Disable flawed assertion

- - - - -
115cb724 by Ben Gamari at 2020-11-30T19:17:00-05:00
Document schedulePushWork race

- - - - -
9d53cff8 by Ben Gamari at 2020-11-30T19:17:00-05:00
Capabiliity: Properly fix data race on n_returning_tasks

There is a real data race but can be made safe by using proper atomic
(but relaxed) accesses.

- - - - -
c0947071 by Ben Gamari at 2020-11-30T19:17:00-05:00
rts: Make write of to_cap->inbox atomic

This is necessary since emptyInbox may read from to_cap->inbox without
taking cap->lock.

- - - - -
0b41ab72 by Ben Gamari at 2020-11-30T19:17:00-05:00
gitlab-ci: Add nightly-x86_64-linux-deb9-tsan job

- - - - -
9076f5c4 by GHC GitLab CI at 2020-11-30T19:17:00-05:00
testsuite: Mark setnumcapabilities001 as broken with TSAN

Due to #18808.

- - - - -
fe840ece by GHC GitLab CI at 2020-11-30T19:17:00-05:00
testsuite: Skip divbyzero and derefnull under TSAN

ThreadSanitizer changes the output of these tests.

- - - - -
5650808a by Ben Gamari at 2020-11-30T19:17:00-05:00
testsuite: Skip high memory usage tests with TSAN

ThreadSanitizer significantly increases the memory footprint of tests,
so much so that it can send machines into OOM.

- - - - -
a07163dd by Ben Gamari at 2020-11-30T19:17:00-05:00
testsuite: Mark hie002 as high_memory_usage

This test has a peak residency of 1GByte; this is large enough to
classify as "high" in my book.

- - - - -
04b2d5a7 by Ben Gamari at 2020-11-30T19:17:00-05:00
testsuite: Mark T9872[abc] as high_memory_usage

These all have a maximum residency of over 2 GB.

- - - - -
c8c0c2dc by Ben Gamari at 2020-11-30T19:17:00-05:00
gitlab-ci: Disable documentation in TSAN build

Haddock chews through enough memory to cause the CI builders to OOM and
there's frankly no reason to build documentation in this job anyways.

- - - - -
e420db52 by Ben Gamari at 2020-11-30T19:17:00-05:00
TSANUtils: Ensure that C11 atomics are supported

- - - - -
823408fd by Ben Gamari at 2020-11-30T19:17:00-05:00
testsuite: Mark T3807 as broken with TSAN

Due to #18883.

- - - - -
54e7da1c by Ben Gamari at 2020-11-30T19:17:00-05:00
testsuite: Mark T13702 as broken with TSAN due to #18884

- - - - -
1e9e797c by Ben Gamari at 2020-11-30T19:17:00-05:00
rts/BlockAlloc: Use relaxed operations

- - - - -
2333e636 by Ben Gamari at 2020-11-30T19:17:00-05:00
rts: Rework handling of mutlist scavenging statistics

- - - - -
2e7699fa by Ben Gamari at 2020-11-30T19:17:00-05:00
rts: Avoid data races in StablePtr implementation

This fixes two potentially problematic data races in the StablePtr
implementation:

 * We would fail to RELEASE the stable pointer table when enlarging it,
   causing other cores to potentially see uninitialized memory.

 * We would fail to ACQUIRE when dereferencing a stable pointer.

- - - - -
73802c70 by Ben Gamari at 2020-11-30T19:17:00-05:00
rts/Storage: Use atomics

- - - - -
971af95f by Ben Gamari at 2020-11-30T19:17:00-05:00
rts/Updates: Use proper atomic operations

- - - - -
57697340 by Ben Gamari at 2020-11-30T19:17:00-05:00
rts/Weak: Eliminate data races

By taking all_tasks_mutex in stat_exit. Also better-document the fact
that the task statistics are protected by all_tasks_mutex.

- - - - -
d7567dea by Ben Gamari at 2020-11-30T19:17:01-05:00
rts/GC: Use atomics

- - - - -
d0d61a61 by Ben Gamari at 2020-11-30T19:17:01-05:00
rts: Use RELEASE ordering in unlockClosure

- - - - -
43b55267 by Ben Gamari at 2020-11-30T19:17:01-05:00
rts/Storage: Accept races on heap size counters

- - - - -
e3d2418d by Ben Gamari at 2020-11-30T19:17:01-05:00
rts: Join to concurrent mark thread during shutdown

Previously we would take all capabilities but fail to join on the thread
itself, potentially resulting in a leaked thread.

- - - - -
0da89a35 by GHC GitLab CI at 2020-11-30T19:17:01-05:00
rts: Fix race in GC CPU time accounting

Ensure that the GC leader synchronizes with workers before calling
stat_endGC.

- - - - -
0a55e0db by Ben Gamari at 2020-11-30T19:17:01-05:00
rts/SpinLock: Separate out slow path

Not only is this in general a good idea, but it turns out that GCC
unrolls the retry loop, resulting is massive code bloat in critical
parts of the RTS (e.g. `evacuate`).

- - - - -
2aae02dd by Ben Gamari at 2020-11-30T19:17:01-05:00
rts: Use relaxed ordering on spinlock counters

- - - - -
c2527bcd by Ben Gamari at 2020-11-30T19:17:01-05:00
rts: Annotate hopefully "benign" races in freeGroup

- - - - -
d2649d6e by Ben Gamari at 2020-11-30T19:17:01-05:00
Strengthen ordering in releaseGCThreads

- - - - -
854884e1 by Ben Gamari at 2020-11-30T19:17:01-05:00
rts/WSDeque: Rewrite with proper atomics

After a few attempts at shoring up the previous implementation, I ended
up turning to the literature and now use the proven implementation,

> N.M. Lê, A. Pop, A.Cohen, and F.Z. Nardelli. "Correct and Efficient
> Work-Stealing for Weak Memory Models". PPoPP'13, February 2013,
> ACM 978-1-4503-1922/13/02.

Note only is this approach formally proven correct under C11 semantics
but it is also proved to be a bit faster in practice.

- - - - -
405c4e04 by Ben Gamari at 2020-11-30T19:17:01-05:00
rts: Use relaxed atomics for whitehole spin stats

- - - - -
d6f250fd by Ben Gamari at 2020-11-30T19:17:01-05:00
rts: Avoid lock order inversion during fork

Fixes #17275.

- - - - -
2fff334f by GHC GitLab CI at 2020-11-30T19:17:01-05:00
rts: Use proper relaxe operations in getCurrentThreadCPUTime

Here we are doing lazy initialization; it's okay if we do the check more
than once, hence relaxed operation is fine.

- - - - -
9e58ed8a by Ben Gamari at 2020-11-30T19:17:01-05:00
rts/STM: Use atomics

This fixes a potentially harmful race where we failed to synchronize
before looking at a TVar's current_value.

Also did a bit of refactoring to avoid abstract over management of
max_commits.

- - - - -
c0fe81b1 by Ben Gamari at 2020-11-30T19:17:01-05:00
rts/stm: Strengthen orderings to SEQ_CST instead of volatile

Previously the `current_value`, `first_watch_queue_entry`, and
`num_updates` fields of `StgTVar` were marked as `volatile` in an
attempt to provide strong ordering. Of course, this isn't sufficient.

We now use proper atomic operations. In most of these cases I strengthen
the ordering all the way to SEQ_CST although it's possible that some
could be weakened with some thought.

- - - - -
fb378955 by Ben Gamari at 2020-11-30T19:17:01-05:00
Mitigate data races in event manager startup/shutdown

- - - - -
7852d6b8 by Ben Gamari at 2020-11-30T19:17:01-05:00
Suppress data race due to close

This suppresses the other side of a race during shutdown.

- - - - -
377d742a by Ben Gamari at 2020-11-30T19:17:01-05:00
rts: Accept benign races in Proftimer

- - - - -
2d099438 by Ben Gamari at 2020-11-30T19:17:01-05:00
rts: Pause timer while changing capability count

This avoids #17289.

- - - - -
6e48304c by Ben Gamari at 2020-11-30T19:17:01-05:00
Fix #17289

- - - - -
759ea696 by Ben Gamari at 2020-11-30T19:17:01-05:00
suppress #17289 (ticker) race

- - - - -
44cb606b by Ben Gamari at 2020-11-30T19:17:02-05:00
rts: Fix timer initialization

Previously `initScheduler` would attempt to pause the ticker and in so
doing acquire the ticker mutex. However, initTicker, which is
responsible for initializing said mutex, hadn't been called
yet.

- - - - -
b780bf48 by Ben Gamari at 2020-11-30T19:17:02-05:00
rts: Fix races in Pthread timer backend shudown

We can generally be pretty relaxed in the barriers here since the timer
thread is a loop.

- - - - -
636bde86 by Ben Gamari at 2020-11-30T19:17:02-05:00
rts/Stats: Hide a few unused unnecessarily global functions

- - - - -
232dc488 by Ben Gamari at 2020-11-30T19:17:02-05:00
rts/Stats: Protect with mutex

While on face value this seems a bit heavy, I think it's far better than
enforcing ordering on every access.

- - - - -
40860e41 by Ben Gamari at 2020-11-30T19:17:47-05:00
rts: Tear down stats_mutex after exitHeapProfiling

Since the latter wants to call getRTSStats.

- - - - -
383bb46f by Ben Gamari at 2020-11-30T19:17:49-05:00
rts/Stats: Reintroduce mut_user_time

Fix the previous backport; this function was dead code in master but is
still needed due to ProfHeap.c in ghc-8.10.

- - - - -


30 changed files:

- .gitlab-ci.yml
- .gitlab/ci.sh
- compiler/ghci/Linker.hs
- hadrian/hadrian.cabal
- hadrian/src/Flavour.hs
- hadrian/src/Settings.hs
- + hadrian/src/Settings/Flavours/ThreadSanitizer.hs
- includes/Rts.h
- includes/rts/OSThreads.h
- includes/rts/SpinLock.h
- includes/rts/StablePtr.h
- + includes/rts/TSANUtils.h
- includes/rts/storage/ClosureMacros.h
- includes/rts/storage/Closures.h
- includes/rts/storage/GC.h
- includes/stg/SMP.h
- libraries/base/GHC/Event/Control.hs
- + rts/.tsan-suppressions
- rts/Capability.c
- rts/Capability.h
- rts/CheckUnload.c
- rts/CheckUnload.h
- rts/Hash.c
- rts/Hash.h
- rts/Linker.c
- rts/LinkerInternals.h
- rts/Messages.c
- rts/PrimOps.cmm
- rts/Proftimer.c
- rts/RaiseAsync.c


The diff was not included because it is too large.


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/f6f0343bd8dc1ea5d8085ed551bcb3400b72b694...383bb46f9de2e5943ac7d5bc5722af9d36ede017

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/f6f0343bd8dc1ea5d8085ed551bcb3400b72b694...383bb46f9de2e5943ac7d5bc5722af9d36ede017
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20201130/70ecf565/attachment-0001.html>


More information about the ghc-commits mailing list