[Git][ghc/ghc][wip/9.2.6-backports] 16 commits: nonmoving: Fix handling of weak pointers

Zubin (@wz1000) gitlab at gitlab.haskell.org
Wed Feb 8 13:36:06 UTC 2023



Zubin pushed to branch wip/9.2.6-backports at Glasgow Haskell Compiler / GHC


Commits:
d377b9b4 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Fix handling of weak pointers

This fixes an interaction between aging and weak pointer handling which
prevented the finalization of some weak pointers. In particular, weak
pointers could have their keys incorrectly marked by the preparatory
collector, preventing their finalization by the subsequent concurrent
collection.

While in the area, we also significantly improve the assertions
regarding weak pointers.

Fixes #22327.

(cherry picked from commit a999f7b5b4b7316c088d7233a452fb33dc17646f)

- - - - -
f463d9e4 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Fix tracking of FILLED_SWEEPING segments

Previously we only updated the state of the segment at the head of each
allocator's filled list.

(cherry picked from commit 48b89dba91640fa977d038ea5283019c73f1b18e)

- - - - -
c7f6cc07 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Handle new closures in nonmovingIsNowAlive

(cherry picked from commit 36ca160d0f199a688cf5fbc91d4bb92d2d4ea14e)

- - - - -
e2a81e2a by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Don't clobber update rem sets of old capabilities

Previously `storageAddCapabilities` (called by `setNumCapabilities`) would
clobber the update remembered sets of existing capabilities when
increasing the capability count. Fix this by only initializing the
update remembered sets of the newly-created capabilities.

(cherry picked from commit 8b64aff0fa978c762dfae8df235dd2b2a340656a)

- - - - -
8aceb849 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Add missing write barriers in selector optimisation

This fixes the selector optimisation, adding a few write barriers which
are necessary for soundness.

(cherry picked from commit dde67d6e32ecff0e400f98213d42ae790babac09)

- - - - -
12761253 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Don't push if nonmoving collector isn't enabled

(cherry picked from commit 8adc1750c02e596b4014d2837b4eb3d76bd130f2)

- - - - -
41c8db24 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Sync-phase mark budgeting

Here we significantly improve the bound on sync phase pause times by
imposing a limit on the amount of work that we can perform during the
sync. If we find that we have exceeded our marking budget then we allow
the mutators to resume, return to concurrent marking, and try
synchronizing again later.

(cherry picked from commit c620669651c52fd228af61040747dd7236c4ba2b)

- - - - -
cf1921ab by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Allow pinned gen0 objects to be WEAK keys

(cherry picked from commit 3887132fcd20f0a1edfc9f7006bdbed2634e2a8d)

- - - - -
639329fe by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Be more paranoid in segment tracking

Previously we left various segment link pointers dangling. None of this
wrong per se, but it did make it harder than necessary to debug.

(cherry picked from commit cc75031b4d93d4565e0428cb5910d9b9a645485b)

- - - - -
764145d7 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Add missing no-op in busy-wait loop

(cherry picked from commit 9f931a8801c84b8ae473f91349e144eebc73b415)

- - - - -
71adc788 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Move current segment array into Capability

(cherry picked from commit 9d245c1baec91ee79d715062b127e487456d9c9e)

- - - - -
85a080e9 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Don't call prepareUnloadCheck

When the nonmoving GC is in use we do not call `checkUnload` (since we
don't unload code) and therefore should not call `prepareUnloadCheck`,
lest we run into assertions.

(cherry picked from commit 6bdce35cdd59112a8cb4a4a3b061e854ada3ff63)

- - - - -
5721baa1 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Fix unregisterised build

(cherry picked from commit 6df2709e7215ea80d7267800e0318aee2a7c277f)

- - - - -
98802ef8 by Ben Gamari at 2023-02-08T19:05:11+05:30
nonmoving: Avoid n_caps race

(cherry picked from commit c00d6de815d4e125c1c4d8ff06549042f502f759)

- - - - -
d5291cff by Ben Gamari at 2023-02-08T19:05:11+05:30
relnotes: Mention various non-moving GC fixes

- - - - -
06a4a65f by Zubin Duggal at 2023-02-08T19:05:11+05:30
testsuite: Mark T9405 as fixed on windows

- - - - -


17 changed files:

- docs/users_guide/9.2.6-notes.rst
- rts/Capability.c
- rts/Capability.h
- rts/RtsStartup.c
- rts/Schedule.c
- rts/sm/Evac.c
- rts/sm/GC.c
- rts/sm/MarkWeak.c
- rts/sm/NonMoving.c
- rts/sm/NonMoving.h
- rts/sm/NonMovingCensus.c
- rts/sm/NonMovingCensus.h
- rts/sm/NonMovingMark.c
- rts/sm/NonMovingMark.h
- rts/sm/Sanity.c
- rts/sm/Storage.c
- testsuite/tests/rts/all.T


Changes:

=====================================
docs/users_guide/9.2.6-notes.rst
=====================================
@@ -66,6 +66,23 @@ Runtime system
 
 - Truncate eventlog events with a large payload (:ghc-ticket:`20221`).
 
+- A bug in the nonmoving garbage collector regarding the treatment of
+  zero-length ``SmallArray#``\ s has been fixed (:ghc-ticket:`22264`)
+
+- A number of bugs regarding the non-moving garbage collector's treatment of
+  ``Weak#`` pointers have been fixed (:ghc-ticket:`22327`)
+
+- A few race conditions between the non-moving collector and
+  ``setNumCapabilities`` which could result in undefined behavior have been
+  fixed (:ghc-ticket:`22926`, :ghc-ticket:`22927`)
+
+- The non-moving collector is now able to better schedule marking work during
+  the post-mark synchronization phase of collection, significantly reducing
+  pause times in some workloads (:ghc-ticket:`22929`).
+
+- Various bugs in the non-moving collector's implementation of the selector
+  optimisation have been fixed (:ghc-ticket:`22930`)
+
 Build system and packaging
 --------------------------
 


=====================================
rts/Capability.c
=====================================
@@ -294,6 +294,7 @@ initCapability (Capability *cap, uint32_t i)
     cap->saved_mut_lists = stgMallocBytes(sizeof(bdescr *) *
                                           RtsFlags.GcFlags.generations,
                                           "initCapability");
+    cap->current_segments = NULL;
 
 
     // At this point storage manager is not initialized yet, so this will be
@@ -1267,6 +1268,9 @@ freeCapability (Capability *cap)
 {
     stgFree(cap->mut_lists);
     stgFree(cap->saved_mut_lists);
+    if (cap->current_segments) {
+        stgFree(cap->current_segments);
+    }
 #if defined(THREADED_RTS)
     freeSparkPool(cap->sparks);
 #endif


=====================================
rts/Capability.h
=====================================
@@ -88,6 +88,9 @@ struct Capability_ {
 
     // The update remembered set for the non-moving collector
     UpdRemSet upd_rem_set;
+    // Array of current segments for the non-moving collector.
+    // Of length NONMOVING_ALLOCA_CNT.
+    struct NonmovingSegment **current_segments;
 
     // block for allocating pinned objects into
     bdescr *pinned_object_block;


=====================================
rts/RtsStartup.c
=====================================
@@ -472,6 +472,7 @@ hs_exit_(bool wait_foreign)
     for (g = 0; g < RtsFlags.GcFlags.generations; g++) {
         runAllCFinalizers(generations[g].weak_ptr_list);
     }
+    runAllCFinalizers(nonmoving_weak_ptr_list);
 
 #if defined(RTS_USER_SIGNALS)
     if (RtsFlags.MiscFlags.install_signal_handlers) {


=====================================
rts/Schedule.c
=====================================
@@ -1710,7 +1710,9 @@ scheduleDoGC (Capability **pcap, Task *task USED_IF_THREADS,
 
     stat_startGCSync(gc_threads[cap->no]);
 
+#if defined(DEBUG)
     unsigned int old_n_capabilities = getNumCapabilities();
+#endif
 
     interruptAllCapabilities();
 
@@ -2306,7 +2308,9 @@ setNumCapabilities (uint32_t new_n_capabilities USED_IF_THREADS)
             moreCapabilities(n_capabilities, new_n_capabilities);
 
             // Resize and update storage manager data structures
+            ACQUIRE_SM_LOCK;
             storageAddCapabilities(n_capabilities, new_n_capabilities);
+            RELEASE_SM_LOCK;
         }
     }
 


=====================================
rts/sm/Evac.c
=====================================
@@ -1254,6 +1254,10 @@ selector_chain:
         uint16_t flags = RELAXED_LOAD(&bd->flags);
         if (flags & (BF_EVACUATED | BF_NONMOVING)) {
             unchain_thunk_selectors(prev_thunk_selector, (StgClosure *)p);
+            if (flags & BF_NONMOVING) {
+                // See Note [Non-moving GC: Marking evacuated objects].
+                markQueuePushClosureGC(&gct->cap->upd_rem_set.queue, (StgClosure*) p);
+            }
             *q = (StgClosure *)p;
             // shortcut, behave as for:  if (evac) evacuate(q);
             if (evac && bd->gen_no < gct->evac_gen_no) {
@@ -1308,6 +1312,12 @@ selector_chain:
             //   - undo the chain we've built to point to p.
             SET_INFO((StgClosure *)p, (const StgInfoTable *)info_ptr);
             RELEASE_STORE(q, (StgClosure *) p);
+            if (Bdescr((StgPtr)p)->flags & BF_NONMOVING) {
+                // See Note [Non-moving GC: Marking evacuated objects].
+                // TODO: This really shouldn't be necessary since whoever won
+                // the race should have pushed
+                markQueuePushClosureGC(&gct->cap->upd_rem_set.queue, (StgClosure*) p);
+            }
             if (evac) evacuate(q);
             unchain_thunk_selectors(prev_thunk_selector, (StgClosure *)p);
             return;
@@ -1398,6 +1408,11 @@ selector_loop:
                   case THUNK_SELECTOR:
                       // Use payload to make a list of thunk selectors, to be
                       // used in unchain_thunk_selectors
+                      //
+                      // FIXME: This seems racy; should we lock this selector to
+                      // ensure that another thread doesn't clobber this node
+                      // of the chain. This would result in some previous
+                      // selectors not being updated when we unchain.
                       RELAXED_STORE(&((StgClosure*)p)->payload[0], (StgClosure *)prev_thunk_selector);
                       prev_thunk_selector = p;
                       p = (StgSelector*)val;
@@ -1422,6 +1437,12 @@ selector_loop:
               // eval_thunk_selector(), because we know val is not
               // a THUNK_SELECTOR.
               if (evac) evacuate(q);
+
+              if (isNonmovingClosure(*q)) {
+                  // See Note [Non-moving GC: Marking evacuated objects].
+                  markQueuePushClosureGC(&gct->cap->upd_rem_set.queue, (StgClosure*) *q);
+              }
+
               return;
           }
 
@@ -1466,6 +1487,10 @@ selector_loop:
           // recurse indefinitely, so we impose a depth bound.
           // See Note [Selector optimisation depth limit].
           if (gct->thunk_selector_depth >= MAX_THUNK_SELECTOR_DEPTH) {
+              if (isNonmovingClosure((StgClosure *) p)) {
+                  // See Note [Non-moving GC: Marking evacuated objects].
+                  markQueuePushClosureGC(&gct->cap->upd_rem_set.queue, (StgClosure*) p);
+              }
               goto bale_out;
           }
 
@@ -1512,5 +1537,9 @@ bale_out:
     if (evac) {
         copy(q,(const StgInfoTable *)info_ptr,(StgClosure *)p,THUNK_SELECTOR_sizeW(),bd->dest_no);
     }
+    if (isNonmovingClosure(*q)) {
+        // See Note [Non-moving GC: Marking evacuated objects].
+        markQueuePushClosureGC(&gct->cap->upd_rem_set.queue, *q);
+    }
     unchain_thunk_selectors(prev_thunk_selector, *q);
 }


=====================================
rts/sm/GC.c
=====================================
@@ -375,7 +375,8 @@ GarbageCollect (uint32_t collect_gen,
           static_flag == STATIC_FLAG_A ? STATIC_FLAG_B : STATIC_FLAG_A;
   }
 
-  if (major_gc) {
+  /* N.B. We currently don't unload code with the non-moving collector. */
+  if (major_gc && !RtsFlags.GcFlags.useNonmoving) {
       unload_mark_needed = prepareUnloadCheck();
   } else {
       unload_mark_needed = false;


=====================================
rts/sm/MarkWeak.c
=====================================
@@ -50,7 +50,7 @@
 
    - weak_stage == WeakPtrs
 
-     We process all the weak pointers whos keys are alive (evacuate
+     We process all the weak pointers whose keys are alive (evacuate
      their values and finalizers), and repeat until we can find no new
      live keys.  If no live keys are found in this pass, then we
      evacuate the finalizers of all the dead weak pointers in order to
@@ -82,12 +82,46 @@ static bool tidyWeakList (generation *gen);
 static bool resurrectUnreachableThreads (generation *gen, StgTSO **resurrected_threads);
 static void    tidyThreadList (generation *gen);
 
+/*
+ * Note [Weak pointer processing and the non-moving GC]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * When using the non-moving GC we defer weak pointer processing
+ * until the concurrent marking phase as weaks in the non-moving heap may be
+ * keyed on objects living in the non-moving generation. To accomplish this
+ * initWeakForGC keeps all weak pointers on oldest_gen->weak_ptr_list, where
+ * nonmovingCollect will find them. From there they will be moved to
+ * nonmoving_old_weak_ptr_list. During the mark loop we will move weaks with
+ * reachable keys to nonmoving_weak_ptr_list. At the end of concurrent marking
+ * we tidy the weak list (in nonmovingTidyWeakList) and perform another set of
+ * marking as necessary, just as is done in tidyWeakList.
+ *
+ * Note that this treatment takes advantage of the fact that we usually need
+ * not worry about Weak#s living in the non-moving heap but being keyed on an
+ * object in the moving heap since the Weak# must be strictly older than the
+ * key. Such objects would otherwise pose a problem since the non-moving
+ * collector would be unable to safely determine the liveness of the key.
+ * In the rare case that we *do* see such a key (e.g. in the case of a
+ * pinned ByteArray# living in a partially-filled accumulator block)
+ * the nonmoving collector assumes that it is live.
+ *
+ */
+
+/*
+ * Prepare the weak object lists for GC. Specifically, reset weak_stage
+ * and move all generations' `weak_ptr_list`s to `old_weak_ptr_list`.
+ * Weaks with live keys will later be moved back to `weak_ptr_list` by
+ * `tidyWeakList`.
+ */
 void
 initWeakForGC(void)
 {
-    uint32_t g;
+    uint32_t oldest = N;
+    if (RtsFlags.GcFlags.useNonmoving && N == oldest_gen->no) {
+        // See Note [Weak pointer processing and the non-moving GC].
+        oldest = oldest_gen->no - 1;
+    }
 
-    for (g = 0; g <= N; g++) {
+    for (uint32_t g = 0; g <= oldest; g++) {
         generation *gen = &generations[g];
         gen->old_weak_ptr_list = gen->weak_ptr_list;
         gen->weak_ptr_list = NULL;
@@ -96,6 +130,14 @@ initWeakForGC(void)
     weak_stage = WeakThreads;
 }
 
+/*
+ * Walk the weak pointer lists after having finished a round of scavenging,
+ * tidying the weak (and possibly thread) lists (depending upon the current
+ * weak_stage).
+ *
+ * Returns true if new live weak pointers were found, implying that another
+ * round of scavenging is necessary.
+ */
 bool
 traverseWeakPtrList(StgWeak **dead_weak_ptr_list, StgTSO **resurrected_threads)
 {
@@ -182,6 +224,11 @@ traverseWeakPtrList(StgWeak **dead_weak_ptr_list, StgTSO **resurrected_threads)
   }
 }
 
+/*
+ * Deal with weak pointers with unreachable keys after GC has concluded.
+ * This means marking the finalizer (and possibly value) in preparation for
+ * later finalization.
+ */
 static void collectDeadWeakPtrs (generation *gen, StgWeak **dead_weak_ptr_list)
 {
     StgWeak *w, *next_w;
@@ -198,6 +245,10 @@ static void collectDeadWeakPtrs (generation *gen, StgWeak **dead_weak_ptr_list)
     }
 }
 
+/*
+ * Deal with threads left on the old_threads list after GC has concluded,
+ * moving them onto the resurrected_threads list where appropriate.
+ */
 static bool resurrectUnreachableThreads (generation *gen, StgTSO **resurrected_threads)
 {
     StgTSO *t, *tmp, *next;
@@ -233,8 +284,21 @@ static bool resurrectUnreachableThreads (generation *gen, StgTSO **resurrected_t
     return flag;
 }
 
+/*
+ * Walk over the `old_weak_ptr_list` of the given generation and:
+ *
+ *  - remove any DEAD_WEAKs
+ *  - move any weaks with reachable keys to the `weak_ptr_list` of the
+ *    appropriate to-space and mark the weak's value and finalizer.
+ */
 static bool tidyWeakList(generation *gen)
 {
+    if (RtsFlags.GcFlags.useNonmoving && gen == oldest_gen) {
+        // See Note [Weak pointer processing and the non-moving GC].
+        ASSERT(gen->old_weak_ptr_list == NULL);
+        return false;
+    }
+
     StgWeak *w, **last_w, *next_w;
     const StgInfoTable *info;
     StgClosure *new;
@@ -322,6 +386,10 @@ static bool tidyWeakList(generation *gen)
     return flag;
 }
 
+/*
+ * Walk over the `old_threads` list of the given generation and move any
+ * reachable threads onto the `threads` list.
+ */
 static void tidyThreadList (generation *gen)
 {
     StgTSO *t, *tmp, *next, **prev;
@@ -381,6 +449,10 @@ static void checkWeakPtrSanity(StgWeak *hd, StgWeak *tl)
 }
 #endif
 
+/*
+ * Traverse the capabilities' local new-weak-pointer lists at the beginning of
+ * GC and move them to the nursery's weak_ptr_list.
+ */
 void collectFreshWeakPtrs()
 {
     uint32_t i;


=====================================
rts/sm/NonMoving.c
=====================================
@@ -244,6 +244,12 @@ Mutex concurrent_coll_finished_lock;
  *    how we use the DIRTY flags associated with MUT_VARs and TVARs to improve
  *    barrier efficiency.
  *
+ *  - Note [Weak pointer processing and the non-moving GC] (MarkWeak.c) describes
+ *    how weak pointers are handled when the non-moving GC is in use.
+ *
+ *  - Note [Sync phase marking budget] describes how we avoid long mutator
+ *    pauses during the sync phase
+ *
  * [ueno 2016]:
  *   Katsuhiro Ueno and Atsushi Ohori. 2016. A fully concurrent garbage
  *   collector for functional programs on multicore processors. SIGPLAN Not. 51,
@@ -292,6 +298,7 @@ Mutex concurrent_coll_finished_lock;
  *                    ┆
  *      B ←────────────── A ←─────────────── root
  *      │             ┆     ↖─────────────── gen1 mut_list
+ *      │             ┆
  *      ╰───────────────→ C
  *                    ┆
  *
@@ -332,6 +339,7 @@ Mutex concurrent_coll_finished_lock;
  * The implementation details of this are described in Note [Non-moving GC:
  * Marking evacuated objects] in Evac.c.
  *
+ *
  * Note [Deadlock detection under the non-moving collector]
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  * In GHC the garbage collector is responsible for identifying deadlocked
@@ -493,10 +501,44 @@ Mutex concurrent_coll_finished_lock;
  * remembered set during the preparatory GC. This allows us to safely skip the
  * non-moving write barrier without jeopardizing the snapshot invariant.
  *
+ *
+ * Note [Sync phase marking budget]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * The non-moving collector is intended to provide reliably low collection
+ * latencies. These latencies are primarily due to two sources:
+ *
+ *  a. the preparatory moving collection at the beginning of the major GC cycle
+ *  b. the post-mark synchronization pause at the end
+ *
+ * While the cost of (a) is inherently bounded by the young generation size,
+ * (b) can in principle be unbounded since the mutator may hide large swathes
+ * of heap from the collector's concurrent mark phase via mutation. These will
+ * only become visible to the collector during the post-mark synchronization
+ * phase.
+ *
+ * Since we don't want to do unbounded marking work in the pause, we impose a
+ * limit (specifically, sync_phase_marking_budget) on the amount of work
+ * (namely, the number of marked closures) that we can do during the pause. If
+ * we deplete our marking budget during the pause then we allow the mutators to
+ * resume and return to concurrent marking (keeping the update remembered set
+ * write barrier enabled). After we have finished marking we will again
+ * attempt the post-mark synchronization.
+ *
+ * The choice of sync_phase_marking_budget was made empirically. On 2022
+ * hardware and a "typical" test program we tend to mark ~10^7 closures per
+ * second. Consequently, a sync_phase_marking_budget of 10^5 should produce
+ * ~10 ms pauses, which seems like a reasonable tradeoff.
+ *
+ * TODO: Perhaps sync_phase_marking_budget should be controllable via a
+ * command-line argument?
+ *
  */
 
 memcount nonmoving_live_words = 0;
 
+// See Note [Sync phase marking budget].
+MarkBudget sync_phase_marking_budget = 200000;
+
 #if defined(THREADED_RTS)
 static void* nonmovingConcurrentMark(void *mark_queue);
 #endif
@@ -665,10 +707,11 @@ void *nonmovingAllocate(Capability *cap, StgWord sz)
     // object and not moved) which is covered by allocator 9.
     ASSERT(log_block_size < NONMOVING_ALLOCA0 + NONMOVING_ALLOCA_CNT);
 
-    struct NonmovingAllocator *alloca = nonmovingHeap.allocators[log_block_size - NONMOVING_ALLOCA0];
+    unsigned int alloca_idx = log_block_size - NONMOVING_ALLOCA0;
+    struct NonmovingAllocator *alloca = &nonmovingHeap.allocators[alloca_idx];
 
     // Allocate into current segment
-    struct NonmovingSegment *current = alloca->current[cap->no];
+    struct NonmovingSegment *current = cap->current_segments[alloca_idx];
     ASSERT(current); // current is never NULL
     void *ret = nonmovingSegmentGetBlock_(current, log_block_size, current->next_free);
     ASSERT(GET_CLOSURE_TAG(ret) == 0); // check alignment
@@ -701,29 +744,12 @@ void *nonmovingAllocate(Capability *cap, StgWord sz)
         // make it current
         new_current->link = NULL;
         SET_SEGMENT_STATE(new_current, CURRENT);
-        alloca->current[cap->no] = new_current;
+        cap->current_segments[alloca_idx] = new_current;
     }
 
     return ret;
 }
 
-/* Allocate a nonmovingAllocator */
-static struct NonmovingAllocator *alloc_nonmoving_allocator(uint32_t n_caps)
-{
-    size_t allocator_sz =
-        sizeof(struct NonmovingAllocator) +
-        sizeof(void*) * n_caps; // current segment pointer for each capability
-    struct NonmovingAllocator *alloc =
-        stgMallocBytes(allocator_sz, "nonmovingInit");
-    memset(alloc, 0, allocator_sz);
-    return alloc;
-}
-
-static void free_nonmoving_allocator(struct NonmovingAllocator *alloc)
-{
-    stgFree(alloc);
-}
-
 void nonmovingInit(void)
 {
     if (! RtsFlags.GcFlags.useNonmoving) return;
@@ -732,10 +758,7 @@ void nonmovingInit(void)
     initCondition(&concurrent_coll_finished);
     initMutex(&concurrent_coll_finished_lock);
 #endif
-    for (unsigned int i = 0; i < NONMOVING_ALLOCA_CNT; i++) {
-        nonmovingHeap.allocators[i] = alloc_nonmoving_allocator(getNumCapabilities());
-    }
-    nonmovingMarkInitUpdRemSet();
+    nonmovingMarkInit();
 }
 
 // Stop any nonmoving collection in preparation for RTS shutdown.
@@ -764,44 +787,24 @@ void nonmovingExit(void)
     closeCondition(&concurrent_coll_finished);
     closeMutex(&nonmoving_collection_mutex);
 #endif
-
-    for (unsigned int i = 0; i < NONMOVING_ALLOCA_CNT; i++) {
-        free_nonmoving_allocator(nonmovingHeap.allocators[i]);
-    }
 }
 
-/*
- * Assumes that no garbage collector or mutator threads are running to safely
- * resize the nonmoving_allocators.
- *
- * Must hold sm_mutex.
- */
-void nonmovingAddCapabilities(uint32_t new_n_caps)
+/* Initialize a new capability. Caller must hold SM_LOCK */
+void nonmovingInitCapability(Capability *cap)
 {
-    unsigned int old_n_caps = nonmovingHeap.n_caps;
-    struct NonmovingAllocator **allocs = nonmovingHeap.allocators;
-
+    // Initialize current segment array
+    struct NonmovingSegment **segs =
+        stgMallocBytes(sizeof(struct NonmovingSegment*) * NONMOVING_ALLOCA_CNT, "current segment array");
     for (unsigned int i = 0; i < NONMOVING_ALLOCA_CNT; i++) {
-        struct NonmovingAllocator *old = allocs[i];
-        allocs[i] = alloc_nonmoving_allocator(new_n_caps);
-
-        // Copy the old state
-        allocs[i]->filled = old->filled;
-        allocs[i]->active = old->active;
-        for (unsigned int j = 0; j < old_n_caps; j++) {
-            allocs[i]->current[j] = old->current[j];
-        }
-        stgFree(old);
-
-        // Initialize current segments for the new capabilities
-        for (unsigned int j = old_n_caps; j < new_n_caps; j++) {
-            allocs[i]->current[j] = nonmovingAllocSegment(capabilities[j]->node);
-            nonmovingInitSegment(allocs[i]->current[j], NONMOVING_ALLOCA0 + i);
-            SET_SEGMENT_STATE(allocs[i]->current[j], CURRENT);
-            allocs[i]->current[j]->link = NULL;
-        }
+        segs[i] = nonmovingAllocSegment(cap->node);
+        nonmovingInitSegment(segs[i], NONMOVING_ALLOCA0 + i);
+        SET_SEGMENT_STATE(segs[i], CURRENT);
     }
-    nonmovingHeap.n_caps = new_n_caps;
+    cap->current_segments = segs;
+
+    // Initialize update remembered set
+    cap->upd_rem_set.queue.blocks = NULL;
+    nonmovingInitUpdRemSet(&cap->upd_rem_set);
 }
 
 void nonmovingClearBitmap(struct NonmovingSegment *seg)
@@ -821,18 +824,21 @@ static void nonmovingPrepareMark(void)
     // Should have been cleared by the last sweep
     ASSERT(nonmovingHeap.sweep_list == NULL);
 
+    nonmovingHeap.n_caps = n_capabilities;
     nonmovingBumpEpoch();
     for (int alloca_idx = 0; alloca_idx < NONMOVING_ALLOCA_CNT; ++alloca_idx) {
-        struct NonmovingAllocator *alloca = nonmovingHeap.allocators[alloca_idx];
+        struct NonmovingAllocator *alloca = &nonmovingHeap.allocators[alloca_idx];
 
         // Update current segments' snapshot pointers
-        for (uint32_t cap_n = 0; cap_n < getNumCapabilities(); ++cap_n) {
-            struct NonmovingSegment *seg = alloca->current[cap_n];
+        for (uint32_t cap_n = 0; cap_n < nonmovingHeap.n_caps; ++cap_n) {
+            Capability *cap = capabilities[cap_n];
+            struct NonmovingSegment *seg = cap->current_segments[alloca_idx];
             nonmovingSegmentInfo(seg)->next_free_snap = seg->next_free;
         }
 
         // Save the filled segments for later processing during the concurrent
         // mark phase.
+        ASSERT(alloca->saved_filled == NULL);
         alloca->saved_filled = alloca->filled;
         alloca->filled = NULL;
 
@@ -886,44 +892,7 @@ static void nonmovingPrepareMark(void)
 #endif
 }
 
-// Mark weak pointers in the non-moving heap. They'll either end up in
-// dead_weak_ptr_list or stay in weak_ptr_list. Either way they need to be kept
-// during sweep. See `MarkWeak.c:markWeakPtrList` for the moving heap variant
-// of this.
-static void nonmovingMarkWeakPtrList(MarkQueue *mark_queue, StgWeak *dead_weak_ptr_list)
-{
-    for (StgWeak *w = oldest_gen->weak_ptr_list; w; w = w->link) {
-        markQueuePushClosureGC(mark_queue, (StgClosure*)w);
-        // Do not mark finalizers and values here, those fields will be marked
-        // in `nonmovingMarkDeadWeaks` (for dead weaks) or
-        // `nonmovingTidyWeaks` (for live weaks)
-    }
-
-    // We need to mark dead_weak_ptr_list too. This is subtle:
-    //
-    // - By the beginning of this GC we evacuated all weaks to the non-moving
-    //   heap (in `markWeakPtrList`)
-    //
-    // - During the scavenging of the moving heap we discovered that some of
-    //   those weaks are dead and moved them to `dead_weak_ptr_list`. Note that
-    //   because of the fact above _all weaks_ are in the non-moving heap at
-    //   this point.
-    //
-    // - So, to be able to traverse `dead_weak_ptr_list` and run finalizers we
-    //   need to mark it.
-    for (StgWeak *w = dead_weak_ptr_list; w; w = w->link) {
-        markQueuePushClosureGC(mark_queue, (StgClosure*)w);
-
-        // Mark the value and finalizer since they will be needed regardless of
-        // whether we find the weak is live.
-        if (w->cfinalizers != &stg_NO_FINALIZER_closure) {
-            markQueuePushClosureGC(mark_queue, w->value);
-        }
-        markQueuePushClosureGC(mark_queue, w->finalizer);
-    }
-}
-
-void nonmovingCollect(StgWeak **dead_weaks, StgTSO **resurrected_threads)
+void nonmovingCollect(StgWeak **dead_weaks STG_UNUSED, StgTSO **resurrected_threads)
 {
 #if defined(THREADED_RTS)
     // We can't start a new collection until the old one has finished
@@ -945,6 +914,7 @@ void nonmovingCollect(StgWeak **dead_weaks, StgTSO **resurrected_threads)
     ASSERT(n_nonmoving_marked_compact_blocks == 0);
 
     MarkQueue *mark_queue = stgMallocBytes(sizeof(MarkQueue), "mark queue");
+    mark_queue->blocks = NULL;
     initMarkQueue(mark_queue);
     current_mark_queue = mark_queue;
 
@@ -956,9 +926,16 @@ void nonmovingCollect(StgWeak **dead_weaks, StgTSO **resurrected_threads)
                 capabilities[n], true/*don't mark sparks*/);
     }
     markScheduler((evac_fn)markQueueAddRoot, mark_queue);
-    nonmovingMarkWeakPtrList(mark_queue, *dead_weaks);
     markStablePtrTable((evac_fn)markQueueAddRoot, mark_queue);
 
+    // The dead weak pointer list shouldn't contain any weaks in the
+    // nonmoving heap
+#if defined(DEBUG)
+    for (StgWeak *w = *dead_weaks; w; w = w->link) {
+        ASSERT(Bdescr((StgPtr) w)->gen != oldest_gen);
+    }
+#endif
+
     // Mark threads resurrected during moving heap scavenging
     for (StgTSO *tso = *resurrected_threads; tso != END_TSO_QUEUE; tso = tso->global_link) {
         markQueuePushClosureGC(mark_queue, (StgClosure*)tso);
@@ -984,8 +961,23 @@ void nonmovingCollect(StgWeak **dead_weaks, StgTSO **resurrected_threads)
     // alive).
     ASSERT(oldest_gen->old_weak_ptr_list == NULL);
     ASSERT(nonmoving_old_weak_ptr_list == NULL);
-    nonmoving_old_weak_ptr_list = oldest_gen->weak_ptr_list;
-    oldest_gen->weak_ptr_list = NULL;
+    {
+        // Move both oldest_gen->weak_ptr_list and nonmoving_weak_ptr_list to
+        // nonmoving_old_weak_ptr_list
+        StgWeak **weaks = &oldest_gen->weak_ptr_list;
+        uint32_t n = 0;
+        while (*weaks) {
+            weaks = &(*weaks)->link;
+            n++;
+        }
+        debugTrace(DEBUG_nonmoving_gc, "%d new nonmoving weaks", n);
+        *weaks = nonmoving_weak_ptr_list;
+        nonmoving_old_weak_ptr_list = oldest_gen->weak_ptr_list;
+        nonmoving_weak_ptr_list = NULL;
+        oldest_gen->weak_ptr_list = NULL;
+        // At this point all weaks in the nonmoving generation are on
+        // nonmoving_old_weak_ptr_list
+    }
     trace(TRACE_nonmoving_gc, "Finished nonmoving GC preparation");
 
     // We are now safe to start concurrent marking
@@ -1021,19 +1013,25 @@ void nonmovingCollect(StgWeak **dead_weaks, StgTSO **resurrected_threads)
 }
 
 /* Mark queue, threads, and weak pointers until no more weaks have been
- * resuscitated
+ * resuscitated. If *budget is non-zero then we will mark no more than
+ * Returns true if we there is no more marking work to be done, false if
+ * we exceeded our marking budget.
  */
-static void nonmovingMarkThreadsWeaks(MarkQueue *mark_queue)
+static bool nonmovingMarkThreadsWeaks(MarkBudget *budget, MarkQueue *mark_queue)
 {
     while (true) {
         // Propagate marks
-        nonmovingMark(mark_queue);
+        nonmovingMark(budget, mark_queue);
+        if (*budget == 0) {
+            return false;
+        }
 
         // Tidy threads and weaks
         nonmovingTidyThreads();
 
-        if (! nonmovingTidyWeaks(mark_queue))
-            return;
+        if (! nonmovingTidyWeaks(mark_queue)) {
+            return true;
+        }
     }
 }
 
@@ -1047,7 +1045,6 @@ static void* nonmovingConcurrentMark(void *data)
     return NULL;
 }
 
-// TODO: Not sure where to put this function.
 // Append w2 to the end of w1.
 static void appendWeakList( StgWeak **w1, StgWeak *w2 )
 {
@@ -1067,13 +1064,14 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
     // Walk the list of filled segments that we collected during preparation,
     // updated their snapshot pointers and move them to the sweep list.
     for (int alloca_idx = 0; alloca_idx < NONMOVING_ALLOCA_CNT; ++alloca_idx) {
-        struct NonmovingSegment *filled = nonmovingHeap.allocators[alloca_idx]->saved_filled;
+        struct NonmovingSegment *filled = nonmovingHeap.allocators[alloca_idx].saved_filled;
         uint32_t n_filled = 0;
         if (filled) {
             struct NonmovingSegment *seg = filled;
             while (true) {
                 // Set snapshot
                 nonmovingSegmentInfo(seg)->next_free_snap = seg->next_free;
+                SET_SEGMENT_STATE(seg, FILLED_SWEEPING);
                 n_filled++;
                 if (seg->link) {
                     seg = seg->link;
@@ -1082,14 +1080,24 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
                 }
             }
             // add filled segments to sweep_list
-            SET_SEGMENT_STATE(seg, FILLED_SWEEPING);
             seg->link = nonmovingHeap.sweep_list;
             nonmovingHeap.sweep_list = filled;
         }
+        nonmovingHeap.allocators[alloca_idx].saved_filled = NULL;
     }
 
+    // Mark Weak#s
+    nonmovingMarkWeakPtrList(mark_queue);
+
     // Do concurrent marking; most of the heap will get marked here.
-    nonmovingMarkThreadsWeaks(mark_queue);
+
+#if defined(THREADED_RTS)
+concurrent_marking:
+#endif
+    {
+        MarkBudget budget = UNLIMITED_MARK_BUDGET;
+        nonmovingMarkThreadsWeaks(&budget, mark_queue);
+    }
 
 #if defined(THREADED_RTS)
     Task *task = newBoundTask();
@@ -1098,21 +1106,13 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
     if (sched_state > SCHED_RUNNING) {
         // Note that we break our invariants here and leave segments in
         // nonmovingHeap.sweep_list, don't free nonmoving_large_objects etc.
-        // However because we won't be running mark-sweep in the final GC this
+        // However because we won't be running sweep in the final GC this
         // is OK.
-
-        // This is a RTS shutdown so we need to move our copy (snapshot) of
-        // weaks (nonmoving_old_weak_ptr_list and nonmoving_weak_ptr_list) to
-        // oldest_gen->threads to be able to run C finalizers in hs_exit_. Note
-        // that there may be more weaks added to oldest_gen->threads since we
-        // started mark, so we need to append our list to the tail of
-        // oldest_gen->threads.
-        appendWeakList(&nonmoving_old_weak_ptr_list, nonmoving_weak_ptr_list);
-        appendWeakList(&oldest_gen->weak_ptr_list, nonmoving_old_weak_ptr_list);
-        // These lists won't be used again so this is not necessary, but still
-        nonmoving_old_weak_ptr_list = NULL;
-        nonmoving_weak_ptr_list = NULL;
-
+        //
+        // However, we must move any weak pointers remaining on
+        // nonmoving_old_weak_ptr_list back to nonmoving_weak_ptr_list
+        // such that their C finalizers can be run by hs_exit_.
+        appendWeakList(&nonmoving_weak_ptr_list, nonmoving_old_weak_ptr_list);
         goto finish;
     }
 
@@ -1120,9 +1120,17 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
     nonmovingBeginFlush(task);
 
     bool all_caps_syncd;
+    MarkBudget sync_marking_budget = sync_phase_marking_budget;
     do {
         all_caps_syncd = nonmovingWaitForFlush();
-        nonmovingMarkThreadsWeaks(mark_queue);
+        if (nonmovingMarkThreadsWeaks(&sync_marking_budget, mark_queue) == false) {
+            // We ran out of budget for marking. Abort sync.
+            // See Note [Sync phase marking budget].
+            traceConcSyncEnd();
+            stat_endNonmovingGcSync();
+            releaseAllCapabilities(n_capabilities, NULL, task);
+            goto concurrent_marking;
+        }
     } while (!all_caps_syncd);
 #endif
 
@@ -1133,7 +1141,7 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
     // Do last marking of weak pointers
     while (true) {
         // Propagate marks
-        nonmovingMark(mark_queue);
+        nonmovingMarkUnlimitedBudget(mark_queue);
 
         if (!nonmovingTidyWeaks(mark_queue))
             break;
@@ -1142,7 +1150,7 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
     nonmovingMarkDeadWeaks(mark_queue, dead_weaks);
 
     // Propagate marks
-    nonmovingMark(mark_queue);
+    nonmovingMarkUnlimitedBudget(mark_queue);
 
     // Now remove all dead objects from the mut_list to ensure that a younger
     // generation collection doesn't attempt to look at them after we've swept.
@@ -1184,15 +1192,9 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
         nonmoving_old_threads = END_TSO_QUEUE;
     }
 
-    {
-        StgWeak **weaks = &oldest_gen->weak_ptr_list;
-        while (*weaks) {
-            weaks = &(*weaks)->link;
-        }
-        *weaks = nonmoving_weak_ptr_list;
-        nonmoving_weak_ptr_list = NULL;
-        nonmoving_old_weak_ptr_list = NULL;
-    }
+    // At this point point any weak that remains on nonmoving_old_weak_ptr_list
+    // has a dead key.
+    nonmoving_old_weak_ptr_list = NULL;
 
     // Prune spark lists
     // See Note [Spark management under the nonmoving collector].
@@ -1290,10 +1292,12 @@ void assert_in_nonmoving_heap(StgPtr p)
     }
 
     for (int alloca_idx = 0; alloca_idx < NONMOVING_ALLOCA_CNT; ++alloca_idx) {
-        struct NonmovingAllocator *alloca = nonmovingHeap.allocators[alloca_idx];
+        struct NonmovingAllocator *alloca = &nonmovingHeap.allocators[alloca_idx];
+
         // Search current segments
-        for (uint32_t cap_idx = 0; cap_idx < getNumCapabilities(); ++cap_idx) {
-            struct NonmovingSegment *seg = alloca->current[cap_idx];
+        for (uint32_t cap_idx = 0; cap_idx < nonmovingHeap.n_caps; ++cap_idx) {
+            Capability *cap = capabilities[cap_idx];
+            struct NonmovingSegment *seg = cap->current_segments[alloca_idx];
             if (p >= (P_)seg && p < (((P_)seg) + NONMOVING_SEGMENT_SIZE_W)) {
                 return;
             }
@@ -1352,33 +1356,16 @@ void nonmovingPrintSegment(struct NonmovingSegment *seg)
     debugBelch("End of segment\n\n");
 }
 
-void nonmovingPrintAllocator(struct NonmovingAllocator *alloc)
-{
-    debugBelch("Allocator at %p\n", (void*)alloc);
-    debugBelch("Filled segments:\n");
-    for (struct NonmovingSegment *seg = alloc->filled; seg != NULL; seg = seg->link) {
-        debugBelch("%p ", (void*)seg);
-    }
-    debugBelch("\nActive segments:\n");
-    for (struct NonmovingSegment *seg = alloc->active; seg != NULL; seg = seg->link) {
-        debugBelch("%p ", (void*)seg);
-    }
-    debugBelch("\nCurrent segments:\n");
-    for (uint32_t i = 0; i < getNumCapabilities(); ++i) {
-        debugBelch("%p ", alloc->current[i]);
-    }
-    debugBelch("\n");
-}
-
 void locate_object(P_ obj)
 {
     // Search allocators
     for (int alloca_idx = 0; alloca_idx < NONMOVING_ALLOCA_CNT; ++alloca_idx) {
-        struct NonmovingAllocator *alloca = nonmovingHeap.allocators[alloca_idx];
-        for (uint32_t cap = 0; cap < getNumCapabilities(); ++cap) {
-            struct NonmovingSegment *seg = alloca->current[cap];
+        struct NonmovingAllocator *alloca = &nonmovingHeap.allocators[alloca_idx];
+        for (uint32_t cap_n = 0; cap_n < nonmovingHeap.n_caps; ++cap_n) {
+            Capability *cap = capabilities[cap_n];
+            struct NonmovingSegment *seg = cap->current_segments[alloca_idx];
             if (obj >= (P_)seg && obj < (((P_)seg) + NONMOVING_SEGMENT_SIZE_W)) {
-                debugBelch("%p is in current segment of capability %d of allocator %d at %p\n", obj, cap, alloca_idx, (void*)seg);
+                debugBelch("%p is in current segment of capability %d of allocator %d at %p\n", obj, cap_n, alloca_idx, (void*)seg);
                 return;
             }
         }


=====================================
rts/sm/NonMoving.h
=====================================
@@ -84,8 +84,7 @@ struct NonmovingAllocator {
     struct NonmovingSegment *filled;
     struct NonmovingSegment *saved_filled;
     struct NonmovingSegment *active;
-    // indexed by capability number
-    struct NonmovingSegment *current[];
+    // N.B. Per-capabilty "current" segment lives in Capability
 };
 
 // first allocator is of size 2^NONMOVING_ALLOCA0 (in bytes)
@@ -99,7 +98,7 @@ struct NonmovingAllocator {
 #define NONMOVING_MAX_FREE 16
 
 struct NonmovingHeap {
-    struct NonmovingAllocator *allocators[NONMOVING_ALLOCA_CNT];
+    struct NonmovingAllocator allocators[NONMOVING_ALLOCA_CNT];
     // free segment list. This is a cache where we keep up to
     // NONMOVING_MAX_FREE segments to avoid thrashing the block allocator.
     // Note that segments in this list are still counted towards
@@ -149,7 +148,7 @@ void nonmovingCollect(StgWeak **dead_weaks,
                        StgTSO **resurrected_threads);
 
 void *nonmovingAllocate(Capability *cap, StgWord sz);
-void nonmovingAddCapabilities(uint32_t new_n_caps);
+void nonmovingInitCapability(Capability *cap);
 void nonmovingPushFreeSegment(struct NonmovingSegment *seg);
 void nonmovingClearBitmap(struct NonmovingSegment *seg);
 
@@ -166,7 +165,7 @@ INLINE_HEADER uint8_t nonmovingSegmentLogBlockSize(struct NonmovingSegment *seg)
 INLINE_HEADER void nonmovingPushActiveSegment(struct NonmovingSegment *seg)
 {
     struct NonmovingAllocator *alloc =
-        nonmovingHeap.allocators[nonmovingSegmentLogBlockSize(seg) - NONMOVING_ALLOCA0];
+        &nonmovingHeap.allocators[nonmovingSegmentLogBlockSize(seg) - NONMOVING_ALLOCA0];
     SET_SEGMENT_STATE(seg, ACTIVE);
     while (true) {
         struct NonmovingSegment *current_active = (struct NonmovingSegment*)VOLATILE_LOAD(&alloc->active);
@@ -181,7 +180,7 @@ INLINE_HEADER void nonmovingPushActiveSegment(struct NonmovingSegment *seg)
 INLINE_HEADER void nonmovingPushFilledSegment(struct NonmovingSegment *seg)
 {
     struct NonmovingAllocator *alloc =
-        nonmovingHeap.allocators[nonmovingSegmentLogBlockSize(seg) - NONMOVING_ALLOCA0];
+        &nonmovingHeap.allocators[nonmovingSegmentLogBlockSize(seg) - NONMOVING_ALLOCA0];
     SET_SEGMENT_STATE(seg, FILLED);
     while (true) {
         struct NonmovingSegment *current_filled = (struct NonmovingSegment*)VOLATILE_LOAD(&alloc->filled);
@@ -289,20 +288,17 @@ INLINE_HEADER void nonmovingSetClosureMark(StgPtr p)
     nonmovingSetMark(nonmovingGetSegment(p), nonmovingGetBlockIdx(p));
 }
 
-// TODO: Audit the uses of these
-/* Was the given closure marked this major GC cycle? */
-INLINE_HEADER bool nonmovingClosureMarkedThisCycle(StgPtr p)
+INLINE_HEADER uint8_t nonmovingGetClosureMark(StgPtr p)
 {
     struct NonmovingSegment *seg = nonmovingGetSegment(p);
     nonmoving_block_idx blk_idx = nonmovingGetBlockIdx(p);
-    return nonmovingGetMark(seg, blk_idx) == nonmovingMarkEpoch;
+    return nonmovingGetMark(seg, blk_idx);
 }
 
-INLINE_HEADER bool nonmovingClosureMarked(StgPtr p)
+/* Was the given closure marked this major GC cycle? */
+INLINE_HEADER bool nonmovingClosureMarkedThisCycle(StgPtr p)
 {
-    struct NonmovingSegment *seg = nonmovingGetSegment(p);
-    nonmoving_block_idx blk_idx = nonmovingGetBlockIdx(p);
-    return nonmovingGetMark(seg, blk_idx) != 0;
+    return nonmovingGetClosureMark(p) == nonmovingMarkEpoch;
 }
 
 // Can be called during a major collection to determine whether a particular
@@ -336,10 +332,14 @@ INLINE_HEADER bool nonmovingClosureBeingSwept(StgClosure *p)
     }
 }
 
+// N.B. RtsFlags is defined as a pointer in STG code consequently this code
+// doesn't typecheck.
+#if !IN_STG_CODE
 INLINE_HEADER bool isNonmovingClosure(StgClosure *p)
 {
-    return !HEAP_ALLOCED_GC(p) || Bdescr((P_)p)->flags & BF_NONMOVING;
+    return RtsFlags.GcFlags.useNonmoving && (!HEAP_ALLOCED_GC(p) || Bdescr((P_)p)->flags & BF_NONMOVING);
 }
+#endif
 
 #if defined(DEBUG)
 


=====================================
rts/sm/NonMovingCensus.c
=====================================
@@ -21,10 +21,12 @@
 // stopped. In this case is safe to look at active and current segments so we can
 // also collect statistics on live words.
 static struct NonmovingAllocCensus
-nonmovingAllocatorCensus_(struct NonmovingAllocator *alloc, bool collect_live_words)
+nonmovingAllocatorCensus_(uint32_t alloc_idx, bool collect_live_words)
 {
     struct NonmovingAllocCensus census = {collect_live_words, 0, 0, 0, 0};
+    struct NonmovingAllocator *alloc = &nonmovingHeap.allocators[alloc_idx];
 
+    // filled segments
     for (struct NonmovingSegment *seg = alloc->filled;
          seg != NULL;
          seg = seg->link)
@@ -40,6 +42,7 @@ nonmovingAllocatorCensus_(struct NonmovingAllocator *alloc, bool collect_live_wo
         }
     }
 
+    // active segments
     for (struct NonmovingSegment *seg = alloc->active;
          seg != NULL;
          seg = seg->link)
@@ -56,9 +59,11 @@ nonmovingAllocatorCensus_(struct NonmovingAllocator *alloc, bool collect_live_wo
         }
     }
 
-    for (unsigned int cap=0; cap < getNumCapabilities(); cap++)
+    // current segments
+    for (unsigned int cap_n=0; cap_n < getNumCapabilities(); cap_n++)
     {
-        struct NonmovingSegment *seg = alloc->current[cap];
+        Capability *cap = capabilities[cap_n];
+        struct NonmovingSegment *seg = cap->current_segments[alloc_idx];
         unsigned int n = nonmovingSegmentBlockCount(seg);
         for (unsigned int i=0; i < n; i++) {
             if (nonmovingGetMark(seg, i)) {
@@ -76,15 +81,15 @@ nonmovingAllocatorCensus_(struct NonmovingAllocator *alloc, bool collect_live_wo
  * all blocks in nonmoving heap are valid closures.
  */
 struct NonmovingAllocCensus
-nonmovingAllocatorCensusWithWords(struct NonmovingAllocator *alloc)
+nonmovingAllocatorCensusWithWords(uint32_t alloc_idx)
 {
-    return nonmovingAllocatorCensus_(alloc, true);
+    return nonmovingAllocatorCensus_(alloc_idx, true);
 }
 
 struct NonmovingAllocCensus
-nonmovingAllocatorCensus(struct NonmovingAllocator *alloc)
+nonmovingAllocatorCensus(uint32_t alloc_idx)
 {
-    return nonmovingAllocatorCensus_(alloc, false);
+    return nonmovingAllocatorCensus_(alloc_idx, false);
 }
 
 
@@ -130,7 +135,7 @@ void nonmovingPrintAllocatorCensus(bool collect_live_words)
 
     for (int i=0; i < NONMOVING_ALLOCA_CNT; i++) {
         struct NonmovingAllocCensus census =
-            nonmovingAllocatorCensus_(nonmovingHeap.allocators[i], collect_live_words);
+            nonmovingAllocatorCensus_(i, collect_live_words);
 
         print_alloc_census(i, census);
     }
@@ -143,8 +148,7 @@ void nonmovingTraceAllocatorCensus()
         return;
 
     for (int i=0; i < NONMOVING_ALLOCA_CNT; i++) {
-        const struct NonmovingAllocCensus census =
-            nonmovingAllocatorCensus(nonmovingHeap.allocators[i]);
+        const struct NonmovingAllocCensus census = nonmovingAllocatorCensus(i);
         const uint32_t log_blk_size = i + NONMOVING_ALLOCA0;
         traceNonmovingHeapCensus(log_blk_size, &census);
     }


=====================================
rts/sm/NonMovingCensus.h
=====================================
@@ -20,10 +20,10 @@ struct NonmovingAllocCensus {
 
 
 struct NonmovingAllocCensus
-nonmovingAllocatorCensusWithWords(struct NonmovingAllocator *alloc);
+nonmovingAllocatorCensusWithWords(uint32_t alloc_idx);
 
 struct NonmovingAllocCensus
-nonmovingAllocatorCensus(struct NonmovingAllocator *alloc);
+nonmovingAllocatorCensus(uint32_t alloc_idx);
 
 void nonmovingPrintAllocatorCensus(bool collect_live_words);
 void nonmovingTraceAllocatorCensus(void);


=====================================
rts/sm/NonMovingMark.c
=====================================
@@ -39,6 +39,9 @@ static void trace_PAP_payload (MarkQueue *queue,
                                StgClosure *fun,
                                StgClosure **payload,
                                StgWord size);
+#if defined(DEBUG)
+static bool is_nonmoving_weak(StgWeak *weak);
+#endif
 
 // How many Array# entries to add to the mark queue at once?
 #define MARK_ARRAY_CHUNK_LENGTH 128
@@ -254,7 +257,7 @@ StgWord nonmoving_write_barrier_enabled = false;
 MarkQueue *current_mark_queue = NULL;
 
 /* Initialise update remembered set data structures */
-void nonmovingMarkInitUpdRemSet() {
+void nonmovingMarkInit() {
 #if defined(THREADED_RTS)
     initMutex(&upd_rem_set_lock);
     initCondition(&upd_rem_set_flushed_cond);
@@ -274,6 +277,7 @@ static void nonmovingAddUpdRemSetBlocks_(MarkQueue *rset)
     bdescr *end = start;
     while (end->link != NULL)
         end = end->link;
+    rset->blocks = NULL;
 
     // add the blocks to the global remembered set
     ACQUIRE_LOCK(&upd_rem_set_lock);
@@ -297,8 +301,8 @@ static void nonmovingAddUpdRemSetBlocks_lock(MarkQueue *rset)
     // Reset the state of the remembered set.
     ACQUIRE_SM_LOCK;
     init_mark_queue_(rset);
-    rset->is_upd_rem_set = true;
     RELEASE_SM_LOCK;
+    rset->is_upd_rem_set = true;
 }
 
 /*
@@ -651,6 +655,16 @@ void updateRemembSetPushThunkEager(Capability *cap,
         }
         break;
     }
+    case THUNK_SELECTOR:
+    {
+        StgSelector *sel = (StgSelector *) thunk;
+        if (check_in_nonmoving_heap(sel->selectee)) {
+            // Don't bother to push origin; it makes the barrier needlessly
+            // expensive with little benefit.
+            push_closure(queue, sel->selectee, NULL);
+        }
+        break;
+    }
     case AP:
     {
         StgAP *ap = (StgAP *) thunk;
@@ -660,9 +674,11 @@ void updateRemembSetPushThunkEager(Capability *cap,
         trace_PAP_payload(queue, ap->fun, ap->payload, ap->n_args);
         break;
     }
-    case THUNK_SELECTOR:
+    // We may end up here if a thunk update races with another update.
+    // In this case there is nothing to do as the other thread will have
+    // already pushed the updated thunk's free variables to the update
+    // remembered set.
     case BLACKHOLE:
-        // TODO: This is right, right?
         break;
     // The selector optimization performed by the nonmoving mark may have
     // overwritten a thunk which we are updating with an indirection.
@@ -909,6 +925,7 @@ static MarkQueueEnt markQueuePop (MarkQueue *q)
 static void init_mark_queue_ (MarkQueue *queue)
 {
     bdescr *bd = allocGroup(MARK_QUEUE_BLOCKS);
+    ASSERT(queue->blocks == NULL);
     queue->blocks = bd;
     queue->top = (MarkQueueBlock *) bd->start;
     queue->top->head = 0;
@@ -1293,8 +1310,11 @@ mark_closure (MarkQueue *queue, const StgClosure *p0, StgClosure **origin)
             goto done;
 
         case WHITEHOLE:
-            while (*(StgInfoTable* volatile*) &p->header.info == &stg_WHITEHOLE_info);
-                // busy_wait_nop(); // FIXME
+            while (*(StgInfoTable* volatile*) &p->header.info == &stg_WHITEHOLE_info)
+#if defined(PARALLEL_GC)
+                busy_wait_nop()
+#endif
+                ;
             goto try_again;
 
         default:
@@ -1502,10 +1522,12 @@ mark_closure (MarkQueue *queue, const StgClosure *p0, StgClosure **origin)
         break;
     }
 
+    case WEAK:
+        ASSERT(is_nonmoving_weak((StgWeak*) p));
+        // fallthrough
     gen_obj:
     case CONSTR:
     case CONSTR_NOCAF:
-    case WEAK:
     case PRIM:
     {
         for (StgWord i = 0; i < info->layout.payload.ptrs; i++) {
@@ -1558,8 +1580,15 @@ mark_closure (MarkQueue *queue, const StgClosure *p0, StgClosure **origin)
     }
 
     case THUNK_SELECTOR:
-        nonmoving_eval_thunk_selector(queue, (StgSelector*)p, origin);
+    {
+        StgSelector *sel = (StgSelector *) p;
+        // We may be able to evaluate this selector which may render the
+        // selectee unreachable. However, we must mark the selectee regardless
+        // to satisfy the snapshot invariant.
+        PUSH_FIELD(sel, selectee);
+        nonmoving_eval_thunk_selector(queue, sel, origin);
         break;
+    }
 
     case AP_STACK: {
         StgAP_STACK *ap = (StgAP_STACK *)p;
@@ -1709,15 +1738,23 @@ done:
  *  b. the nursery has been fully evacuated into the non-moving generation.
  *  c. the mark queue has been seeded with a set of roots.
  *
+ * If budget is not UNLIMITED_MARK_BUDGET, then we will mark no more than the
+ * indicated number of objects and deduct the work done from the budget.
  */
 GNUC_ATTR_HOT void
-nonmovingMark (MarkQueue *queue)
+nonmovingMark (MarkBudget* budget, MarkQueue *queue)
 {
     traceConcMarkBegin();
     debugTrace(DEBUG_nonmoving_gc, "Starting mark pass");
-    unsigned int count = 0;
+    uint64_t count = 0;
     while (true) {
         count++;
+        if (*budget == 0) {
+            return;
+        } else if (*budget != UNLIMITED_MARK_BUDGET) {
+            *budget -= 1;
+        }
+
         MarkQueueEnt ent = markQueuePop(queue);
 
         switch (nonmovingMarkQueueEntryType(&ent)) {
@@ -1846,20 +1883,66 @@ static bool nonmovingIsNowAlive (StgClosure *p)
 
     bdescr *bd = Bdescr((P_)p);
 
-    // All non-static objects in the non-moving heap should be marked as
-    // BF_NONMOVING
-    ASSERT(bd->flags & BF_NONMOVING);
+    const uint16_t flags = bd->flags;
+    if (flags & BF_LARGE) {
+        if (flags & BF_PINNED && !(flags & BF_NONMOVING)) {
+            // In this case we have a pinned object living in a non-full
+            // accumulator block which was not promoted to the nonmoving
+            // generation. Assume that the object is alive.
+            // See #22014.
+            return true;
+        }
 
-    if (bd->flags & BF_LARGE) {
+        ASSERT(bd->flags & BF_NONMOVING);
         return (bd->flags & BF_NONMOVING_SWEEPING) == 0
                    // the large object wasn't in the snapshot and therefore wasn't marked
             || (bd->flags & BF_MARKED) != 0;
                    // The object was marked
     } else {
-        return nonmovingClosureMarkedThisCycle((P_)p);
+        // All non-static objects in the non-moving heap should be marked as
+        // BF_NONMOVING.
+        ASSERT(bd->flags & BF_NONMOVING);
+
+        struct NonmovingSegment *seg = nonmovingGetSegment((StgPtr) p);
+        StgClosure *snapshot_loc =
+          (StgClosure *) nonmovingSegmentGetBlock(seg, nonmovingSegmentInfo(seg)->next_free_snap);
+        if (p >= snapshot_loc && nonmovingGetClosureMark((StgPtr) p) == 0) {
+            /*
+             * In this case we are looking at a block that wasn't allocated
+             * at the time that the snapshot was taken. As we do not mark such
+             * blocks, we must assume that it is reachable.
+             */
+            return true;
+        } else {
+            return nonmovingClosureMarkedThisCycle((P_)p);
+        }
+    }
+}
+
+// Mark all Weak#s on nonmoving_old_weak_ptr_list.
+void nonmovingMarkWeakPtrList (struct MarkQueue_ *queue)
+{
+    ASSERT(nonmoving_weak_ptr_list == NULL);
+    for (StgWeak *w = nonmoving_old_weak_ptr_list; w != NULL; w = w->link) {
+        mark_closure(queue, (StgClosure *) w, NULL);
     }
 }
 
+// Determine whether a weak pointer object is on one of the nonmoving
+// collector's weak pointer lists. Used for sanity checking.
+#if defined(DEBUG)
+static bool is_nonmoving_weak(StgWeak *weak)
+{
+    for (StgWeak *w = nonmoving_old_weak_ptr_list; w != NULL; w = w->link) {
+        if (w == weak) return true;
+    }
+    for (StgWeak *w = nonmoving_weak_ptr_list; w != NULL; w = w->link) {
+        if (w == weak) return true;
+    }
+    return false;
+}
+#endif
+
 // Non-moving heap variant of `tidyWeakList`
 bool nonmovingTidyWeaks (struct MarkQueue_ *queue)
 {
@@ -1868,6 +1951,9 @@ bool nonmovingTidyWeaks (struct MarkQueue_ *queue)
     StgWeak **last_w = &nonmoving_old_weak_ptr_list;
     StgWeak *next_w;
     for (StgWeak *w = nonmoving_old_weak_ptr_list; w != NULL; w = next_w) {
+        // This should have been marked by nonmovingMarkWeaks
+        ASSERT(nonmovingIsNowAlive((StgClosure *) w));
+
         if (w->header.info == &stg_DEAD_WEAK_info) {
             // finalizeWeak# was called on the weak
             next_w = w->link;
@@ -1878,7 +1964,10 @@ bool nonmovingTidyWeaks (struct MarkQueue_ *queue)
         // Otherwise it's a live weak
         ASSERT(w->header.info == &stg_WEAK_info);
 
-        if (nonmovingIsNowAlive(w->key)) {
+        // See Note [Weak pointer processing and the non-moving GC] in
+        // MarkWeak.c
+        bool key_in_nonmoving = Bdescr((StgPtr) w->key)->flags & BF_NONMOVING;
+        if (!key_in_nonmoving || nonmovingIsNowAlive(w->key)) {
             nonmovingMarkLiveWeak(queue, w);
             did_work = true;
 
@@ -1886,7 +1975,7 @@ bool nonmovingTidyWeaks (struct MarkQueue_ *queue)
             *last_w = w->link;
             next_w = w->link;
 
-            // and put it on the weak ptr list
+            // and put it on nonmoving_weak_ptr_list
             w->link = nonmoving_weak_ptr_list;
             nonmoving_weak_ptr_list = w;
         } else {
@@ -1908,7 +1997,8 @@ void nonmovingMarkDeadWeak (struct MarkQueue_ *queue, StgWeak *w)
 
 void nonmovingMarkLiveWeak (struct MarkQueue_ *queue, StgWeak *w)
 {
-    ASSERT(nonmovingClosureMarkedThisCycle((P_)w));
+    ASSERT(nonmovingIsNowAlive((StgClosure *) w));
+    ASSERT(nonmovingIsNowAlive((StgClosure *) w->key));
     markQueuePushClosure_(queue, w->value);
     markQueuePushClosure_(queue, w->finalizer);
     markQueuePushClosure_(queue, w->cfinalizers);
@@ -1922,9 +2012,9 @@ void nonmovingMarkDeadWeaks (struct MarkQueue_ *queue, StgWeak **dead_weaks)
 {
     StgWeak *next_w;
     for (StgWeak *w = nonmoving_old_weak_ptr_list; w; w = next_w) {
-        ASSERT(!nonmovingClosureMarkedThisCycle((P_)(w->key)));
+        ASSERT(!nonmovingIsNowAlive(w->key));
         nonmovingMarkDeadWeak(queue, w);
-        next_w = w ->link;
+        next_w = w->link;
         w->link = *dead_weaks;
         *dead_weaks = w;
     }


=====================================
rts/sm/NonMovingMark.h
=====================================
@@ -111,6 +111,11 @@ typedef struct {
     MarkQueue queue;
 } UpdRemSet;
 
+// How much marking work we are allowed to perform
+// See Note [Sync phase marking budget] in NonMoving.c
+typedef int64_t MarkBudget;
+#define UNLIMITED_MARK_BUDGET INT64_MIN
+
 // Number of blocks to allocate for a mark queue
 #define MARK_QUEUE_BLOCKS 16
 
@@ -135,7 +140,7 @@ extern MarkQueue *current_mark_queue;
 extern bdescr *upd_rem_set_block_list;
 
 
-void nonmovingMarkInitUpdRemSet(void);
+void nonmovingMarkInit(void);
 
 void nonmovingInitUpdRemSet(UpdRemSet *rset);
 void updateRemembSetPushClosure(Capability *cap, StgClosure *p);
@@ -154,8 +159,13 @@ void markQueueAddRoot(MarkQueue* q, StgClosure** root);
 
 void initMarkQueue(MarkQueue *queue);
 void freeMarkQueue(MarkQueue *queue);
-void nonmovingMark(struct MarkQueue_ *restrict queue);
+void nonmovingMark(MarkBudget *budget, struct MarkQueue_ *restrict queue);
+INLINE_HEADER void nonmovingMarkUnlimitedBudget(struct MarkQueue_ *restrict queue) {
+    MarkBudget budget = UNLIMITED_MARK_BUDGET;
+    nonmovingMark(&budget, queue);
+}
 
+void nonmovingMarkWeakPtrList(struct MarkQueue_ *queue);
 bool nonmovingTidyWeaks(struct MarkQueue_ *queue);
 void nonmovingTidyThreads(void);
 void nonmovingMarkDeadWeaks(struct MarkQueue_ *queue, StgWeak **dead_weak_ptr_list);


=====================================
rts/sm/Sanity.c
=====================================
@@ -619,11 +619,12 @@ static void checkNonmovingSegments (struct NonmovingSegment *seg)
 void checkNonmovingHeap (const struct NonmovingHeap *heap)
 {
     for (unsigned int i=0; i < NONMOVING_ALLOCA_CNT; i++) {
-        const struct NonmovingAllocator *alloc = heap->allocators[i];
+        const struct NonmovingAllocator *alloc = &heap->allocators[i];
         checkNonmovingSegments(alloc->filled);
         checkNonmovingSegments(alloc->active);
-        for (unsigned int cap=0; cap < getNumCapabilities(); cap++) {
-            checkNonmovingSegments(alloc->current[cap]);
+        for (unsigned int cap_n=0; cap_n < getNumCapabilities(); cap_n++) {
+            Capability *cap = capabilities[cap_n];
+            checkNonmovingSegments(cap->current_segments[i]);
         }
     }
 }
@@ -1047,11 +1048,12 @@ findMemoryLeak (void)
         markBlocks(nonmoving_compact_objects);
         markBlocks(nonmoving_marked_compact_objects);
         for (i = 0; i < NONMOVING_ALLOCA_CNT; i++) {
-            struct NonmovingAllocator *alloc = nonmovingHeap.allocators[i];
+            struct NonmovingAllocator *alloc = &nonmovingHeap.allocators[i];
             markNonMovingSegments(alloc->filled);
             markNonMovingSegments(alloc->active);
             for (j = 0; j < getNumCapabilities(); j++) {
-                markNonMovingSegments(alloc->current[j]);
+                Capability *cap = capabilities[j];
+                markNonMovingSegments(cap->current_segments[i]);
             }
         }
         markNonMovingSegments(nonmovingHeap.sweep_list);
@@ -1156,23 +1158,18 @@ countNonMovingSegments(struct NonmovingSegment *segs)
     return ret;
 }
 
-static W_
-countNonMovingAllocator(struct NonmovingAllocator *alloc)
-{
-    W_ ret = countNonMovingSegments(alloc->filled)
-           + countNonMovingSegments(alloc->active);
-    for (uint32_t i = 0; i < getNumCapabilities(); ++i) {
-        ret += countNonMovingSegments(alloc->current[i]);
-    }
-    return ret;
-}
-
 static W_
 countNonMovingHeap(struct NonmovingHeap *heap)
 {
     W_ ret = 0;
     for (int alloc_idx = 0; alloc_idx < NONMOVING_ALLOCA_CNT; alloc_idx++) {
-        ret += countNonMovingAllocator(heap->allocators[alloc_idx]);
+        struct NonmovingAllocator *alloc = &heap->allocators[alloc_idx];
+        ret += countNonMovingSegments(alloc->filled);
+        ret += countNonMovingSegments(alloc->active);
+        for (uint32_t c = 0; c < getNumCapabilities(); ++c) {
+            Capability *cap = capabilities[c];
+            ret += countNonMovingSegments(cap->current_segments[alloc_idx]);
+        }
     }
     ret += countNonMovingSegments(heap->sweep_list);
     ret += countNonMovingSegments(heap->free);


=====================================
rts/sm/Storage.c
=====================================
@@ -214,17 +214,14 @@ initStorage (void)
   }
   oldest_gen->to = oldest_gen;
 
-  // Nonmoving heap uses oldest_gen so initialize it after initializing oldest_gen
-  nonmovingInit();
-
 #if defined(THREADED_RTS)
   // nonmovingAddCapabilities allocates segments, which requires taking the gc
   // sync lock, so initialize it before nonmovingAddCapabilities
   initSpinLock(&gc_alloc_block_sync);
 #endif
 
-  if (RtsFlags.GcFlags.useNonmoving)
-      nonmovingAddCapabilities(getNumCapabilities());
+  // Nonmoving heap uses oldest_gen so initialize it after initializing oldest_gen
+  nonmovingInit();
 
   /* The oldest generation has one step. */
   if (RtsFlags.GcFlags.compact || RtsFlags.GcFlags.sweep) {
@@ -320,11 +317,10 @@ void storageAddCapabilities (uint32_t from, uint32_t to)
         }
     }
 
-    // Initialize NonmovingAllocators and UpdRemSets
+    // Initialize non-moving collector
     if (RtsFlags.GcFlags.useNonmoving) {
-        nonmovingAddCapabilities(to);
-        for (i = 0; i < to; ++i) {
-            nonmovingInitUpdRemSet(&capabilities[i]->upd_rem_set);
+        for (i = from; i < to; i++) {
+            nonmovingInitCapability(capabilities[i]);
         }
     }
 


=====================================
testsuite/tests/rts/all.T
=====================================
@@ -339,7 +339,7 @@ test('T10904', [ omit_ways(['ghci']), extra_run_opts('20000') ],
 test('T10728', [extra_run_opts('+RTS -maxN3 -RTS'), only_ways(['threaded2'])],
                compile_and_run, [''])
 
-test('T9405', [when(msys(), expect_broken(12714))], makefile_test, ['T9405'])
+test('T9405', [], makefile_test, ['T9405'])
 
 test('T11788', when(ghc_dynamic(), skip),
               makefile_test, ['T11788'])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/87c547c7bec75094127d7b73e2fc45650ba98ae2...06a4a65fff9268589270e99762d5d18a64cabc6c

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/87c547c7bec75094127d7b73e2fc45650ba98ae2...06a4a65fff9268589270e99762d5d18a64cabc6c
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230208/005b537d/attachment-0001.html>


More information about the ghc-commits mailing list