[Git][ghc/ghc][wip/T22264] 25 commits: nonmoving: Refactor update remembered set initialization

Ben Gamari (@bgamari) gitlab at gitlab.haskell.org
Fri Nov 18 22:03:09 UTC 2022



Ben Gamari pushed to branch wip/T22264 at Glasgow Haskell Compiler / GHC


Commits:
03ab2abe by Ben Gamari at 2022-11-16T18:51:28-05:00
nonmoving: Refactor update remembered set initialization

This avoids a lock inversion between the storage manager mutex and
the stable pointer table mutex by not dropping the SM_MUTEX in
nonmovingCollect. This requires quite a bit of rejiggering, but it
does seem like a better strategy.
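
As an aside, a minimal pthreads sketch of the lock-inversion hazard being
avoided here (illustrative only, not the RTS code; lock_a and lock_b stand
in for SM_MUTEX and the stable pointer table mutex):

    #include <pthread.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* e.g. SM_MUTEX */
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* e.g. stable ptr table */

    /* Deadlock becomes possible if one thread takes A then B while another
     * takes B then A. Holding A across the whole operation instead of
     * dropping and re-taking it keeps a single A-before-B order everywhere. */
    static void do_collect(void)
    {
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);
        /* ... work requiring both locks ... */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
    }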

- - - - -
5720bb58 by Ben Gamari at 2022-11-16T18:51:28-05:00
nonmoving: Ensure that we aren't holding locks when closing them

TSAN complains about this sort of thing.

- - - - -
39e9e430 by Ben Gamari at 2022-11-16T18:51:28-05:00
nonmoving: Make bitmap accesses atomic

This is a benign race on any sensible hardware since these are byte
accesses. Nevertheless, atomic accesses are necessary to satisfy
TSAN.
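
A C11 sketch of the change (illustrative; the RTS uses its RELAXED_LOAD /
RELAXED_STORE macros rather than stdatomic directly, as in the NonMoving.h
hunk below):

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    static _Atomic uint8_t bitmap[256];   /* stands in for a segment's mark bitmap */

    /* Relaxed ordering suffices: no ordering with other memory is required,
     * the accesses merely need to be atomic so TSAN does not flag them. */
    static void set_mark(size_t i, uint8_t epoch)
    {
        atomic_store_explicit(&bitmap[i], epoch, memory_order_relaxed);
    }

    static uint8_t get_mark(size_t i)
    {
        return atomic_load_explicit(&bitmap[i], memory_order_relaxed);
    }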

- - - - -
35505cdf by Ben Gamari at 2022-11-16T18:51:28-05:00
nonmoving: Fix benign race in update remembered set check

Relaxed load is fine here since we will take the lock before looking at
the list.
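
Roughly, the pattern looks like this (hedged C11 sketch, not the actual RTS
code; pending_blocks stands in for the shared list being checked):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stddef.h>

    struct block { struct block *link; };

    static struct block *_Atomic pending_blocks;
    static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;

    static void maybe_drain(void)
    {
        /* Relaxed pre-check: a stale answer is harmless because the list is
         * only examined again after the lock below has been acquired. */
        if (atomic_load_explicit(&pending_blocks, memory_order_relaxed) == NULL)
            return;

        pthread_mutex_lock(&pending_lock);
        struct block *head = atomic_exchange(&pending_blocks, NULL);
        /* ... walk and process head while holding the lock ... */
        (void) head;
        pthread_mutex_unlock(&pending_lock);
    }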

- - - - -
e0cbbbb9 by Ben Gamari at 2022-11-16T18:51:28-05:00
nonmoving: Fix race in shortcutting

We must use an acquire load to read the info table pointer since if we
find an indirection we must be certain that we see the indirectee.
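
A hedged C11 sketch of the required pairing (the field names below are
illustrative, not the RTS's closure layout):

    #include <stdatomic.h>
    #include <stddef.h>

    struct closure {
        _Atomic(const void *) info;   /* stands in for the info table pointer */
        void *indirectee;
    };

    static const int IND_info = 0;    /* dummy info table marking an indirection */

    /* Writer: publish the indirectee first, then release the info pointer. */
    static void update_with_indirection(struct closure *c, void *target)
    {
        c->indirectee = target;
        atomic_store_explicit(&c->info, &IND_info, memory_order_release);
    }

    /* Reader: an acquire load of the info pointer guarantees that, if we see
     * the indirection header, we also see the indirectee written before it. */
    static void *follow_if_indirection(struct closure *c)
    {
        const void *info = atomic_load_explicit(&c->info, memory_order_acquire);
        return info == &IND_info ? c->indirectee : NULL;
    }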

- - - - -
fc09360d by Ben Gamari at 2022-11-16T18:51:28-05:00
nonmoving: Make free list counter accesses atomic

Since these may race with the allocator(s).

- - - - -
c6a559ac by Ben Gamari at 2022-11-16T19:34:44-05:00
nonmoving: Don't push empty arrays to update remembered set

Previously the write barrier of resizeSmallArray# incorrectly handled
resizing of zero-sized arrays, pushing an invalid pointer to the update
remembered set.

Likely fixes the cause of #22264.
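
The shape of the fix (see the rts/PrimOps.cmm hunk below) is to test the
bounds before pushing; a hedged C rendering of the corrected loop:

    typedef void (*push_fn)(void *closure);

    /* With the check-before-push ordering, an empty range (p == end, as for
     * a zero-sized array) pushes nothing instead of one out-of-range word. */
    static void push_trailing_range(void **p, void **end, push_fn push)
    {
        while (p < end) {
            push(*p);
            p++;
        }
    }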

- - - - -
b82e65e2 by Ben Gamari at 2022-11-16T19:34:44-05:00
nonmoving: Deduplicate assertion

- - - - -
c9a1a405 by Ben Gamari at 2022-11-16T19:34:44-05:00
rts/MarkWeak: Add some documentation comments

- - - - -
7290d210 by Ben Gamari at 2022-11-16T19:34:44-05:00
nonmoving: Tighten up weak pointer assertions

Previously we would use `nonmovingClosureMarkedThisCycle` on objects
which may live in large-object blocks, which is not valid. Fix this by
instead using `nonmovingIsNowAlive`, which handles large objects.

- - - - -
000480bd by Ben Gamari at 2022-11-16T19:34:44-05:00
nonmoving: Fix handling of weak pointers

This fixes an interaction between aging and weak pointer handling which
prevented the finalization of some weak pointers. In particular, weak
pointers could have their keys incorrectly marked by the preparatory
collector, preventing their finalization by the subsequent concurrent
collection.

While in the area, we also significantly improve the assertions
regarding weak pointers.

Fixes #22327.

- - - - -
789cea38 by Ben Gamari at 2022-11-16T19:34:44-05:00
nonmoving: Sanity check mutable list

Assert that entries in the nonmoving generation's generational
remembered set (a.k.a. mutable list) live in the nonmoving generation.

- - - - -
72ce583b by Ben Gamari at 2022-11-16T19:34:44-05:00
nonmoving: Don't show occupancy if we didn't collect live words

- - - - -
d77dd4c5 by Ben Gamari at 2022-11-16T19:34:44-05:00
rts/BlockAlloc: Allow disabling of internal assertions

These can be quite expensive and it is sometimes useful to compile a
DEBUG RTS without them.
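
A minimal sketch of the pattern (the rts/sm/BlockAlloc.c hunk below shows
the real change; expensive_sanity_check is a hypothetical stand-in):

    void expensive_sanity_check(void);   /* e.g. checkFreeListSanity in the RTS */

    #if defined(DEBUG)
    #define BLOCK_ALLOC_DEBUG            /* comment out to drop only these checks */
    #endif

    void example_free_path(void)
    {
    #if defined(BLOCK_ALLOC_DEBUG)
        expensive_sanity_check();        /* runs only when the extra macro is on */
    #endif
        /* ... the normal path continues either way ... */
    }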

- - - - -
78f20e70 by Ben Gamari at 2022-11-16T19:34:44-05:00
nonmoving: Fix style

- - - - -
6368dcf4 by Ben Gamari at 2022-11-16T19:34:44-05:00
nonmoving: Fix innocuous warning in sweep

- - - - -
8d49a465 by Ben Gamari at 2022-11-16T19:34:44-05:00
Refactor nonmoving weak marking

- - - - -
67f35950 by Ben Gamari at 2022-11-16T19:34:44-05:00
Weak shutdown

- - - - -
86dcca21 by Ben Gamari at 2022-11-16T19:34:44-05:00
nonmoving: Handle new closures in nonmovingIsNowAlive

- - - - -
db3d2b0b by Ben Gamari at 2022-11-16T19:34:44-05:00
NonMovingSweep: Wibbles

- - - - -
845ed124 by Ben Gamari at 2022-11-16T19:34:45-05:00
nonmoving: Fix tracking of FILLED_SWEEPING segments

Previously we only updated the state of the segment at the head of each
allocator's filled list.
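
In other words, the state has to be set on every segment in the filled
list, not just on one of them. A hedged C sketch (field names illustrative):

    struct segment { int state; struct segment *link; };

    enum { FILLED = 1, FILLED_SWEEPING = 2 };

    static void tag_filled_list(struct segment *filled)
    {
        for (struct segment *seg = filled; seg != NULL; seg = seg->link) {
            seg->state = FILLED_SWEEPING;   /* per segment, inside the loop */
        }
    }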

- - - - -
3fc717e4 by Ben Gamari at 2022-11-16T19:34:45-05:00
nonmoving: Assert state of swept segments

- - - - -
6f705c26 by Ben Gamari at 2022-11-16T19:34:45-05:00
nonmoving: Sanity check nonmoving large objects and compacts

- - - - -
8286ea6d by Ben Gamari at 2022-11-16T19:34:45-05:00
nonmoving: Post-sweep sanity checking

- - - - -
1fdffe15 by Ben Gamari at 2022-11-18T14:28:37-05:00
Sanity: Mark pinned_object_blocks

- - - - -


16 changed files:

- rts/PrimOps.cmm
- rts/RtsStartup.c
- rts/include/rts/storage/ClosureMacros.h
- rts/sm/BlockAlloc.c
- rts/sm/GC.c
- rts/sm/MarkWeak.c
- rts/sm/NonMoving.c
- rts/sm/NonMoving.h
- rts/sm/NonMovingCensus.c
- rts/sm/NonMovingCensus.h
- rts/sm/NonMovingMark.c
- rts/sm/NonMovingMark.h
- rts/sm/NonMovingShortcut.c
- rts/sm/NonMovingSweep.c
- rts/sm/Sanity.c
- rts/sm/Storage.c


Changes:

=====================================
rts/PrimOps.cmm
=====================================
@@ -294,9 +294,9 @@ stg_shrinkSmallMutableArrayzh ( gcptr mba, W_ new_size )
      p = mba + SIZEOF_StgSmallMutArrPtrs + WDS(new_size);
      end = mba + SIZEOF_StgSmallMutArrPtrs + WDS(StgSmallMutArrPtrs_ptrs(mba));
 again:
-     ccall updateRemembSetPushClosure_(BaseReg "ptr",
-                                       W_[p] "ptr");
      if (p < end) {
+       ccall updateRemembSetPushClosure_(BaseReg "ptr",
+                                         W_[p] "ptr");
        p = p + SIZEOF_W;
        goto again;
      }


=====================================
rts/RtsStartup.c
=====================================
@@ -488,6 +488,7 @@ hs_exit_(bool wait_foreign)
     for (g = 0; g < RtsFlags.GcFlags.generations; g++) {
         runAllCFinalizers(generations[g].weak_ptr_list);
     }
+    runAllCFinalizers(nonmoving_weak_ptr_list);
 
 #if defined(RTS_USER_SIGNALS)
     if (RtsFlags.MiscFlags.install_signal_handlers) {


=====================================
rts/include/rts/storage/ClosureMacros.h
=====================================
@@ -97,6 +97,12 @@ EXTERN_INLINE const StgInfoTable *get_itbl(const StgClosure *c)
     return INFO_PTR_TO_STRUCT(RELAXED_LOAD(&c->header.info));
 }
 
+EXTERN_INLINE const StgInfoTable *get_itbl_acquire(const StgClosure *c);
+EXTERN_INLINE const StgInfoTable *get_itbl_acquire(const StgClosure *c)
+{
+    return INFO_PTR_TO_STRUCT(ACQUIRE_LOAD(&c->header.info));
+}
+
 EXTERN_INLINE const StgRetInfoTable *get_ret_itbl(const StgClosure *c);
 EXTERN_INLINE const StgRetInfoTable *get_ret_itbl(const StgClosure *c)
 {


=====================================
rts/sm/BlockAlloc.c
=====================================
@@ -27,6 +27,16 @@
 
 static void  initMBlock(void *mblock, uint32_t node);
 
+/*
+ * By default the DEBUG RTS is built with block allocator assertions
+ * enabled. However, these are quite expensive and consequently it can
+ * sometimes be useful to disable them if debugging an issue known to be
+ * elsewhere.
+ */
+#if defined(DEBUG)
+#define BLOCK_ALLOC_DEBUG
+#endif
+
 /* -----------------------------------------------------------------------------
 
   Implementation notes
@@ -246,7 +256,7 @@ initGroup(bdescr *head)
       last->link = head;
   }
 
-#if defined(DEBUG)
+#if defined(BLOCK_ALLOC_DEBUG)
   for (uint32_t i=0; i < head->blocks; i++) {
       head[i].flags = 0;
   }
@@ -618,7 +628,7 @@ allocAlignedGroupOnNode (uint32_t node, W_ n)
 
     ASSERT(slop_low_blocks + slop_high_blocks + n == num_blocks);
 
-#if defined(DEBUG)
+#if defined(BLOCK_ALLOC_DEBUG)
     checkFreeListSanity();
     W_ free_before = countFreeList();
 #endif
@@ -628,7 +638,7 @@ allocAlignedGroupOnNode (uint32_t node, W_ n)
         ASSERT(countBlocks(bd) == num_blocks - slop_low_blocks);
     }
 
-#if defined(DEBUG)
+#if defined(BLOCK_ALLOC_DEBUG)
     ASSERT(countFreeList() == free_before + slop_low_blocks);
     checkFreeListSanity();
 #endif
@@ -636,7 +646,7 @@ allocAlignedGroupOnNode (uint32_t node, W_ n)
     // At this point the bd should be aligned, but we may have slop on the high side
     ASSERT((uintptr_t)bd->start % group_size == 0);
 
-#if defined(DEBUG)
+#if defined(BLOCK_ALLOC_DEBUG)
     free_before = countFreeList();
 #endif
 
@@ -645,7 +655,7 @@ allocAlignedGroupOnNode (uint32_t node, W_ n)
         ASSERT(bd->blocks == n);
     }
 
-#if defined(DEBUG)
+#if defined(BLOCK_ALLOC_DEBUG)
     ASSERT(countFreeList() == free_before + slop_high_blocks);
     checkFreeListSanity();
 #endif
@@ -928,7 +938,7 @@ freeGroup(bdescr *p)
 
   ASSERT(RELAXED_LOAD(&p->free) != (P_)-1);
 
-#if defined(DEBUG)
+#if defined(BLOCK_ALLOC_DEBUG)
   for (uint32_t i=0; i < p->blocks; i++) {
       p[i].flags = 0;
   }


=====================================
rts/sm/GC.c
=====================================
@@ -844,11 +844,9 @@ GarbageCollect (uint32_t collect_gen,
   // Flush the update remembered sets. See Note [Eager update remembered set
   // flushing] in NonMovingMark.c
   if (RtsFlags.GcFlags.useNonmoving) {
-      RELEASE_SM_LOCK;
       for (n = 0; n < getNumCapabilities(); n++) {
-          nonmovingAddUpdRemSetBlocks(&getCapability(n)->upd_rem_set.queue);
+          nonmovingAddUpdRemSetBlocks(&getCapability(n)->upd_rem_set);
       }
-      ACQUIRE_SM_LOCK;
   }
 
   // Mark and sweep the oldest generation.
@@ -869,8 +867,6 @@ GarbageCollect (uint32_t collect_gen,
       // old_weak_ptr_list should be empty.
       ASSERT(oldest_gen->old_weak_ptr_list == NULL);
 
-      // we may need to take the lock to allocate mark queue blocks
-      RELEASE_SM_LOCK;
       // dead_weak_ptr_list contains weak pointers with dead keys. Those need to
       // be kept alive because we'll use them in finalizeSchedulers(). Similarly
       // resurrected_threads are also going to be used in resurrectedThreads()
@@ -880,10 +876,9 @@ GarbageCollect (uint32_t collect_gen,
 #if !defined(THREADED_RTS)
       // In the non-threaded runtime this is the only time we push to the
       // upd_rem_set
-      nonmovingAddUpdRemSetBlocks(&gct->cap->upd_rem_set.queue);
+      nonmovingAddUpdRemSetBlocks(&gct->cap->upd_rem_set);
 #endif
       nonmovingCollect(&dead_weak_ptr_list, &resurrected_threads);
-      ACQUIRE_SM_LOCK;
   }
 
   // Update the max size of older generations after a major GC:


=====================================
rts/sm/MarkWeak.c
=====================================
@@ -50,7 +50,7 @@
 
    - weak_stage == WeakPtrs
 
-     We process all the weak pointers whos keys are alive (evacuate
+     We process all the weak pointers whose keys are alive (evacuate
      their values and finalizers), and repeat until we can find no new
      live keys.  If no live keys are found in this pass, then we
      evacuate the finalizers of all the dead weak pointers in order to
@@ -82,12 +82,46 @@ static bool tidyWeakList (generation *gen);
 static bool resurrectUnreachableThreads (generation *gen, StgTSO **resurrected_threads);
 static void    tidyThreadList (generation *gen);
 
+/*
+ * Note [Weak pointer processing and the non-moving GC]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * When using the non-moving GC we defer weak pointer processing
+ * until the concurrent marking phase as weaks in the non-moving heap may be
+ * keyed on objects living in the non-moving generation. To accomplish this
+ * initWeakForGC keeps all weak pointers on oldest_gen->weak_ptr_list, where
+ * nonmovingCollect will find them. From there they will be moved to
+ * nonmoving_old_weak_ptr_list. During the mark loop we will move weaks with
+ * reachable keys to nonmoving_weak_ptr_list. At the end of concurrent marking
+ * we tidy the weak list (in nonmovingTidyWeakList) and perform another set of
+ * marking as necessary, just as is done in tidyWeakList.
+ *
+ * Note that this treatment takes advantage of the fact that we usually need
+ * not worry about Weak#s living in the non-moving heap but being keyed on an
+ * object in the moving heap since the Weak# must be strictly older than the
+ * key. Such objects would otherwise pose a problem since the non-moving
+ * collector would be unable to safely determine the liveness of the key.
+ * In the rare case that we *do* see such a key (e.g. in the case of a
+ * pinned ByteArray# living in a partially-filled accumulator block)
+ * the nonmoving collector assumes that it is live.
+ *
+ */
+
+/*
+ * Prepare the weak object lists for GC. Specifically, reset weak_stage
+ * and move all generations' `weak_ptr_list`s to `old_weak_ptr_list`.
+ * Weaks with live keys will later be moved back to `weak_ptr_list` by
+ * `tidyWeakList`.
+ */
 void
 initWeakForGC(void)
 {
-    uint32_t g;
+    uint32_t oldest = N;
+    if (RtsFlags.GcFlags.useNonmoving && N == oldest_gen->no) {
+        // See Note [Weak pointer processing and the non-moving GC].
+        oldest = oldest_gen->no - 1;
+    }
 
-    for (g = 0; g <= N; g++) {
+    for (uint32_t g = 0; g <= oldest; g++) {
         generation *gen = &generations[g];
         gen->old_weak_ptr_list = gen->weak_ptr_list;
         gen->weak_ptr_list = NULL;
@@ -96,6 +130,14 @@ initWeakForGC(void)
     weak_stage = WeakThreads;
 }
 
+/*
+ * Walk the weak pointer lists after having finished a round of scavenging,
+ * tidying the weak (and possibly thread) lists (depending upon the current
+ * weak_stage).
+ *
+ * Returns true if new live weak pointers were found, implying that another
+ * round of scavenging is necessary.
+ */
 bool
 traverseWeakPtrList(StgWeak **dead_weak_ptr_list, StgTSO **resurrected_threads)
 {
@@ -182,6 +224,11 @@ traverseWeakPtrList(StgWeak **dead_weak_ptr_list, StgTSO **resurrected_threads)
   }
 }
 
+/*
+ * Deal with weak pointers with unreachable keys after GC has concluded.
+ * This means marking the finalizer (and possibly value) in preparation for
+ * later finalization.
+ */
 static void collectDeadWeakPtrs (generation *gen, StgWeak **dead_weak_ptr_list)
 {
     StgWeak *w, *next_w;
@@ -198,6 +245,10 @@ static void collectDeadWeakPtrs (generation *gen, StgWeak **dead_weak_ptr_list)
     }
 }
 
+/*
+ * Deal with threads left on the old_threads list after GC has concluded,
+ * moving them onto the resurrected_threads list where appropriate.
+ */
 static bool resurrectUnreachableThreads (generation *gen, StgTSO **resurrected_threads)
 {
     StgTSO *t, *tmp, *next;
@@ -233,8 +284,21 @@ static bool resurrectUnreachableThreads (generation *gen, StgTSO **resurrected_t
     return flag;
 }
 
+/*
+ * Walk over the `old_weak_ptr_list` of the given generation and:
+ *
+ *  - remove any DEAD_WEAKs
+ *  - move any weaks with reachable keys to the `weak_ptr_list` of the
+ *    appropriate to-space and mark the weak's value and finalizer.
+ */
 static bool tidyWeakList(generation *gen)
 {
+    if (RtsFlags.GcFlags.useNonmoving && gen == oldest_gen) {
+        // See Note [Weak pointer processing and the non-moving GC].
+        ASSERT(gen->old_weak_ptr_list == NULL);
+        return false;
+    }
+
     StgWeak *w, **last_w, *next_w;
     const StgInfoTable *info;
     StgClosure *new;
@@ -322,6 +386,10 @@ static bool tidyWeakList(generation *gen)
     return flag;
 }
 
+/*
+ * Walk over the `old_threads` list of the given generation and move any
+ * reachable threads onto the `threads` list.
+ */
 static void tidyThreadList (generation *gen)
 {
     StgTSO *t, *tmp, *next, **prev;
@@ -381,6 +449,10 @@ static void checkWeakPtrSanity(StgWeak *hd, StgWeak *tl)
 }
 #endif
 
+/*
+ * Traverse the capabilities' local new-weak-pointer lists at the beginning of
+ * GC and move them to the nursery's weak_ptr_list.
+ */
 void collectFreshWeakPtrs()
 {
     uint32_t i;


=====================================
rts/sm/NonMoving.c
=====================================
@@ -26,7 +26,10 @@
 #include "NonMovingCensus.h"
 #include "StablePtr.h" // markStablePtrTable
 #include "Schedule.h" // markScheduler
-#include "Weak.h" // dead_weak_ptr_list
+#include "Sanity.h"
+#include "Weak.h" // scheduleFinalizers
+
+#define NONCONCURRENT_SWEEP
 
 struct NonmovingHeap nonmovingHeap;
 
@@ -248,6 +251,9 @@ Mutex concurrent_coll_finished_lock;
  *    how we use the DIRTY flags associated with MUT_VARs and TVARs to improve
  *    barrier efficiency.
  *
+ *  - Note [Weak pointer processing and the non-moving GC] (MarkWeak.c) describes
+ *    how weak pointers are handled when the non-moving GC is in use.
+ *
  * [ueno 2016]:
  *   Katsuhiro Ueno and Atsushi Ohori. 2016. A fully concurrent garbage
  *   collector for functional programs on multicore processors. SIGPLAN Not. 51,
@@ -286,8 +292,8 @@ Mutex concurrent_coll_finished_lock;
  * was (unsurprisingly) also found to result in significant amounts of
  * unnecessary copying.
  *
- * Consequently, we now allow aging. Aging allows the preparatory GC leading up
- * to a major collection to evacuate some objects into the young generation.
+ * Consequently, we now allow "aging", which allows the preparatory GC leading up
+ * to a major collection to evacuate objects into the young generation.
  * However, this introduces the following tricky case that might arise after
  * we have finished the preparatory GC:
  *
@@ -296,6 +302,7 @@ Mutex concurrent_coll_finished_lock;
  *                    ┆
  *      B ←────────────── A ←─────────────── root
  *      │             ┆     ↖─────────────── gen1 mut_list
+ *      │             ┆
  *      ╰───────────────→ C
  *                    ┆
  *
@@ -336,8 +343,9 @@ Mutex concurrent_coll_finished_lock;
  * The implementation details of this are described in Note [Non-moving GC:
  * Marking evacuated objects] in Evac.c.
  *
- * Note [Deadlock detection under nonmoving collector]
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * Note [Deadlock detection under the non-moving collector]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  * In GHC the garbage collector is responsible for identifying deadlocked
  * programs. Providing for this responsibility is slightly tricky in the
  * non-moving collector due to the existence of aging. In particular, the
@@ -523,7 +531,7 @@ static void nonmovingInitSegment(struct NonmovingSegment *seg, uint8_t log_block
 void nonmovingPushFreeSegment(struct NonmovingSegment *seg)
 {
     // See Note [Live data accounting in nonmoving collector].
-    if (nonmovingHeap.n_free > NONMOVING_MAX_FREE) {
+    if (RELAXED_LOAD(&nonmovingHeap.n_free) > NONMOVING_MAX_FREE) {
         bdescr *bd = Bdescr((StgPtr) seg);
         ACQUIRE_SM_LOCK;
         ASSERT(oldest_gen->n_blocks >= bd->blocks);
@@ -754,7 +762,7 @@ void nonmovingStop(void)
                    "waiting for nonmoving collector thread to terminate");
         ACQUIRE_LOCK(&concurrent_coll_finished_lock);
         waitCondition(&concurrent_coll_finished, &concurrent_coll_finished_lock);
-        ACQUIRE_LOCK(&nonmoving_collection_mutex);
+        RELEASE_LOCK(&concurrent_coll_finished_lock);
     }
 #endif
 }
@@ -767,6 +775,9 @@ void nonmovingExit(void)
     nonmovingStop();
 
 #if defined(THREADED_RTS)
+    ACQUIRE_LOCK(&nonmoving_collection_mutex);
+    RELEASE_LOCK(&nonmoving_collection_mutex);
+
     closeMutex(&concurrent_coll_finished_lock);
     closeCondition(&concurrent_coll_finished);
     closeMutex(&nonmoving_collection_mutex);
@@ -893,37 +904,6 @@ static void nonmovingPrepareMark(void)
 #endif
 }
 
-// Mark weak pointers in the non-moving heap. They'll either end up in
-// dead_weak_ptr_list or stay in weak_ptr_list. Either way they need to be kept
-// during sweep. See `MarkWeak.c:markWeakPtrList` for the moving heap variant
-// of this.
-static void nonmovingMarkWeakPtrList(MarkQueue *mark_queue, StgWeak *dead_weak_ptr_list)
-{
-    for (StgWeak *w = oldest_gen->weak_ptr_list; w; w = w->link) {
-        markQueuePushClosure_(mark_queue, (StgClosure*)w);
-        // Do not mark finalizers and values here, those fields will be marked
-        // in `nonmovingMarkDeadWeaks` (for dead weaks) or
-        // `nonmovingTidyWeaks` (for live weaks)
-    }
-
-    // We need to mark dead_weak_ptr_list too. This is subtle:
-    //
-    // - By the beginning of this GC we evacuated all weaks to the non-moving
-    //   heap (in `markWeakPtrList`)
-    //
-    // - During the scavenging of the moving heap we discovered that some of
-    //   those weaks are dead and moved them to `dead_weak_ptr_list`. Note that
-    //   because of the fact above _all weaks_ are in the non-moving heap at
-    //   this point.
-    //
-    // - So, to be able to traverse `dead_weak_ptr_list` and run finalizers we
-    //   need to mark it.
-    for (StgWeak *w = dead_weak_ptr_list; w; w = w->link) {
-        markQueuePushClosure_(mark_queue, (StgClosure*)w);
-        nonmovingMarkDeadWeak(mark_queue, w);
-    }
-}
-
 void nonmovingCollect(StgWeak **dead_weaks, StgTSO **resurrected_threads)
 {
 #if defined(THREADED_RTS)
@@ -957,12 +937,19 @@ void nonmovingCollect(StgWeak **dead_weaks, StgTSO **resurrected_threads)
                 getCapability(n), true/*don't mark sparks*/);
     }
     markScheduler((evac_fn)markQueueAddRoot, mark_queue);
-    nonmovingMarkWeakPtrList(mark_queue, *dead_weaks);
     markStablePtrTable((evac_fn)markQueueAddRoot, mark_queue);
 
+    // The dead weak pointer list shouldn't contain any weaks in the
+    // nonmoving heap
+#if defined(DEBUG)
+    for (StgWeak *w = *dead_weaks; w; w = w->link) {
+        ASSERT(Bdescr((StgPtr) w)->gen != oldest_gen);
+    }
+#endif
+
     // Mark threads resurrected during moving heap scavenging
     for (StgTSO *tso = *resurrected_threads; tso != END_TSO_QUEUE; tso = tso->global_link) {
-        markQueuePushClosure_(mark_queue, (StgClosure*)tso);
+        markQueuePushClosureGC(mark_queue, (StgClosure*)tso);
     }
     trace(TRACE_nonmoving_gc, "Finished marking roots for nonmoving GC");
 
@@ -985,8 +972,23 @@ void nonmovingCollect(StgWeak **dead_weaks, StgTSO **resurrected_threads)
     // alive).
     ASSERT(oldest_gen->old_weak_ptr_list == NULL);
     ASSERT(nonmoving_old_weak_ptr_list == NULL);
-    nonmoving_old_weak_ptr_list = oldest_gen->weak_ptr_list;
-    oldest_gen->weak_ptr_list = NULL;
+    {
+        // Move both oldest_gen->weak_ptr_list and nonmoving_weak_ptr_list to
+        // nonmoving_old_weak_ptr_list
+        StgWeak **weaks = &oldest_gen->weak_ptr_list;
+        uint32_t n = 0;
+        while (*weaks) {
+            weaks = &(*weaks)->link;
+            n++;
+        }
+        debugTrace(DEBUG_nonmoving_gc, "%d new nonmoving weaks", n);
+        *weaks = nonmoving_weak_ptr_list;
+        nonmoving_old_weak_ptr_list = oldest_gen->weak_ptr_list;
+        nonmoving_weak_ptr_list = NULL;
+        oldest_gen->weak_ptr_list = NULL;
+        // At this point all weaks in the nonmoving generation are on
+        // nonmoving_old_weak_ptr_list
+    }
     trace(TRACE_nonmoving_gc, "Finished nonmoving GC preparation");
 
     // We are now safe to start concurrent marking
@@ -1050,7 +1052,6 @@ static void* nonmovingConcurrentMark(void *data)
     return NULL;
 }
 
-// TODO: Not sure where to put this function.
 // Append w2 to the end of w1.
 static void appendWeakList( StgWeak **w1, StgWeak *w2 )
 {
@@ -1077,19 +1078,23 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
             while (true) {
                 // Set snapshot
                 nonmovingSegmentInfo(seg)->next_free_snap = seg->next_free;
+                SET_SEGMENT_STATE(seg, FILLED_SWEEPING);
                 n_filled++;
-                if (seg->link)
+                if (seg->link) {
                     seg = seg->link;
-                else
+                } else {
                     break;
+                }
             }
             // add filled segments to sweep_list
-            SET_SEGMENT_STATE(seg, FILLED_SWEEPING);
             seg->link = nonmovingHeap.sweep_list;
             nonmovingHeap.sweep_list = filled;
         }
     }
 
+    // Mark Weak#s
+    nonmovingMarkWeakPtrList(mark_queue);
+
     // Do concurrent marking; most of the heap will get marked here.
     nonmovingMarkThreadsWeaks(mark_queue);
 
@@ -1100,21 +1105,13 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
     if (getSchedState() > SCHED_RUNNING) {
         // Note that we break our invariants here and leave segments in
         // nonmovingHeap.sweep_list, don't free nonmoving_large_objects etc.
-        // However because we won't be running mark-sweep in the final GC this
+        // However because we won't be running sweep in the final GC this
         // is OK.
-
-        // This is a RTS shutdown so we need to move our copy (snapshot) of
-        // weaks (nonmoving_old_weak_ptr_list and nonmoving_weak_ptr_list) to
-        // oldest_gen->threads to be able to run C finalizers in hs_exit_. Note
-        // that there may be more weaks added to oldest_gen->threads since we
-        // started mark, so we need to append our list to the tail of
-        // oldest_gen->threads.
-        appendWeakList(&nonmoving_old_weak_ptr_list, nonmoving_weak_ptr_list);
-        appendWeakList(&oldest_gen->weak_ptr_list, nonmoving_old_weak_ptr_list);
-        // These lists won't be used again so this is not necessary, but still
-        nonmoving_old_weak_ptr_list = NULL;
-        nonmoving_weak_ptr_list = NULL;
-
+        //
+        // However, we must move any weak pointers remaining on
+        // nonmoving_old_weak_ptr_list back to nonmoving_weak_ptr_list
+        // such that their C finalizers can be run by hs_exit_.
+        appendWeakList(&nonmoving_weak_ptr_list, nonmoving_old_weak_ptr_list);
         goto finish;
     }
 
@@ -1186,15 +1183,8 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
         nonmoving_old_threads = END_TSO_QUEUE;
     }
 
-    {
-        StgWeak **weaks = &oldest_gen->weak_ptr_list;
-        while (*weaks) {
-            weaks = &(*weaks)->link;
-        }
-        *weaks = nonmoving_weak_ptr_list;
-        nonmoving_weak_ptr_list = NULL;
-        nonmoving_old_weak_ptr_list = NULL;
-    }
+    nonmoving_weak_ptr_list = nonmoving_old_weak_ptr_list;
+    nonmoving_old_weak_ptr_list = NULL;
 
     // Prune spark lists
     // See Note [Spark management under the nonmoving collector].
@@ -1205,7 +1195,7 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
 #endif
 
     // Everything has been marked; allow the mutators to proceed
-#if defined(THREADED_RTS)
+#if defined(THREADED_RTS) && !defined(NONCONCURRENT_SWEEP)
     nonmoving_write_barrier_enabled = false;
     nonmovingFinishFlush(task);
 #endif
@@ -1237,13 +1227,22 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
     traceConcSweepEnd();
 #if defined(DEBUG)
     if (RtsFlags.DebugFlags.nonmoving_gc)
-        nonmovingPrintAllocatorCensus();
+        nonmovingPrintAllocatorCensus(true);
 #endif
 #if defined(TRACING)
     if (RtsFlags.TraceFlags.nonmoving_gc)
         nonmovingTraceAllocatorCensus();
 #endif
 
+#if defined(THREADED_RTS) && defined(NONCONCURRENT_SWEEP)
+#if defined(DEBUG)
+    checkNonmovingHeap(&nonmovingHeap);
+    checkSanity(true, true);
+#endif
+    nonmoving_write_barrier_enabled = false;
+    nonmovingFinishFlush(task);
+#endif
+
     // TODO: Remainder of things done by GarbageCollect (update stats)
 
 #if defined(THREADED_RTS)


=====================================
rts/sm/NonMoving.h
=====================================
@@ -45,8 +45,10 @@ enum NonmovingSegmentState {
 };
 
 #define SET_SEGMENT_STATE(seg, st) RELAXED_STORE(&(seg)->state, (st))
+#define ASSERT_SEGMENT_STATE(seg, st) ASSERT(RELAXED_LOAD(&(seg)->state) == (st))
 #else
 #define SET_SEGMENT_STATE(_seg,_st)
+#define ASSERT_SEGMENT_STATE(_seg, _st)
 #endif
 
 // A non-moving heap segment
@@ -256,15 +258,22 @@ INLINE_HEADER struct NonmovingSegment *nonmovingGetSegment_unchecked(StgPtr p)
     return (struct NonmovingSegment *) (((uintptr_t) p) & mask);
 }
 
+INLINE_HEADER bool nonmovingIsInSegment(StgPtr p)
+{
+    bdescr *bd = Bdescr(p);
+    return HEAP_ALLOCED_GC(p) &&
+        (bd->flags & BF_NONMOVING) &&
+        !(bd->flags & BF_LARGE);
+}
+
 INLINE_HEADER struct NonmovingSegment *nonmovingGetSegment(StgPtr p)
 {
-    ASSERT(HEAP_ALLOCED_GC(p) && (Bdescr(p)->flags & BF_NONMOVING));
+    ASSERT(nonmovingIsInSegment(p));
     return nonmovingGetSegment_unchecked(p);
 }
 
 INLINE_HEADER nonmoving_block_idx nonmovingGetBlockIdx(StgPtr p)
 {
-    ASSERT(HEAP_ALLOCED_GC(p) && (Bdescr(p)->flags & BF_NONMOVING));
     struct NonmovingSegment *seg = nonmovingGetSegment(p);
     ptrdiff_t blk0 = (ptrdiff_t)nonmovingSegmentGetBlock(seg, 0);
     ptrdiff_t offset = (ptrdiff_t)p - blk0;
@@ -276,12 +285,12 @@ extern uint8_t nonmovingMarkEpoch;
 
 INLINE_HEADER void nonmovingSetMark(struct NonmovingSegment *seg, nonmoving_block_idx i)
 {
-    seg->bitmap[i] = nonmovingMarkEpoch;
+    RELAXED_STORE(&seg->bitmap[i], nonmovingMarkEpoch);
 }
 
 INLINE_HEADER uint8_t nonmovingGetMark(struct NonmovingSegment *seg, nonmoving_block_idx i)
 {
-    return seg->bitmap[i];
+    return RELAXED_LOAD(&seg->bitmap[i]);
 }
 
 INLINE_HEADER void nonmovingSetClosureMark(StgPtr p)
@@ -289,20 +298,17 @@ INLINE_HEADER void nonmovingSetClosureMark(StgPtr p)
     nonmovingSetMark(nonmovingGetSegment(p), nonmovingGetBlockIdx(p));
 }
 
-// TODO: Audit the uses of these
-/* Was the given closure marked this major GC cycle? */
-INLINE_HEADER bool nonmovingClosureMarkedThisCycle(StgPtr p)
+INLINE_HEADER uint8_t nonmovingGetClosureMark(StgPtr p)
 {
     struct NonmovingSegment *seg = nonmovingGetSegment(p);
     nonmoving_block_idx blk_idx = nonmovingGetBlockIdx(p);
-    return nonmovingGetMark(seg, blk_idx) == nonmovingMarkEpoch;
+    return nonmovingGetMark(seg, blk_idx);
 }
 
-INLINE_HEADER bool nonmovingClosureMarked(StgPtr p)
+/* Was the given closure marked this major GC cycle? */
+INLINE_HEADER bool nonmovingClosureMarkedThisCycle(StgPtr p)
 {
-    struct NonmovingSegment *seg = nonmovingGetSegment(p);
-    nonmoving_block_idx blk_idx = nonmovingGetBlockIdx(p);
-    return nonmovingGetMark(seg, blk_idx) != 0;
+    return nonmovingGetClosureMark(p) == nonmovingMarkEpoch;
 }
 
 // Can be called during a major collection to determine whether a particular


=====================================
rts/sm/NonMovingCensus.c
=====================================
@@ -23,7 +23,7 @@
 static struct NonmovingAllocCensus
 nonmovingAllocatorCensus_(struct NonmovingAllocator *alloc, bool collect_live_words)
 {
-    struct NonmovingAllocCensus census = {0, 0, 0, 0};
+    struct NonmovingAllocCensus census = {collect_live_words, 0, 0, 0, 0};
 
     for (struct NonmovingSegment *seg = alloc->filled;
          seg != NULL;
@@ -47,7 +47,7 @@ nonmovingAllocatorCensus_(struct NonmovingAllocator *alloc, bool collect_live_wo
         census.n_active_segs++;
         unsigned int n = nonmovingSegmentBlockCount(seg);
         for (unsigned int i=0; i < n; i++) {
-            if (nonmovingGetMark(seg, i)) {
+            if (nonmovingGetMark(seg, i) == nonmovingMarkEpoch) {
                 StgClosure *c = (StgClosure *) nonmovingSegmentGetBlock(seg, i);
                 if (collect_live_words)
                     census.n_live_words += closure_sizeW(c);
@@ -88,28 +88,51 @@ nonmovingAllocatorCensus(struct NonmovingAllocator *alloc)
 }
 
 
-void nonmovingPrintAllocatorCensus()
+static void print_alloc_census(int i, struct NonmovingAllocCensus census)
 {
-    if (!RtsFlags.GcFlags.useNonmoving)
-        return;
+    uint32_t blk_size = 1 << (i + NONMOVING_ALLOCA0);
+    int sz_min = 1 << (i + NONMOVING_ALLOCA0 - 1);
+    int sz_max = 1 << (i + NONMOVING_ALLOCA0);
+    (void) sz_min; (void) sz_max;
 
-    for (int i=0; i < NONMOVING_ALLOCA_CNT; i++) {
-        struct NonmovingAllocCensus census =
-            nonmovingAllocatorCensus(nonmovingHeap.allocators[i]);
-
-        uint32_t blk_size = 1 << (i + NONMOVING_ALLOCA0);
+    if (census.collected_live_words) {
         // We define occupancy as the fraction of space that is used for useful
         // data (that is, live and not slop).
         double occupancy = 100.0 * census.n_live_words * sizeof(W_)
             / (census.n_live_blocks * blk_size);
         if (census.n_live_blocks == 0) occupancy = 100;
         (void) occupancy; // silence warning if !DEBUG
-        debugTrace(DEBUG_nonmoving_gc, "Allocator %d (%d bytes - %d bytes): "
-                   "%d active segs, %d filled segs, %d live blocks, %d live words "
-                   "(%2.1f%% occupancy)",
-                   i, 1 << (i + NONMOVING_ALLOCA0 - 1), 1 << (i + NONMOVING_ALLOCA0),
-                   census.n_active_segs, census.n_filled_segs, census.n_live_blocks, census.n_live_words,
+        debugTrace(DEBUG_nonmoving_gc,
+                   "Allocator %d (%d bytes - %d bytes): "
+                   "%"PRIu32" active segs, %"PRIu32" filled segs, %"PRIu32" live blocks, "
+                   "%"PRIu32" live words (%2.1f%% occupancy)",
+                   i, sz_min, sz_max,
+                   census.n_active_segs,
+                   census.n_filled_segs,
+                   census.n_live_blocks,
+                   census.n_live_words,
                    occupancy);
+    } else {
+        debugTrace(DEBUG_nonmoving_gc,
+                   "Allocator %d (%d bytes - %d bytes): "
+                   "%"PRIu32" active segs, %"PRIu32" filled segs, %"PRIu32" live blocks",
+                   i, sz_min, sz_max,
+                   census.n_active_segs,
+                   census.n_filled_segs,
+                   census.n_live_blocks);
+    }
+}
+
+void nonmovingPrintAllocatorCensus(bool collect_live_words)
+{
+    if (!RtsFlags.GcFlags.useNonmoving)
+        return;
+
+    for (int i=0; i < NONMOVING_ALLOCA_CNT; i++) {
+        struct NonmovingAllocCensus census =
+            nonmovingAllocatorCensus_(nonmovingHeap.allocators[i], collect_live_words);
+
+        print_alloc_census(i, census);
     }
 }
 


=====================================
rts/sm/NonMovingCensus.h
=====================================
@@ -11,6 +11,7 @@
 #include "NonMoving.h"
 
 struct NonmovingAllocCensus {
+    bool collected_live_words;
     uint32_t n_active_segs;
     uint32_t n_filled_segs;
     uint32_t n_live_blocks;
@@ -24,5 +25,5 @@ nonmovingAllocatorCensusWithWords(struct NonmovingAllocator *alloc);
 struct NonmovingAllocCensus
 nonmovingAllocatorCensus(struct NonmovingAllocator *alloc);
 
-void nonmovingPrintAllocatorCensus(void);
+void nonmovingPrintAllocatorCensus(bool collect_live_words);
 void nonmovingTraceAllocatorCensus(void);


=====================================
rts/sm/NonMovingMark.c
=====================================
@@ -27,6 +27,8 @@
 #include "sm/Storage.h"
 #include "CNF.h"
 
+static void nonmovingResetUpdRemSetQueue (MarkQueue *rset);
+static void nonmovingResetUpdRemSet (UpdRemSet *rset);
 static bool check_in_nonmoving_heap(StgClosure *p);
 static void mark_closure (MarkQueue *queue, const StgClosure *p, StgClosure **origin);
 static void trace_tso (MarkQueue *queue, StgTSO *tso);
@@ -261,17 +263,9 @@ static uint32_t markQueueLength(MarkQueue *q);
 #endif
 static void init_mark_queue_(MarkQueue *queue);
 
-/* Transfers the given capability's update-remembered set to the global
- * remembered set.
- *
- * Really the argument type should be UpdRemSet* but this would be rather
- * inconvenient without polymorphism.
- */
-void nonmovingAddUpdRemSetBlocks(MarkQueue *rset)
+static void nonmovingAddUpdRemSetBlocks_(MarkQueue *rset)
 {
-    if (markQueueIsEmpty(rset)) return;
-
-    // find the tail of the queue
+    // find the tail of the remembered set mark queue
     bdescr *start = rset->blocks;
     bdescr *end = start;
     while (end->link != NULL)
@@ -282,14 +276,45 @@ void nonmovingAddUpdRemSetBlocks(MarkQueue *rset)
     end->link = upd_rem_set_block_list;
     upd_rem_set_block_list = start;
     RELEASE_LOCK(&upd_rem_set_lock);
+}
 
-    // Reset remembered set
+/*
+ * Transfers the given capability's update-remembered set to the global
+ * remembered set.
+ *
+ * Really the argument type should be UpdRemSet* but this would be rather
+ * inconvenient without polymorphism.
+ */
+static void nonmovingAddUpdRemSetBlocks_lock(MarkQueue *rset)
+{
+    if (markQueueIsEmpty(rset)) return;
+
+    nonmovingAddUpdRemSetBlocks_(rset);
+    // Reset the state of the remembered set.
     ACQUIRE_SM_LOCK;
     init_mark_queue_(rset);
     rset->is_upd_rem_set = true;
     RELEASE_SM_LOCK;
 }
 
+/*
+ * Transfers the given capability's update-remembered set to the global
+ * remembered set.
+ *
+ * Really the argument type should be UpdRemSet* but this would be rather
+ * inconvenient without polymorphism.
+ *
+ * Caller must hold SM_LOCK.
+ */
+void nonmovingAddUpdRemSetBlocks(UpdRemSet *rset)
+{
+    if (markQueueIsEmpty(&rset->queue)) return;
+
+    nonmovingAddUpdRemSetBlocks_(&rset->queue);
+    init_mark_queue_(&rset->queue);
+    rset->queue.is_upd_rem_set = true;
+}
+
 #if defined(THREADED_RTS)
 /* Called by capabilities to flush their update remembered sets when
  * synchronising with the non-moving collector as it transitions from mark to
@@ -301,7 +326,7 @@ void nonmovingFlushCapUpdRemSetBlocks(Capability *cap)
                "Capability %d flushing update remembered set: %d",
                cap->no, markQueueLength(&cap->upd_rem_set.queue));
     traceConcUpdRemSetFlush(cap);
-    nonmovingAddUpdRemSetBlocks(&cap->upd_rem_set.queue);
+    nonmovingAddUpdRemSetBlocks_lock(&cap->upd_rem_set.queue);
     atomic_inc(&upd_rem_set_flush_count, 1);
     signalCondition(&upd_rem_set_flushed_cond);
     // After this mutation will remain suspended until nonmovingFinishFlush
@@ -399,7 +424,7 @@ void nonmovingFinishFlush(Task *task)
 {
     // See Note [Unintentional marking in resurrectThreads]
     for (uint32_t i = 0; i < getNumCapabilities(); i++) {
-        reset_upd_rem_set(&getCapability(i)->upd_rem_set);
+        nonmovingResetUpdRemSet(&getCapability(i)->upd_rem_set);
     }
     // Also reset upd_rem_set_block_list in case some of the UpdRemSets were
     // filled and we flushed them.
@@ -424,7 +449,8 @@ push (MarkQueue *q, const MarkQueueEnt *ent)
     if (q->top->head == MARK_QUEUE_BLOCK_ENTRIES) {
         // Yes, this block is full.
         if (q->is_upd_rem_set) {
-            nonmovingAddUpdRemSetBlocks(q);
+            // Flush the block to the global update remembered set
+            nonmovingAddUpdRemSetBlocks_lock(q);
         } else {
             // allocate a fresh block.
             ACQUIRE_SM_LOCK;
@@ -780,7 +806,7 @@ void markQueuePushClosure (MarkQueue *q,
 /* TODO: Do we really never want to specify the origin here? */
 void markQueueAddRoot (MarkQueue* q, StgClosure** root)
 {
-    markQueuePushClosure(q, *root, NULL);
+    markQueuePushClosureGC(q, *root);
 }
 
 /* Push a closure to the mark queue without origin information */
@@ -908,18 +934,24 @@ void initMarkQueue (MarkQueue *queue)
 }
 
 /* Must hold sm_mutex. */
-void init_upd_rem_set (UpdRemSet *rset)
+void nonmovingInitUpdRemSet (UpdRemSet *rset)
 {
     init_mark_queue_(&rset->queue);
     rset->queue.is_upd_rem_set = true;
 }
 
-void reset_upd_rem_set (UpdRemSet *rset)
+static void nonmovingResetUpdRemSetQueue (MarkQueue *rset)
 {
     // UpdRemSets always have one block for the mark queue. This assertion is to
     // update this code if we change that.
-    ASSERT(rset->queue.blocks->link == NULL);
-    rset->queue.top->head = 0;
+    ASSERT(rset->is_upd_rem_set);
+    ASSERT(rset->blocks->link == NULL);
+    rset->top->head = 0;
+}
+
+void nonmovingResetUpdRemSet (UpdRemSet *rset)
+{
+    nonmovingResetUpdRemSetQueue(&rset->queue);
 }
 
 void freeMarkQueue (MarkQueue *queue)
@@ -1742,7 +1774,9 @@ nonmovingMark (MarkQueue *queue)
         }
         case NULL_ENTRY:
             // Perhaps the update remembered set has more to mark...
-            if (upd_rem_set_block_list) {
+            // N.B. This must be atomic since we have not yet taken
+            // upd_rem_set_lock.
+            if (RELAXED_LOAD(&upd_rem_set_block_list) != NULL) {
                 ACQUIRE_LOCK(&upd_rem_set_lock);
                 bdescr *old = queue->blocks;
                 queue->blocks = upd_rem_set_block_list;
@@ -1856,7 +1890,28 @@ static bool nonmovingIsNowAlive (StgClosure *p)
             || (bd->flags & BF_MARKED) != 0;
                    // The object was marked
     } else {
-        return nonmovingClosureMarkedThisCycle((P_)p);
+        struct NonmovingSegment *seg = nonmovingGetSegment((StgPtr) p);
+        StgClosure *snapshot_loc =
+          (StgClosure *) nonmovingSegmentGetBlock(seg, nonmovingSegmentInfo(seg)->next_free_snap);
+        if (p >= snapshot_loc && nonmovingGetClosureMark((StgPtr) p) == 0) {
+            /*
+             * In this case we are looking at a block that wasn't allocated
+             * at the time that the snapshot was taken. As we do not mark such
+             * blocks, we must assume that it is reachable.
+             */
+            return true;
+        } else {
+            return nonmovingClosureMarkedThisCycle((P_)p);
+        }
+    }
+}
+
+// Mark all Weak#s on nonmoving_old_weak_ptr_list.
+void nonmovingMarkWeakPtrList (struct MarkQueue_ *queue)
+{
+    ASSERT(nonmoving_weak_ptr_list == NULL);
+    for (StgWeak *w = nonmoving_old_weak_ptr_list; w != NULL; w = w->link) {
+        mark_closure(queue, (StgClosure *) w, NULL);
     }
 }
 
@@ -1868,6 +1923,9 @@ bool nonmovingTidyWeaks (struct MarkQueue_ *queue)
     StgWeak **last_w = &nonmoving_old_weak_ptr_list;
     StgWeak *next_w;
     for (StgWeak *w = nonmoving_old_weak_ptr_list; w != NULL; w = next_w) {
+        // This should have been marked by nonmovingMarkWeakPtrList
+        ASSERT(nonmovingIsNowAlive((StgClosure *) w));
+
         if (w->header.info == &stg_DEAD_WEAK_info) {
             // finalizeWeak# was called on the weak
             next_w = w->link;
@@ -1878,7 +1936,10 @@ bool nonmovingTidyWeaks (struct MarkQueue_ *queue)
         // Otherwise it's a live weak
         ASSERT(w->header.info == &stg_WEAK_info);
 
-        if (nonmovingIsNowAlive(w->key)) {
+        // See Note [Weak pointer processing and the non-moving GC] in
+        // MarkWeak.c
+        bool key_in_nonmoving = Bdescr((StgPtr) w->key)->flags & BF_NONMOVING;
+        if (!key_in_nonmoving || nonmovingIsNowAlive(w->key)) {
             nonmovingMarkLiveWeak(queue, w);
             did_work = true;
 
@@ -1908,7 +1969,8 @@ void nonmovingMarkDeadWeak (struct MarkQueue_ *queue, StgWeak *w)
 
 void nonmovingMarkLiveWeak (struct MarkQueue_ *queue, StgWeak *w)
 {
-    ASSERT(nonmovingClosureMarkedThisCycle((P_)w));
+    ASSERT(nonmovingIsNowAlive((StgClosure *) w));
+    ASSERT(nonmovingIsNowAlive((StgClosure *) w->key));
     markQueuePushClosure_(queue, w->value);
     markQueuePushClosure_(queue, w->finalizer);
     markQueuePushClosure_(queue, w->cfinalizers);
@@ -1922,9 +1984,9 @@ void nonmovingMarkDeadWeaks (struct MarkQueue_ *queue, StgWeak **dead_weaks)
 {
     StgWeak *next_w;
     for (StgWeak *w = nonmoving_old_weak_ptr_list; w; w = next_w) {
-        ASSERT(!nonmovingClosureMarkedThisCycle((P_)(w->key)));
+        ASSERT(!nonmovingIsNowAlive(w->key));
         nonmovingMarkDeadWeak(queue, w);
-        next_w = w ->link;
+        next_w = w->link;
         w->link = *dead_weaks;
         *dead_weaks = w;
     }


=====================================
rts/sm/NonMovingMark.h
=====================================
@@ -137,8 +137,7 @@ extern bdescr *upd_rem_set_block_list;
 
 void nonmovingMarkInitUpdRemSet(void);
 
-void init_upd_rem_set(UpdRemSet *rset);
-void reset_upd_rem_set(UpdRemSet *rset);
+void nonmovingInitUpdRemSet(UpdRemSet *rset);
 void updateRemembSetPushClosure(Capability *cap, StgClosure *p);
 void updateRemembSetPushThunk(Capability *cap, StgThunk *p);
 void updateRemembSetPushTSO(Capability *cap, StgTSO *tso);
@@ -158,6 +157,7 @@ void initMarkQueue(MarkQueue *queue);
 void freeMarkQueue(MarkQueue *queue);
 void nonmovingMark(struct MarkQueue_ *restrict queue);
 
+void nonmovingMarkWeakPtrList(struct MarkQueue_ *queue);
 bool nonmovingTidyWeaks(struct MarkQueue_ *queue);
 void nonmovingTidyThreads(void);
 void nonmovingMarkDeadWeaks(struct MarkQueue_ *queue, StgWeak **dead_weak_ptr_list);
@@ -165,7 +165,7 @@ void nonmovingResurrectThreads(struct MarkQueue_ *queue, StgTSO **resurrected_th
 bool nonmovingIsAlive(StgClosure *p);
 void nonmovingMarkDeadWeak(struct MarkQueue_ *queue, StgWeak *w);
 void nonmovingMarkLiveWeak(struct MarkQueue_ *queue, StgWeak *w);
-void nonmovingAddUpdRemSetBlocks(struct MarkQueue_ *rset);
+void nonmovingAddUpdRemSetBlocks(UpdRemSet *rset);
 
 void markQueuePush(MarkQueue *q, const MarkQueueEnt *ent);
 void markQueuePushClosureGC(MarkQueue *q, StgClosure *p);


=====================================
rts/sm/NonMovingShortcut.c
=====================================
@@ -153,7 +153,8 @@ selectee_changed:
     // Selectee is a non-moving object, mark it.
     markQueuePushClosure(queue, selectee, NULL);
 
-    const StgInfoTable *selectee_info_tbl = get_itbl(selectee);
+    // This may synchronize with the release in updateWithIndirection.
+    const StgInfoTable *selectee_info_tbl = get_itbl_acquire(selectee);
     switch (selectee_info_tbl->type) {
         case WHITEHOLE: {
             // Probably a loop. Abort.


=====================================
rts/sm/NonMovingSweep.c
=====================================
@@ -31,6 +31,7 @@ enum SweepResult {
 GNUC_ATTR_HOT static enum SweepResult
 nonmovingSweepSegment(struct NonmovingSegment *seg)
 {
+    ASSERT_SEGMENT_STATE(seg, FILLED_SWEEPING);
     const nonmoving_block_idx blk_cnt = nonmovingSegmentBlockCount(seg);
     bool found_free = false;
     bool found_live = false;
@@ -39,20 +40,20 @@ nonmovingSweepSegment(struct NonmovingSegment *seg)
     {
         if (seg->bitmap[i] == nonmovingMarkEpoch) {
             found_live = true;
-        } else if (!found_free) {
-            // This is the first free block we've found; set next_free,
-            // next_free_snap, and the scan pointer.
-            found_free = true;
-            seg->next_free = i;
-            nonmovingSegmentInfo(seg)->next_free_snap = i;
-            Bdescr((P_)seg)->u.scan = (P_)nonmovingSegmentGetBlock(seg, i);
-            seg->bitmap[i] = 0;
         } else {
             seg->bitmap[i] = 0;
+            if (!found_free) {
+                // This is the first free block we've found; set next_free,
+                // next_free_snap, and the scan pointer.
+                found_free = true;
+                seg->next_free = i;
+                nonmovingSegmentInfo(seg)->next_free_snap = i;
+                Bdescr((P_)seg)->u.scan = (P_)nonmovingSegmentGetBlock(seg, i);
+            }
         }
 
         if (found_free && found_live) {
-            // zero the remaining dead object's mark bits
+            // zero the remaining dead objects' mark bits
             for (; i < nonmovingSegmentBlockCount(seg); ++i) {
                 if (seg->bitmap[i] != nonmovingMarkEpoch) {
                     seg->bitmap[i] = 0;
@@ -118,7 +119,8 @@ clear_segment_free_blocks(struct NonmovingSegment* seg)
 {
     unsigned int block_size = nonmovingSegmentBlockSize(seg);
     for (unsigned int p_idx = 0; p_idx < nonmovingSegmentBlockCount(seg); ++p_idx) {
-        // after mark, so bit not set == dead
+        // N.B. nonmovingSweepSegment helpfully clears the bitmap entries of
+        // dead blocks
         if (nonmovingGetMark(seg, p_idx) == 0) {
             memset(nonmovingSegmentGetBlock(seg, p_idx), 0, block_size);
         }
@@ -286,6 +288,7 @@ void nonmovingSweepMutLists()
         for (bdescr *bd = old_mut_list; bd; bd = bd->link) {
             for (StgPtr p = bd->start; p < bd->free; p++) {
                 StgClosure **q = (StgClosure**)p;
+                ASSERT(Bdescr((StgPtr) *q)->gen == oldest_gen);
                 if (nonmovingIsAlive(*q) && !is_closure_clean(*q)) {
                     recordMutableCap(*q, cap, oldest_gen->no);
                 }


=====================================
rts/sm/Sanity.c
=====================================
@@ -44,6 +44,8 @@ static void  checkLargeBitmap    ( StgPtr payload, StgLargeBitmap*, uint32_t );
 static void  checkClosureShallow ( const StgClosure * );
 static void  checkSTACK          (StgStack *stack);
 
+static void  checkCompactObjects (bdescr *bd);
+
 static W_    countNonMovingSegments ( struct NonmovingSegment *segs );
 static W_    countNonMovingHeap     ( struct NonmovingHeap *heap );
 
@@ -631,6 +633,9 @@ static void checkNonmovingSegments (struct NonmovingSegment *seg)
 
 void checkNonmovingHeap (const struct NonmovingHeap *heap)
 {
+    checkLargeObjects(nonmoving_large_objects);
+    checkLargeObjects(nonmoving_marked_large_objects);
+    checkCompactObjects(nonmoving_compact_objects);
     for (unsigned int i=0; i < NONMOVING_ALLOCA_CNT; i++) {
         const struct NonmovingAllocator *alloc = heap->allocators[i];
         checkNonmovingSegments(alloc->filled);
@@ -1053,6 +1058,7 @@ findMemoryLeak (void)
     for (i = 0; i < getNumCapabilities(); i++) {
         markBlocks(gc_threads[i]->free_blocks);
         markBlocks(getCapability(i)->pinned_object_block);
+        markBlocks(getCapability(i)->pinned_object_blocks);
         markBlocks(getCapability(i)->upd_rem_set.queue.blocks);
     }
 


=====================================
rts/sm/Storage.c
=====================================
@@ -325,7 +325,7 @@ void storageAddCapabilities (uint32_t from, uint32_t to)
     if (RtsFlags.GcFlags.useNonmoving) {
         nonmovingAddCapabilities(to);
         for (i = 0; i < to; ++i) {
-            init_upd_rem_set(&getCapability(i)->upd_rem_set);
+            nonmovingInitUpdRemSet(&getCapability(i)->upd_rem_set);
         }
     }
 



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/738262df05b27e9a196b35a3690883d9c176bb3e...1fdffe157170bb42741c44bb03d44f61ed5f8822

