[Git][ghc/ghc][wip/weak-sanity-race] rts/Sanity: Avoid nasty race in weak pointer sanity-checking

Ben Gamari gitlab at gitlab.haskell.org
Wed Nov 4 18:22:33 UTC 2020



Ben Gamari pushed to branch wip/weak-sanity-race at Glasgow Haskell Compiler / GHC


Commits:
55fbeb76 by Ben Gamari at 2020-11-04T13:22:26-05:00
rts/Sanity: Avoid nasty race in weak pointer sanity-checking

See Note [Racing weak pointer evacuation] for all of the gory details.

- - - - -


1 changed file:

- rts/sm/Sanity.c


Changes:

=====================================
rts/sm/Sanity.c
=====================================
@@ -233,6 +233,110 @@ checkClosureProfSanity(const StgClosure *p)
 }
 #endif
 
+/* Note [Racing weak pointer evacuation]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * While debugging a GC crash (#18919) I noticed a spurious crash due to the
+ * end-of-GC sanity check stumbling across a weak pointer with unevacuated key.
+ * This can happen when two GC threads race to evacuate a weak pointer.
+ * Specifically, we start out with a heap with a weak pointer reachable
+ * from both a generation's weak pointer list and some other root-reachable
+ * closure (e.g. a Just constructor):
+ *
+ *            O                      W
+ *            ┌──────────┐           ┌──────────┐
+ * Root ────→ │ Just     │     ╭───→ │ Weak#    │ ←─────── weak_ptr_list
+ * Set        ├──────────┤     │     ├──────────┤
+ *            │          │ ────╯     │ value    │ ─→ ...
+ *            └──────────┘           │ key      │ ───╮    K
+ *                                   │ ...      │    │    ┌──────────┐
+ *                                   └──────────┘    ╰──→ │ ...      │
+ *                                                        ├──────────┤
+ *
+ * The situation proceeds as follows:
+ *
+ * 1. Thread A initiates a GC, wakes up the GC worker threads, and starts
+ *    evacuating roots.
+ * 2. Thread A evacuates a weak pointer object O to location O'.
+ * 3. Thread A fills the block where O' lives and pushes it to its
+ *    work-stealing queue.
+ * 4. Thread B steals the O' block and starts scavenging it.
+ * 5. Thread A enters markWeakPtrList.
+ * 6. Thread A starts evacuating W, resulting in Wb'.
+ * 7. Thread B scavenges O', evacuating W', resulting in Wa'.
+ * 8. Thread A and B are now racing to evacuate W. Only one will win the race
+ *    (due to the CAS in copy_tag). Let the winning copy be called W'.
+ * 9. W will be replaced by a forwarding pointer to the winning copy, W'.
+ * 10. Whichever thread loses the race will retry evacuation, see
+ *     that W has already been evacuated, and proceed as usual.
+ * 10. W' will get added to weak_ptr_list by markWeakPtrList.
+ * 11. Eventually W' will be scavenged.
+ * 12. traverseWeakPtrList will see that W' has been scavenged and evacuate the
+ *     its key.
+ * 13. However, the copy that lost the race is not on `weak_ptr_list`
+ *     and will therefore never get its `key` field scavenged (since
+ *     `traverseWeakPtrList` will never see it).
+ *
+ * Now the heap looks like:
+ *
+ *            O'                     W (from-space)
+ *            ┌──────────┐           ┌──────────┐
+ * Root ────→ │ Just     │           │ Fwd-ptr  │ ───────────╮
+ * Set        ├──────────┤           ├──────────┤            │
+ *            │          │ ────╮     │ value    │ ─→ ...     │
+ *            └──────────┘     │     │ key      │ ────────────────────────╮
+ *                             │     │ ...      │            │            │
+ *                             │     └──────────┘            │            │
+ *                             │                             │            │
+ *                             │     Wa'                     │            │
+ *                             │     ┌──────────┐       ╭────╯            │
+ *                             ╰───→ │ Weak#    │ ←─────┤                 │
+ *                                   ├──────────┤       ╰─ weak_ptr_list  │
+ *                                   │ value    │ ─→ ...                  │
+ *                                   │ key      │ ───╮    K'              │
+ *                                   │ ...      │    │    ┌──────────┐    │
+ *                                   └──────────┘    ╰──→ │ ...      │    │
+ *                                                        ├──────────┤    │
+ *                                   Wb'                                  │
+ *                                   ┌──────────┐                         │
+ *                                   │ Weak#    │                         │
+ *                                   ├──────────┤                         │
+ *                                   │ value    │ ─→ ...                  │
+ *                                   │ key      │ ───╮    K (from-space)  │
+ *                                   │ ...      │    │    ┌──────────┐    │
+ *                                   └──────────┘    ╰──→ │ 0xaaaaa  │ ←──╯
+ *                                                        ├──────────┤
+ *
+ *
+ * Without sanity checking this is fine; we have introduced a spurious copy of
+ * W, Wb' into the heap but it is unreachable and therefore won't cause any
+ * trouble. However, with sanity checking we may encounter this spurious copy
+ * when walking the heap. Moreover, this copy was never added to weak_ptr_list,
+ * meaning that its key field will not get scavenged and will therefore point
+ * into from-space..
+ *
+ * To avoid this checkClosure skips over the key field when it sees a weak
+ * pointer. Note that all fields of Wb' *other* than the key field should be
+ * valid, so we don't skip the closure entirely.
+ *
+ * We then do additional checking of all closures on the weak_ptr_lists, where
+ * we *do* check `key`.
+ */
+
+// Check validity of objects on weak_ptr_list.
+// See Note [Racing weak pointer evacuation].
+static void
+checkGenWeakPtrList( uint32_t g )
+{
+  for (StgWeak *w = generations[g].weak_ptr_list; w != NULL; w = w->link) {
+    ASSERT(LOOKS_LIKE_CLOSURE_PTR(w));
+    ASSERT(get_itbl((StgClosure *) w) == &stg_WEAK_info);
+    ASSERT(LOOKS_LIKE_CLOSURE_PTR(w->key));
+    ASSERT(LOOKS_LIKE_CLOSURE_PTR(w->value));
+    ASSERT(LOOKS_LIKE_CLOSURE_PTR(w->finalizer));
+    ASSERT(LOOKS_LIKE_CLOSURE_PTR(w->cfinalizers));
+  }
+}
+
 // Returns closure size in words
 StgOffset
 checkClosure( const StgClosure* p )
@@ -352,9 +456,11 @@ checkClosure( const StgClosure* p )
        * representative of the actual layout.
        */
       { StgWeak *w = (StgWeak *)p;
-        ASSERT(LOOKS_LIKE_CLOSURE_PTR(w->key));
+        // N.B. Checking the key is not safe.
+        // See Note [Racing weak pointer evacuation] for why.
         ASSERT(LOOKS_LIKE_CLOSURE_PTR(w->value));
         ASSERT(LOOKS_LIKE_CLOSURE_PTR(w->finalizer));
+        ASSERT(LOOKS_LIKE_CLOSURE_PTR(w->cfinalizers));
         if (w->link) {
           ASSERT(LOOKS_LIKE_CLOSURE_PTR(w->link));
         }
@@ -852,6 +958,12 @@ static void checkGeneration (generation *gen,
         checkHeapChain(ws->scavd_list);
     }
 
+    // Check weak pointer lists
+    // See Note [Racing weak pointer evacuation].
+    for (uint32_t g = 0; g <= RtsFlags.GcFlags.generations; g++) {
+      checkGenWeakPtrList(g);
+    }
+
     checkLargeObjects(gen->large_objects);
     checkCompactObjects(gen->compact_objects);
 }



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/55fbeb76a4ca58373b6c021a25666d74d5aadcd9

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/55fbeb76a4ca58373b6c021a25666d74d5aadcd9
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20201104/9b977705/attachment-0001.html>


More information about the ghc-commits mailing list