[Git][ghc/ghc][wip/backports-9.8] 4 commits: Make decomposeRuleLhs a bit more clever

Zubin (@wz1000) gitlab at gitlab.haskell.org
Fri Feb 16 12:35:33 UTC 2024



Zubin pushed to branch wip/backports-9.8 at Glasgow Haskell Compiler / GHC


Commits:
11700f5d by Simon Peyton Jones at 2024-02-16T18:05:07+05:30
Make decomposeRuleLhs a bit more clever

This fixes #24370 by making decomposeRuleLhs undertand
dictionary /functions/ as well as plain /dictionaries/

(cherry picked from commit ca2e919ecca35db412e772d7eadd6a7c4fb20e4b)

- - - - -
93ee2bd2 by ARATA Mizuki at 2024-02-16T18:05:07+05:30
Support 128-bit SIMD on AArch64 via LLVM backend

(cherry picked from commit 015886ec78e598f850c4202efdee239bac63b8c7)

- - - - -
0e0a0a6e by ARATA Mizuki at 2024-02-16T18:05:07+05:30
x86: Don't require -mavx2 when using 256-bit floating-point SIMD primitives

Fixes #24222

(cherry picked from commit 7d9a2e44e8cce00e24671325aebe47d9e529aee5)

- - - - -
99999dc7 by Ben Gamari at 2024-02-16T18:05:07+05:30
Fix thunk update ordering

Previously we attempted to ensure soundness of concurrent thunk update
by synchronizing on the access of the thunk's info table pointer field.
This was believed to be sufficient since the indirectee (which may
expose a closure allocated by another core) would not be examined
until the info table pointer update is complete.

However, it turns out that this can result in data races in the presence
of multiple threads racing a update a single thunk. For instance,
consider this interleaving under the old scheme:

            Thread A                             Thread B
            ---------                            ---------
    t=0     Enter t
      1     Push update frame
      2     Begin evaluation

      4     Pause thread
      5     t.indirectee=tso
      6     Release t.info=BLACKHOLE

      7     ... (e.g. GC)

      8     Resume thread
      9     Finish evaluation
      10    Relaxed t.indirectee=x

      11                                         Load t.info
      12                                         Acquire fence
      13                                         Inspect t.indirectee

      14    Release t.info=BLACKHOLE

Here Thread A enters thunk `t` but is soon paused, resulting in `t`
being lazily blackholed at t=6. Then, at t=10 Thread A finishes
evaluation and updates `t.indirectee` with a relaxed store.

Meanwhile, Thread B enters the blackhole. Under the old scheme this
would introduce an acquire-fence but this would only synchronize with
Thread A at t=6. Consequently, the result of the evaluation, `x`, is not
visible to Thread B, introducing a data race.

We fix this by treating the `indirectee` field as we do all other
mutable fields. This means we must always access this field with
acquire-loads and release-stores.

See #23185.

(cherry picked from commit 9a52ae46a33b490161e1e3e1cc70caa46c60488a)
(cherry picked from commit 88afc6ea885d54523efbbb764f3435a147b799a5)

- - - - -


30 changed files:

- compiler/CodeGen.Platform.h
- compiler/GHC.hs
- compiler/GHC/Cmm/CallConv.hs
- compiler/GHC/Core/Predicate.hs
- compiler/GHC/HsToCore/Binds.hs
- compiler/GHC/StgToCmm/Bind.hs
- compiler/GHC/StgToCmm/Prim.hs
- rts/Apply.cmm
- rts/Compact.cmm
- rts/Heap.c
- rts/Interpreter.c
- rts/Messages.c
- rts/PrimOps.cmm
- rts/StableName.c
- rts/StgMiscClosures.cmm
- rts/ThreadPaused.c
- rts/Threads.c
- rts/Updates.cmm
- rts/Updates.h
- rts/include/Cmm.h
- rts/include/rts/TSANUtils.h
- rts/include/stg/MachRegs.h
- rts/include/stg/SMP.h
- rts/sm/NonMovingMark.c
- rts/sm/Storage.c
- testsuite/tests/codeGen/should_run/all.T
- + testsuite/tests/simplCore/should_compile/T24370.hs
- testsuite/tests/simplCore/should_compile/all.T
- testsuite/tests/unboxedsums/all.T
- utils/genapply/Main.hs


Changes:

=====================================
compiler/CodeGen.Platform.h
=====================================
@@ -203,6 +203,39 @@ import GHC.Platform.Reg
 # define d29 61
 # define d30 62
 # define d31 63
+
+# define q0 32
+# define q1 33
+# define q2 34
+# define q3 35
+# define q4 36
+# define q5 37
+# define q6 38
+# define q7 39
+# define q8 40
+# define q9 41
+# define q10 42
+# define q11 43
+# define q12 44
+# define q13 45
+# define q14 46
+# define q15 47
+# define q16 48
+# define q17 49
+# define q18 50
+# define q19 51
+# define q20 52
+# define q21 53
+# define q22 54
+# define q23 55
+# define q24 56
+# define q25 57
+# define q26 58
+# define q27 59
+# define q28 60
+# define q29 61
+# define q30 62
+# define q31 63
 #endif
 
 # if defined(MACHREGS_darwin)


=====================================
compiler/GHC.hs
=====================================
@@ -1518,9 +1518,7 @@ modInfoModBreaks :: ModuleInfo -> ModBreaks
 modInfoModBreaks = minf_modBreaks
 
 isDictonaryId :: Id -> Bool
-isDictonaryId id
-  = case tcSplitSigmaTy (idType id) of {
-      (_tvs, _theta, tau) -> isDictTy tau }
+isDictonaryId id = isDictTy (idType id)
 
 -- | Looks up a global name: that is, any top-level name in any
 -- visible module.  Unlike 'lookupName', lookupGlobalName does not use


=====================================
compiler/GHC/Cmm/CallConv.hs
=====================================
@@ -193,8 +193,10 @@ realLongRegs    platform = map LongReg    $ regList (pc_MAX_Real_Long_REG    (pl
 
 realXmmRegNos :: Platform -> [Int]
 realXmmRegNos platform
-    | isSse2Enabled platform = regList (pc_MAX_Real_XMM_REG (platformConstants platform))
-    | otherwise              = []
+    | isSse2Enabled platform || platformArch platform == ArchAArch64
+    = regList (pc_MAX_Real_XMM_REG (platformConstants platform))
+    | otherwise
+    = []
 
 regList :: Int -> [Int]
 regList n = [1 .. n]


=====================================
compiler/GHC/Core/Predicate.hs
=====================================
@@ -99,7 +99,14 @@ mkClassPred :: Class -> [Type] -> PredType
 mkClassPred clas tys = mkTyConApp (classTyCon clas) tys
 
 isDictTy :: Type -> Bool
-isDictTy = isClassPred
+-- True of dictionaries (Eq a) and
+--         dictionary functions (forall a. Eq a => Eq [a])
+-- See Note [Type determines value]
+-- See #24370 (and the isDictId call in GHC.HsToCore.Binds.decomposeRuleLhs)
+--     for why it's important to catch dictionary bindings
+isDictTy ty = isClassPred pred
+  where
+    (_, pred) = splitInvisPiTys ty
 
 typeDeterminesValue :: Type -> Bool
 -- See Note [Type determines value]


=====================================
compiler/GHC/HsToCore/Binds.hs
=====================================
@@ -849,7 +849,16 @@ decomposeRuleLhs dflags orig_bndrs orig_lhs rhs_fvs
   = Left (DsRuleIgnoredDueToConstructor con) -- See Note [No RULES on datacons]
 
   | otherwise = case decompose fun2 args2 of
-        Nothing -> Left (DsRuleLhsTooComplicated orig_lhs lhs2)
+        Nothing -> -- pprTrace "decomposeRuleLhs 3" (vcat [ text "orig_bndrs:" <+> ppr orig_bndrs
+                   --                                    , text "orig_lhs:" <+> ppr orig_lhs
+                   --                                    , text "rhs_fvs:" <+> ppr rhs_fvs
+                   --                                    , text "orig_lhs:" <+> ppr orig_lhs
+                   --                                    , text "lhs1:" <+> ppr lhs1
+                   --                                    , text "lhs2:" <+> ppr lhs2
+                   --                                    , text "fun2:" <+> ppr fun2
+                   --                                    , text "args2:" <+> ppr args2
+                   --                                    ]) $
+                   Left (DsRuleLhsTooComplicated orig_lhs lhs2)
         Just (fn_id, args)
           | not (null unbound) ->
             -- Check for things unbound on LHS
@@ -921,7 +930,9 @@ decomposeRuleLhs dflags orig_bndrs orig_lhs rhs_fvs
 
    split_lets :: CoreExpr -> ([(DictId,CoreExpr)], CoreExpr)
    split_lets (Let (NonRec d r) body)
-     | isDictId d
+     | isDictId d  -- Catches dictionaries, yes, but also catches dictionary
+                   -- /functions/ arising from solving a
+                   -- quantified contraint (#24370)
      = ((d,r):bs, body')
      where (bs, body') = split_lets body
 


=====================================
compiler/GHC/StgToCmm/Bind.hs
=====================================
@@ -708,11 +708,19 @@ emitBlackHoleCode node = do
 
   when eager_blackholing $ do
     whenUpdRemSetEnabled $ emitUpdRemSetPushThunk node
-    emitStore (cmmOffsetW platform node (fixedHdrSizeW profile)) (currentTSOExpr platform)
+    emitAtomicStore platform MemOrderRelease
+        (cmmOffsetW platform node (fixedHdrSizeW profile))
+        (currentTSOExpr platform)
     -- See Note [Heap memory barriers] in SMP.h.
-    let w = wordWidth platform
-    emitPrimCall [] (MO_AtomicWrite w MemOrderRelease)
-        [node, CmmReg (CmmGlobal $ GlobalRegUse EagerBlackholeInfo $ bWord platform)]
+    emitAtomicStore platform MemOrderRelease
+        node
+        (CmmReg (CmmGlobal $ GlobalRegUse EagerBlackholeInfo $ bWord platform))
+
+emitAtomicStore :: Platform -> MemoryOrdering -> CmmExpr -> CmmExpr -> FCode ()
+emitAtomicStore platform mord addr val =
+    emitPrimCall [] (MO_AtomicWrite w mord) [addr, val]
+  where
+    w = typeWidth $ cmmExprType platform val
 
 setupUpdate :: ClosureInfo -> LocalReg -> FCode () -> FCode ()
         -- Nota Bene: this function does not change Node (even if it's a CAF),


=====================================
compiler/GHC/StgToCmm/Prim.hs
=====================================
@@ -2299,7 +2299,7 @@ vecCmmCat FloatVec = cmmFloat
 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 -- Check to make sure that we can generate code for the specified vector type
 -- given the current set of dynamic flags.
--- Currently these checks are specific to x86 and x86_64 architecture.
+-- Currently these checks are specific to x86, x86_64 and AArch64 architectures.
 -- This should be fixed!
 -- In particular,
 -- 1) Add better support for other architectures! (this may require a redesign)
@@ -2330,27 +2330,40 @@ vecCmmCat FloatVec = cmmFloat
 checkVecCompatibility :: StgToCmmConfig -> PrimOpVecCat -> Length -> Width -> FCode ()
 checkVecCompatibility cfg vcat l w =
   case stgToCmmVecInstrsErr cfg of
-    Nothing  -> check vecWidth vcat l w  -- We are in a compatible backend
-    Just err -> sorry err                -- incompatible backend, do panic
+    Nothing | isX86 -> checkX86 vecWidth vcat l w
+            | platformArch platform == ArchAArch64 -> checkAArch64 vecWidth
+            | otherwise -> sorry "SIMD vector instructions are not supported on this architecture."
+    Just err -> sorry err  -- incompatible backend, do panic
   where
     platform = stgToCmmPlatform cfg
-    check :: Width -> PrimOpVecCat -> Length -> Width -> FCode ()
-    check W128 FloatVec 4 W32 | not (isSseEnabled platform) =
+    isX86 = case platformArch platform of
+      ArchX86_64 -> True
+      ArchX86 -> True
+      _ -> False
+    checkX86 :: Width -> PrimOpVecCat -> Length -> Width -> FCode ()
+    checkX86 W128 FloatVec 4 W32 | isSseEnabled platform = return ()
+                                 | otherwise =
         sorry $ "128-bit wide single-precision floating point " ++
                 "SIMD vector instructions require at least -msse."
-    check W128 _ _ _ | not (isSse2Enabled platform) =
+    checkX86 W128 _ _ _ | not (isSse2Enabled platform) =
         sorry $ "128-bit wide integer and double precision " ++
                 "SIMD vector instructions require at least -msse2."
-    check W256 FloatVec _ _ | not (stgToCmmAvx cfg) =
+    checkX86 W256 FloatVec _ _ | stgToCmmAvx cfg = return ()
+                               | otherwise =
         sorry $ "256-bit wide floating point " ++
                 "SIMD vector instructions require at least -mavx."
-    check W256 _ _ _ | not (stgToCmmAvx2 cfg) =
+    checkX86 W256 _ _ _ | not (stgToCmmAvx2 cfg) =
         sorry $ "256-bit wide integer " ++
                 "SIMD vector instructions require at least -mavx2."
-    check W512 _ _ _ | not (stgToCmmAvx512f cfg) =
+    checkX86 W512 _ _ _ | not (stgToCmmAvx512f cfg) =
         sorry $ "512-bit wide " ++
                 "SIMD vector instructions require -mavx512f."
-    check _ _ _ _ = return ()
+    checkX86 _ _ _ _ = return ()
+
+    checkAArch64 :: Width -> FCode ()
+    checkAArch64 W256 = sorry $ "256-bit wide SIMD vector instructions are not supported."
+    checkAArch64 W512 = sorry $ "512-bit wide SIMD vector instructions are not supported."
+    checkAArch64 _ = return ()
 
     vecWidth = typeWidth (vecVmmType vcat l w)
 


=====================================
rts/Apply.cmm
=====================================
@@ -108,7 +108,7 @@ again:
             IND,
             IND_STATIC:
         {
-            fun = StgInd_indirectee(fun);
+            fun = %acquire StgInd_indirectee(fun);
             goto again;
         }
         case BCO:
@@ -693,7 +693,7 @@ INFO_TABLE(stg_AP_STACK,/*special layout*/0,0,AP_STACK,"AP_STACK","AP_STACK")
   }
   // Can't add StgInd_indirectee(ap) to UpdRemSet here because the old value is
   // not reachable.
-  StgInd_indirectee(ap) = CurrentTSO;
+  %release StgInd_indirectee(ap) = CurrentTSO;
   SET_INFO_RELEASE(ap, __stg_EAGER_BLACKHOLE_info);
 
   /* ensure there is at least AP_STACK_SPLIM words of headroom available


=====================================
rts/Compact.cmm
=====================================
@@ -100,7 +100,7 @@ eval:
 
     // Follow indirections:
     case IND, IND_STATIC: {
-        p = StgInd_indirectee(p);
+        p = %acquire StgInd_indirectee(p);
         goto eval;
     }
 


=====================================
rts/Heap.c
=====================================
@@ -173,7 +173,7 @@ StgWord collect_pointers(StgClosure *closure, StgClosure *ptrs[]) {
         case IND:
         case IND_STATIC:
         case BLACKHOLE:
-            ptrs[nptrs++] = (StgClosure *)(((StgInd *)closure)->indirectee);
+            ptrs[nptrs++] = (StgClosure *) ACQUIRE_LOAD(&((StgInd *)closure)->indirectee);
             break;
 
         case MUT_ARR_PTRS_CLEAN:


=====================================
rts/Interpreter.c
=====================================
@@ -420,7 +420,7 @@ eval_obj:
     case IND:
     case IND_STATIC:
     {
-        tagged_obj = ((StgInd*)obj)->indirectee;
+        tagged_obj = ACQUIRE_LOAD(&((StgInd*)obj)->indirectee);
         goto eval_obj;
     }
 


=====================================
rts/Messages.c
=====================================
@@ -191,9 +191,6 @@ uint32_t messageBlackHole(Capability *cap, MessageBlackHole *msg)
     StgClosure *p;
     const StgInfoTable *info;
     do {
-        // If we are being called from stg_BLACKHOLE then TSAN won't know about the
-        // previous read barrier that makes the following access safe.
-        TSAN_ANNOTATE_BENIGN_RACE(&((StgInd*)bh)->indirectee, "messageBlackHole");
         p = UNTAG_CLOSURE(ACQUIRE_LOAD(&((StgInd*)bh)->indirectee));
         info = RELAXED_LOAD(&p->header.info);
     } while (info == &stg_IND_info);
@@ -291,7 +288,7 @@ uint32_t messageBlackHole(Capability *cap, MessageBlackHole *msg)
             // makes it into the update remembered set
             updateRemembSetPushClosure(cap, (StgClosure*)bq->queue);
         }
-        RELAXED_STORE(&msg->link, bq->queue);
+        msg->link = bq->queue;
         bq->queue = msg;
         // No barrier is necessary here: we are only exposing the
         // closure to the GC. See Note [Heap memory barriers] in SMP.h.


=====================================
rts/PrimOps.cmm
=====================================
@@ -1767,7 +1767,7 @@ loop:
     qinfo = GET_INFO_ACQUIRE(q);
     if (qinfo == stg_IND_info ||
         qinfo == stg_MSG_NULL_info) {
-        q = StgInd_indirectee(q);
+        q = %acquire StgInd_indirectee(q);
         goto loop;
     }
 
@@ -1835,7 +1835,7 @@ loop:
 
     if (qinfo == stg_IND_info ||
         qinfo == stg_MSG_NULL_info) {
-        q = StgInd_indirectee(q);
+        q = %acquire StgInd_indirectee(q);
         goto loop;
     }
 
@@ -1937,7 +1937,7 @@ loop:
 
     if (qinfo == stg_IND_info ||
         qinfo == stg_MSG_NULL_info) {
-        q = StgInd_indirectee(q);
+        q = %acquire StgInd_indirectee(q);
         goto loop;
     }
 
@@ -2026,7 +2026,7 @@ loop:
 
     if (qinfo == stg_IND_info ||
         qinfo == stg_MSG_NULL_info) {
-        q = StgInd_indirectee(q);
+        q = %acquire StgInd_indirectee(q);
         goto loop;
     }
 
@@ -2307,7 +2307,7 @@ loop:
     //Possibly IND added by removeFromMVarBlockedQueue
     if (StgHeader_info(q) == stg_IND_info ||
         StgHeader_info(q) == stg_MSG_NULL_info) {
-        q = StgInd_indirectee(q);
+        q = %acquire StgInd_indirectee(q);
         goto loop;
     }
 


=====================================
rts/StableName.c
=====================================
@@ -156,11 +156,11 @@ removeIndirections (StgClosure* p)
         switch (get_itbl(q)->type) {
         case IND:
         case IND_STATIC:
-            p = ((StgInd *)q)->indirectee;
+            p = ACQUIRE_LOAD(&((StgInd *)q)->indirectee);
             continue;
 
         case BLACKHOLE:
-            p = ((StgInd *)q)->indirectee;
+            p = ACQUIRE_LOAD(&((StgInd *)q)->indirectee);
             if (GET_CLOSURE_TAG(p) != 0) {
                 continue;
             } else {


=====================================
rts/StgMiscClosures.cmm
=====================================
@@ -510,7 +510,8 @@ INFO_TABLE(stg_IND,1,0,IND,"IND","IND")
 {
     TICK_ENT_DYN_IND(); /* tick */
     ACQUIRE_FENCE;
-    node = UNTAG(StgInd_indirectee(node));
+    node = %acquire StgInd_indirectee(node);
+    node = UNTAG(node);
     TICK_ENT_VIA_NODE();
     jump %GET_ENTRY(node) (node);
 }
@@ -519,7 +520,9 @@ INFO_TABLE(stg_IND,1,0,IND,"IND","IND")
 {
     TICK_ENT_DYN_IND(); /* tick */
     ACQUIRE_FENCE;
-    R1 = UNTAG(StgInd_indirectee(R1));
+    P_ p;
+    p = %acquire StgInd_indirectee(R1);
+    R1 = UNTAG(p);
     TICK_ENT_VIA_NODE();
     jump %GET_ENTRY(R1) [R1];
 }
@@ -530,7 +533,9 @@ INFO_TABLE(stg_IND_STATIC,1,0,IND_STATIC,"IND_STATIC","IND_STATIC")
 {
     TICK_ENT_STATIC_IND(); /* tick */
     ACQUIRE_FENCE;
-    R1 = UNTAG(StgInd_indirectee(R1));
+    P_ p;
+    p = %acquire StgInd_indirectee(R1);
+    R1 = UNTAG(p);
     TICK_ENT_VIA_NODE();
     jump %GET_ENTRY(R1) [R1];
 }
@@ -553,14 +558,11 @@ INFO_TABLE(stg_BLACKHOLE,1,0,BLACKHOLE,"BLACKHOLE","BLACKHOLE")
     TICK_ENT_DYN_IND(); /* tick */
 
 retry:
-#if defined(TSAN_ENABLED)
-    // See Note [ThreadSanitizer and fences]
-    W_ unused; unused = %acquire GET_INFO(node);
-#endif
-    // Synchronizes with the release-store in updateWithIndirection.
+    // Synchronizes with the release-store in
+    // updateWithIndirection.
     // See Note [Heap memory barriers] in SMP.h.
     ACQUIRE_FENCE;
-    p = %relaxed StgInd_indirectee(node);
+    p = %acquire StgInd_indirectee(node);
     if (GETTAG(p) != 0) {
         return (p);
     }
@@ -645,7 +647,7 @@ INFO_TABLE(stg_WHITEHOLE, 0,0, WHITEHOLE, "WHITEHOLE", "WHITEHOLE")
     i = 0;
 loop:
     // spin until the WHITEHOLE is updated
-    info = StgHeader_info(node);
+    info = %relaxed StgHeader_info(node);
     if (info == stg_WHITEHOLE_info) {
 #if defined(PROF_SPIN)
         W_[whitehole_lockClosure_spin] =
@@ -664,6 +666,7 @@ loop:
         // defined in CMM.
         goto loop;
     }
+    ACQUIRE_FENCE;
     jump %ENTRY_CODE(info) (node);
 #else
     ccall barf("WHITEHOLE object (%p) entered!", R1) never returns;


=====================================
rts/ThreadPaused.c
=====================================
@@ -352,7 +352,7 @@ threadPaused(Capability *cap, StgTSO *tso)
             OVERWRITING_CLOSURE_SIZE(bh, closure_sizeW_(bh, INFO_PTR_TO_STRUCT(bh_info)));
 
             // The payload of the BLACKHOLE points to the TSO
-            ((StgInd *)bh)->indirectee = (StgClosure *)tso;
+            RELEASE_STORE(&((StgInd *)bh)->indirectee, (StgClosure *)tso);
             SET_INFO_RELEASE(bh,&stg_BLACKHOLE_info);
 
             // .. and we need a write barrier, since we just mutated the closure:


=====================================
rts/Threads.c
=====================================
@@ -437,7 +437,7 @@ checkBlockingQueues (Capability *cap, StgTSO *tso)
         p = UNTAG_CLOSURE(bq->bh);
         const StgInfoTable *pinfo = ACQUIRE_LOAD(&p->header.info);
         if (pinfo != &stg_BLACKHOLE_info ||
-            ((StgInd *)p)->indirectee != (StgClosure*)bq)
+            (RELAXED_LOAD(&((StgInd *)p)->indirectee) != (StgClosure*)bq))
         {
             wakeBlockingQueue(cap,bq);
         }
@@ -468,7 +468,7 @@ updateThunk (Capability *cap, StgTSO *tso, StgClosure *thunk, StgClosure *val)
         return;
     }
 
-    v = UNTAG_CLOSURE(((StgInd*)thunk)->indirectee);
+    v = UNTAG_CLOSURE(ACQUIRE_LOAD(&((StgInd*)thunk)->indirectee));
 
     updateWithIndirection(cap, thunk, val);
 
@@ -808,7 +808,7 @@ loop:
     qinfo = ACQUIRE_LOAD(&q->header.info);
     if (qinfo == &stg_IND_info ||
         qinfo == &stg_MSG_NULL_info) {
-        q = (StgMVarTSOQueue*)((StgInd*)q)->indirectee;
+        q = (StgMVarTSOQueue*) ACQUIRE_LOAD(&((StgInd*)q)->indirectee);
         goto loop;
     }
 


=====================================
rts/Updates.cmm
=====================================
@@ -59,7 +59,7 @@ INFO_TABLE_RET ( stg_marked_upd_frame, UPDATE_FRAME,
     ASSERT(HpAlloc == 0); // Note [HpAlloc]
 
     // we know the closure is a BLACKHOLE
-    v = StgInd_indirectee(updatee);
+    v = %acquire StgInd_indirectee(updatee);
 
     if (GETTAG(v) != 0) (likely: False) {
         // updated by someone else: discard our value and use the


=====================================
rts/Updates.h
=====================================
@@ -59,8 +59,8 @@
     }                                                           \
                                                                 \
     OVERWRITING_CLOSURE(p1);                                    \
-    %relaxed StgInd_indirectee(p1) = p2;                        \
-    SET_INFO_RELEASE(p1, stg_BLACKHOLE_info);                   \
+    %release StgInd_indirectee(p1) = p2;                        \
+    %release SET_INFO(p1, stg_BLACKHOLE_info);                  \
     LDV_RECORD_CREATE(p1);                                      \
     and_then;
 


=====================================
rts/include/Cmm.h
=====================================
@@ -35,6 +35,7 @@
 #define CMINUSMINUS 1
 
 #include "ghcconfig.h"
+#include "rts/TSANUtils.h"
 
 /* -----------------------------------------------------------------------------
    Types
@@ -311,7 +312,7 @@
 #define ENTER(x) ENTER_(return,x)
 #endif
 
-#define ENTER_R1() ENTER_(RET_R1,R1)
+#define ENTER_R1() P_ _r1; _r1 = R1; ENTER_(RET_R1, _r1)
 
 #define RET_R1(x) jump %ENTRY_CODE(Sp(0)) [R1]
 
@@ -326,7 +327,7 @@
     IND,                                                \
     IND_STATIC:                                         \
    {                                                    \
-      x = StgInd_indirectee(x);                         \
+      x = %acquire StgInd_indirectee(x);                \
       goto again;                                       \
    }                                                    \
   case                                                  \
@@ -446,9 +447,17 @@
    HP_CHK_P(bytes);                             \
    TICK_ALLOC_RTS(bytes);
 
+// Load a field out of structure with relaxed ordering.
+#define RELAXED_LOAD_FIELD(fld, ptr) \
+    REP_##fld[(ptr) + OFFSET_##fld]
+
+// Load a field out of an StgClosure with relaxed ordering.
+#define RELAXED_LOAD_CLOSURE_FIELD(fld, ptr) \
+    REP_##fld[(ptr) + SIZEOF_StgHeader + OFFSET_##fld]
+
 #define CHECK_GC()                                                      \
   (bdescr_link(CurrentNursery) == NULL ||                               \
-   generation_n_new_large_words(W_[g0]) >= TO_W_(CLong[large_alloc_lim]))
+   RELAXED_LOAD_FIELD(generation_n_new_large_words, W_[g0]) >= TO_W_(CLong[large_alloc_lim]))
 
 // allocate() allocates from the nursery, so we check to see
 // whether the nursery is nearly empty in any function that uses


=====================================
rts/include/rts/TSANUtils.h
=====================================
@@ -39,6 +39,7 @@
 #endif
 #endif
 
+#if !defined(CMINUSMINUS)
 #if defined(TSAN_ENABLED)
 #if !defined(HAVE_C11_ATOMICS)
 #error TSAN cannot be enabled without C11 atomics support.
@@ -72,3 +73,4 @@ uint32_t ghc_tsan_atomic32_compare_exchange(uint32_t *ptr, uint32_t expected, ui
 uint16_t ghc_tsan_atomic16_compare_exchange(uint16_t *ptr, uint16_t expected, uint16_t new_value, int success_memorder, int failure_memorder);
 uint8_t ghc_tsan_atomic8_compare_exchange(uint8_t *ptr, uint8_t expected, uint8_t new_value, int success_memorder, int failure_memorder);
 
+#endif


=====================================
rts/include/stg/MachRegs.h
=====================================
@@ -457,6 +457,12 @@ the stack. See Note [Overlapping global registers] for implications.
 #define REG_D3          d14
 #define REG_D4          d15
 
+#define REG_XMM1        q4
+#define REG_XMM2        q5
+
+#define CALLER_SAVES_XMM1
+#define CALLER_SAVES_XMM2
+
 /* -----------------------------------------------------------------------------
    The s390x register mapping
 


=====================================
rts/include/stg/SMP.h
=====================================
@@ -132,31 +132,40 @@ EXTERN_INLINE void load_load_barrier(void);
  * stores which formed the new object are visible (e.g. stores are flushed from
  * cache and the relevant cachelines invalidated in other cores).
  *
- * To ensure this we must use memory barriers. Which barriers are required to
- * access a field depends upon the type of the field. In general, fields come
- * in three flavours:
- *
- *  * Mutable GC Pointers (C type StgClosure*, Cmm type StgPtr)
- *  * Immutable GC Pointers (C type MUT_FIELD StgClosure*, Cmm type StgPtr)
- *  * Non-pointers (C type StgWord, Cmm type StdWord)
+ * To ensure this we must issue memory barriers when accessing closures and
+ * their fields. Since reasoning about concurrent memory access with barriers tends to be
+ * subtle and platform dependent, it is more common to instead write programs
+ * in terms of an abstract memory model and let the compiler (GHC and the
+ * system's C compiler) worry about what barriers are needed to realize the
+ * requested semantics on the target system. GHC relies on the widely used C11
+ * memory model for this.
+ *
+ * Also note that the majority of this Note are only concerned with mutation
+ * by the mutator. The GC is free to change nearly any field (which is
+ * necessary for a moving GC). Naturally, doing this safely requires care which
+ * we discuss in the "Barriers during GC" section below.
+ *
+ * Field access
+ * ------------
+ * Which barriers are required to access a field of a closure depends upon the
+ * identity of the field. In general, fields come in three flavours:
+ *
+ *  * Mutable GC Pointers (C type `StgClosure*`, Cmm type `StgPtr`)
+ *  * Immutable GC Pointers (C type `MUT_FIELD StgClosure*`, Cmm type `StgPtr`)
+ *  * Non-pointers (C type `StgWord`, Cmm type `StgWord`)
  *
  * Note that Addr# fields are *not* GC pointers and therefore are classified
- * as non-pointers. Responsibility for barriers lies with the party
- * dereferencing the pointer.
- *
- * Also note that we are only concerned with mutation by the mutator. The GC
- * is free to change nearly any field as this is necessary for a moving GC.
- * Naturally, doing this safely requires care which we discuss in section
- * below.
+ * as non-pointers. In this case responsibility for barriers lies with the
+ * party dereferencing the Addr#.
  *
  * Immutable pointer fields are those which the mutator cannot change after
  * an object is made visible on the heap. Most objects' fields are of this
  * flavour (e.g. all data constructor fields). As these fields are written
  * precisely once, no write barriers are needed on writes nor reads. This is
  * safe due to an argument hinging on causality: Consider an immutable field F
- * of an object O refers to object O'. Naturally, O' must have been visible to
- * the creator of O when O was constructed. Consequently, if O is visible to a
- * reader, O' must also be visible.
+ * of an object O which refers to object O'. Naturally, O' must have been
+ * visible to the creator of O when O was constructed. Consequently, if O is
+ * visible to a reader, O' must also be visible to the same reader.
  *
  * Mutable pointer fields are those which can be modified by the mutator. These
  * require a bit more care as they may break the causality argument given
@@ -165,6 +174,10 @@ EXTERN_INLINE void load_load_barrier(void);
  * into F. Without explicit synchronization O' may not be visible to another
  * thread attempting to dereference F.
  *
+ * To ensure the visibility of the referent, writing to a mutable pointer field
+ * must be done via a release-store. Conversely, reading from such a field is
+ * done via an acquire-load.
+ *
  * Mutable fields include:
  *
  *   - StgMutVar: var
@@ -177,64 +190,102 @@ EXTERN_INLINE void load_load_barrier(void);
  *   - StgMutArrPtrs: payload
  *   - StgSmallMutArrPtrs: payload
  *   - StgThunk although this is a somewhat special case; see below
- *
- * Writing to a mutable pointer field must be done via a release-store.
- * Reading from such a field is done via an acquire-load.
+ *   - StgInd: indirectee
  *
  * Finally, non-pointer fields can be safely mutated without barriers as
- * they do not refer to other memory. Technically, concurrent accesses to
- * non-pointer fields still do need to be atomic in many cases to avoid torn
- * accesses. However, this is something that we generally avoid by locking
- * closures prior to mutating non-pointer fields (see Locking closures below).
- *
- * Note that MUT_VARs offer both synchronized and unsynchronized primops.
- * Consequently, in these cases there is a burden on the user to ensure that
- * synchronization is provided where necessary.
+ * they do not refer to other memory locations. Technically, concurrent
+ * accesses to non-pointer fields still do need to be atomic in many cases to
+ * avoid torn accesses. However, this is something that we generally avoid by
+ * locking closures prior to mutating non-pointer fields (see Locking closures
+ * below).
  *
  * Locking closures
  * ----------------
  * Several primops temporarily turn closures into WHITEHOLEs to ensure that
  * they have exclusive access (see SMPClosureOps.h:reallyLockClosure).
+ * These include,
+ *
+ *   - takeMVar#, tryTakeMVar#
+ *   - putMVar#, tryPutMVar#
+ *   - readMVar#, tryReadMVar#
+ *   - readIOPort#
+ *   - writeIOPort#
+ *   - addCFinalizerToWeak#
+ *   - finalizeWeak#
+ *   - deRefWeak#
+ *
  * Locking is done via an atomic exchange operation on the closure's info table
  * pointer with sequential consistency (although only acquire ordering is
- * needed). This acquire ensures that we synchronize with any previous thread
- * that had locked the closure. Consequently, it is important that we take great
- * care in examining the mutable fields of a lockable closure prior to having
- * locked it.
- *
- * Naturally, unlocking is done via a release-store to restore the closure's
- * original info table pointer.
+ * needed). Similarly, unlocking is also done with an atomic exchange to
+ * restore the closure's original info table pointer (although
+ * this time only the release ordering is needed). This ensures
+ * that we synchronize with any previous thread that had locked the closure.
  *
  * Thunks
  * ------
  * As noted above, thunks are a rather special (yet quite common) case. In
- * particular, they have the unique property of being updatable, transforming
- * from a thunk to an indirection. This transformation requires its own
- * synchronization protocol. In particular, we must ensure that a reader
- * examining a thunk being updated can see the indirectee. Consequently, a
- * thunk update (see rts/Updates.h) does the following:
+ * particular, they have the unique property of being updatable (that is, can
+ * be transformed from a thunk into an indirection after evaluation). This
+ * transformation requires its own synchronization protocol to mediate the
+ * interaction between the updater and the reader. In particular, we
+ * must ensure that a reader examining a thunk being updated by another core
+ * can see the indirectee. Consequently, a thunk update (see rts/Updates.h)
+ * does the following:
+ *
+ *  U1. use a release-store to place the new indirectee into the thunk's
+ *      indirectee field
  *
- *  1. Use a relaxed-store to place the new indirectee into the thunk's
- *     indirectee field
- *  2. use a release-store to set the info table to stg_BLACKHOLE (which
- *     represents an indirection)
+ *  U2. use a release-store to set the info table to stg_BLACKHOLE (which
+ *      represents an indirection)
  *
  * Blackholing a thunk (either eagerly, by GHC.StgToCmm.Bind.emitBlackHoleCode,
  * or lazily, by ThreadPaused.c:threadPaused) is done similarly.
  *
- * Conversely, indirection entry (see the entry code of stg_BLACKHOLE, stg_IND,
- * and stg_IND_STATIC in rts/StgMiscClosure.cmm) does the following:
- *
- *  1. We jump into the entry code for, e.g., stg_BLACKHOLE; this of course
- *     implies that we have already read the thunk's info table pointer, which
- *     is done with a relaxed load.
- *  2. use an acquire-fence to ensure that our view on the thunk is
- *     up-to-date. This synchronizes with step (2) in the update
- *     procedure.
- *  3. relaxed-load the indirectee. Since thunks are updated at most
- *     once we know that the fence in the last step has given us
- *     an up-to-date view of the indirectee closure.
- *  4. enter the indirectee (or block if the indirectee is a TSO)
+ * Conversely, entering an indirection (see the entry code of stg_BLACKHOLE,
+ * stg_IND, and stg_IND_STATIC in rts/StgMiscClosure.cmm) does the
+ * following:
+ *
+ *  E1. jump into the entry code of the indirection (e.g. stg_BLACKHOLE);
+ *      this of course implies that we have already read the thunk's info table
+ *      pointer, which is done with a relaxed load.
+ *
+ *  E2. acquire-fence
+ *
+ *  E3. acquire-load the indirectee. Since thunks are updated at most
+ *      once we know that the fence in the last step has given us
+ *      an up-to-date view of the indirectee closure.
+ *
+ *  E4. enter the indirectee (or block if the indirectee is a TSO)
+ *
+ * The release/acquire pair (U2)/(E2) is somewhat surprising but is necessary as
+ * the C11 memory model does not guarantee that the store (U1) is visible to
+ * (E3) despite (U1) preceding (U2) in program-order (due to the relaxed
+ * ordering of (E3)). This is demonstrated by the following CppMem model:
+ *
+ *     int main() {
+ *       atomic_int x = 0;    // info table pointer
+ *       atomic_int y = 0;    // indirectee
+ *       {{{
+ *         {    // blackhole update
+ *           y.store(1, memory_order_release);                // U1
+ *           x.store(2, memory_order_release);                // U2
+ *         }
+ *       |||
+ *         {    // blackhole entry
+ *           r1=x.load(memory_order_relaxed).readsvalue(2);   // E1
+ *           //fence(memory_order_acquire);                   // E2
+ *           r2=y.load(memory_order_acquire);                 // E3
+ *         }
+ *       }}};
+ *       return 0;
+ *     }
+ *
+ * Under the C11 memory model this program admits an execution where the
+ * indirectee `r2=0`.
+ *
+ * Of course, this could also be addressed by strengthing the ordering of (E1)
+ * to acquire, but this would incur a significant cost on every closure entry
+ * (including non-blackholes).
  *
  * Other closures
  * --------------
@@ -342,6 +393,12 @@ EXTERN_INLINE void load_load_barrier(void);
  * The work-stealing queue (WSDeque) also requires barriers; these are
  * documented in WSDeque.c.
  *
+ * Verifying memory ordering
+ * -------------------------
+ * To verify that GHC's RTS and the code produced by the compiler are free of
+ * data races we employ ThreadSaniziter. See Note [ThreadSanitizer] in TSANUtils.h
+ * for details on this facility.
+ *
  */
 
 /* ----------------------------------------------------------------------------


=====================================
rts/sm/NonMovingMark.c
=====================================
@@ -688,8 +688,9 @@ void updateRemembSetPushThunkEager(Capability *cap,
     case IND:
     {
         StgInd *ind = (StgInd *) thunk;
-        if (check_in_nonmoving_heap(ind->indirectee)) {
-            push_closure(queue, ind->indirectee, NULL);
+        StgClosure *indirectee = ACQUIRE_LOAD(&ind->indirectee);
+        if (check_in_nonmoving_heap(indirectee)) {
+            push_closure(queue, indirectee, NULL);
         }
         break;
     }


=====================================
rts/sm/Storage.c
=====================================
@@ -596,8 +596,6 @@ lockCAF (StgRegTable *reg, StgIndStatic *caf)
     bh->indirectee = (StgClosure *)cap->r.rCurrentTSO;
     SET_HDR(bh, &stg_CAF_BLACKHOLE_info, caf->header.prof.ccs);
 
-    // RELEASE ordering to ensure that above writes are visible before we
-    // introduce reference as CAF indirectee.
     RELEASE_STORE(&caf->indirectee, (StgClosure *) bh);
     SET_INFO_RELEASE((StgClosure*)caf, &stg_IND_STATIC_info);
 


=====================================
testsuite/tests/codeGen/should_run/all.T
=====================================
@@ -222,7 +222,7 @@ test('T21186', normal, compile_and_run, [''])
 test('T20640a', normal, compile_and_run, [''])
 test('T20640b', normal, compile_and_run, [''])
 test('T22296',[only_ways(llvm_ways)
-              ,unless(arch('x86_64'), skip)],compile_and_run,[''])
+              ,unless(arch('x86_64') or arch('aarch64'), skip)],compile_and_run,[''])
 test('T22798', normal, compile_and_run, ['-fregs-graph'])
 test('CheckBoundsOK', normal, compile_and_run, ['-fcheck-prim-bounds'])
 


=====================================
testsuite/tests/simplCore/should_compile/T24370.hs
=====================================
@@ -0,0 +1,16 @@
+{-# LANGUAGE QuantifiedConstraints, UndecidableInstances #-}
+
+-- This gave "RULE left-hand side too complicated to desugar"
+-- in GHC 9.8
+
+module T24370 where
+
+f :: (Eq a, Eq a) => a -> b -> Int
+f = error "urk"
+
+{-# SPECIALISE f :: T Maybe -> b -> Int #-}
+
+instance (forall a. Eq a => Eq (f a)) => Eq (T f) where
+  a == b = False
+
+data T f = MkT (f Int)


=====================================
testsuite/tests/simplCore/should_compile/all.T
=====================================
@@ -495,3 +495,4 @@ test('T23567', [extra_files(['T23567A.hs'])], multimod_compile, ['T23567', '-O -
 test('T23938', [extra_files(['T23938A.hs'])], multimod_compile, ['T23938', '-O -v0'])
 test('T23952', [extra_files(['T23952a.hs'])], multimod_compile, ['T23952', '-v0 -O'])
 test('T23922a', normal, compile, ['-O'])
+test('T24370', normal, compile, ['-O'])


=====================================
testsuite/tests/unboxedsums/all.T
=====================================
@@ -38,7 +38,7 @@ test('T20859', normal, compile, [''])
 
 test('T22187',[only_ways(llvm_ways)],compile,[''])
 test('T22187_run',[only_ways(llvm_ways)
-                  ,unless(arch('x86_64'), skip)],compile_and_run,[''])
+                  ,unless(arch('x86_64') or arch('aarch64'), skip)],compile_and_run,[''])
 
 test('unpack_sums_1', normal, compile_and_run, ['-O'])
 test('unpack_sums_2', normal, compile, ['-O'])


=====================================
utils/genapply/Main.hs
=====================================
@@ -783,7 +783,11 @@ genApply regstatus args =
         text "case IND,",
         text "     IND_STATIC: {",
         nest 4 (vcat [
-          text "R1 = StgInd_indirectee(R1);",
+          -- N.B. annoyingly the %acquire syntax must place its result in a local register
+          -- as it is a Cmm prim call node.
+          text "P_ p;",
+          text "p = %acquire StgInd_indirectee(R1);",
+          text "R1 = p;",
             -- An indirection node might contain a tagged pointer
           text "goto again;"
          ]),



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/4b2654cf13dcc6bed12589b84393711342781e8f...99999dc773c517850a4281e766ac5ea1132ce288

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/4b2654cf13dcc6bed12589b84393711342781e8f...99999dc773c517850a4281e766ac5ea1132ce288
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240216/406053af/attachment-0001.html>


More information about the ghc-commits mailing list