[Git][ghc/ghc][wip/tsan/codegen] 12 commits: Extend documentation for Data.IORef

Ben Gamari (@bgamari) gitlab at gitlab.haskell.org
Mon Nov 21 18:53:30 UTC 2022



Ben Gamari pushed to branch wip/tsan/codegen at Glasgow Haskell Compiler / GHC


Commits:
f2f9ef07 by Bodigrim at 2022-11-20T18:39:30-05:00
Extend documentation for Data.IORef

- - - - -
ef511b23 by Simon Peyton Jones at 2022-11-20T18:40:05-05:00
Buglet in GHC.Tc.Module.checkBootTyCon

This lurking bug arose from using the wrong function to compare two
types in GHC.Tc.Module.checkBootTyCon.

It's hard to trigger the bug, which only came up during
!9343, so there's no regression test in this MR.

- - - - -
451aeac3 by Bodigrim at 2022-11-20T18:40:44-05:00
Add since pragmas for c_interruptible_open and hostIsThreaded

- - - - -
e38d6d99 by Ben Gamari at 2022-11-21T13:30:23-05:00
hadrian: Don't enable TSAN in stage0 build

- - - - -
32a5ab8d by Ben Gamari at 2022-11-21T13:30:23-05:00
cmm: Introduce blockConcat

- - - - -
32a88875 by Ben Gamari at 2022-11-21T13:30:23-05:00
cmm: Introduce MemoryOrderings

- - - - -
6ed62022 by Ben Gamari at 2022-11-21T13:30:23-05:00
llvm: Respect specified memory orderings

- - - - -
64dd0684 by Ben Gamari at 2022-11-21T13:30:23-05:00
Codegen/x86: Eliminate barrier for relaxed accesses

- - - - -
ae9e0ba7 by Ben Gamari at 2022-11-21T13:30:23-05:00
cmm/Parser: Reduce some repetition

- - - - -
d1a0cfff by Ben Gamari at 2022-11-21T13:30:23-05:00
cmm/Parser: Add syntax for ordered loads and stores

- - - - -
1672d2ff by Ben Gamari at 2022-11-21T13:30:23-05:00
cmm/Parser: Atomic load syntax

Originally I had thought I would just use the `prim` call syntax instead
of introducing new syntax for atomic loads. However, it turns out that
`prim` call syntax tends to make things quite unreadable. This new
syntax seems quite natural.

- - - - -
9f545dc8 by Ben Gamari at 2022-11-21T13:53:15-05:00
codeGen: Introduce ThreadSanitizer instrumentation

This introduces a new Cmm pass which instruments the program with
ThreadSanitizer annotations, allowing full tracking of mutator memory
accesses via TSAN.

- - - - -


29 changed files:

- compiler/GHC/Cmm/Config.hs
- compiler/GHC/Cmm/ContFlowOpt.hs
- compiler/GHC/Cmm/Dataflow/Block.hs
- compiler/GHC/Cmm/Lexer.x
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/Cmm/Pipeline.hs
- + compiler/GHC/Cmm/ThreadSanitizer.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/PPC/CodeGen.hs
- compiler/GHC/CmmToAsm/Wasm/FromCmm.hs
- compiler/GHC/CmmToAsm/X86/CodeGen.hs
- compiler/GHC/CmmToC.hs
- compiler/GHC/CmmToLlvm/CodeGen.hs
- compiler/GHC/Driver/Config/Cmm.hs
- compiler/GHC/Driver/Flags.hs
- compiler/GHC/Driver/Session.hs
- compiler/GHC/StgToCmm/ExtCode.hs
- compiler/GHC/StgToCmm/Prim.hs
- compiler/GHC/Tc/Module.hs
- compiler/ghc.cabal.in
- docs/users_guide/debugging.rst
- hadrian/src/Flavour.hs
- libraries/base/Data/IORef.hs
- libraries/base/GHC/IORef.hs
- libraries/base/System/Posix/Internals.hs
- + rts/TSANUtils.c
- rts/include/rts/TSANUtils.h
- rts/rts.cabal.in


Changes:

=====================================
compiler/GHC/Cmm/Config.hs
=====================================
@@ -19,6 +19,7 @@ data CmmConfig = CmmConfig
   , cmmDoLinting           :: !Bool    -- ^ Do Cmm Linting Optimization or not
   , cmmOptElimCommonBlks   :: !Bool    -- ^ Eliminate common blocks or not
   , cmmOptSink             :: !Bool    -- ^ Perform sink after stack layout or not
+  , cmmOptThreadSanitizer  :: !Bool    -- ^ Instrument memory accesses for ThreadSanitizer
   , cmmGenStackUnwindInstr :: !Bool    -- ^ Generate stack unwinding instructions (for debugging)
   , cmmExternalDynamicRefs :: !Bool    -- ^ Generate code to link against dynamic libraries
   , cmmDoCmmSwitchPlans    :: !Bool    -- ^ Should the Cmm pass replace Stg switch statements


=====================================
compiler/GHC/Cmm/ContFlowOpt.hs
=====================================
@@ -11,7 +11,7 @@ where
 
 import GHC.Prelude hiding (succ, unzip, zip)
 
-import GHC.Cmm.Dataflow.Block
+import GHC.Cmm.Dataflow.Block hiding (blockConcat)
 import GHC.Cmm.Dataflow.Collections
 import GHC.Cmm.Dataflow.Graph
 import GHC.Cmm.Dataflow.Label


=====================================
compiler/GHC/Cmm/Dataflow/Block.hs
=====================================
@@ -14,6 +14,7 @@ module GHC.Cmm.Dataflow.Block
     , IndexedCO
     , Block(..)
     , blockAppend
+    , blockConcat
     , blockCons
     , blockFromList
     , blockJoin
@@ -136,6 +137,8 @@ blockJoin f b t = BlockCC f b t
 blockAppend :: Block n e O -> Block n O x -> Block n e x
 blockAppend = cat
 
+blockConcat :: [Block n O O] -> Block n O O
+blockConcat = foldr blockAppend emptyBlock
 
 -- Taking apart
 


=====================================
compiler/GHC/Cmm/Lexer.x
=====================================
@@ -94,6 +94,10 @@ $white_no_nl+           ;
   "!="                  { kw CmmT_Ne }
   "&&"                  { kw CmmT_BoolAnd }
   "||"                  { kw CmmT_BoolOr }
+  "%relaxed"            { kw CmmT_Relaxed }
+  "%acquire"            { kw CmmT_Acquire }
+  "%release"            { kw CmmT_Release }
+  "%seq_cst"            { kw CmmT_SeqCst  }
 
   "True"                { kw CmmT_True  }
   "False"               { kw CmmT_False }
@@ -183,6 +187,10 @@ data CmmToken
   | CmmT_False
   | CmmT_True
   | CmmT_likely
+  | CmmT_Relaxed
+  | CmmT_Acquire
+  | CmmT_Release
+  | CmmT_SeqCst
   deriving (Show)
 
 -- -----------------------------------------------------------------------------


=====================================
compiler/GHC/Cmm/MachOp.hs
=====================================
@@ -24,6 +24,7 @@ module GHC.Cmm.MachOp
     , machOpMemcpyishAlign
 
     -- Atomic read-modify-write
+    , MemoryOrdering(..)
     , AtomicMachOp(..)
    )
 where
@@ -666,10 +667,12 @@ data CallishMachOp
   | MO_BSwap Width
   | MO_BRev Width
 
-  -- Atomic read-modify-write.
+  -- | Atomic read-modify-write. Arguments are @[dest, n]@.
   | MO_AtomicRMW Width AtomicMachOp
-  | MO_AtomicRead Width
-  | MO_AtomicWrite Width
+  -- | Atomic read. Arguments are @[addr]@.
+  | MO_AtomicRead Width MemoryOrdering
+  -- | Atomic write. Arguments are @[addr, value]@.
+  | MO_AtomicWrite Width MemoryOrdering
   -- | Atomic compare-and-swap. Arguments are @[dest, expected, new]@.
   -- Sequentially consistent.
   -- Possible future refactoring: should this be an'MO_AtomicRMW' variant?
@@ -684,6 +687,14 @@ data CallishMachOp
   | MO_ResumeThread
   deriving (Eq, Show)
 
+-- | C11 memory ordering semantics.
+data MemoryOrdering
+  = MemOrderRelaxed  -- ^ relaxed ordering
+  | MemOrderAcquire  -- ^ acquire ordering
+  | MemOrderRelease  -- ^ release ordering
+  | MemOrderSeqCst   -- ^ sequentially consistent
+  deriving (Eq, Ord, Show)
+
 -- | The operation to perform atomically.
 data AtomicMachOp =
       AMO_Add
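
A note for orientation: the four MemoryOrdering constructors introduced above
correspond one-to-one to a subset of C11's memory_order enumeration. Below is
a minimal, self-contained Haskell sketch of that correspondence; the type is
restated so the example compiles on its own, and c11Name is an illustrative
helper, not part of this patch.

    -- Standalone sketch: the C11 counterpart of each new MemoryOrdering
    -- constructor (mirroring the type added to GHC.Cmm.MachOp above).
    data MemoryOrdering
      = MemOrderRelaxed
      | MemOrderAcquire
      | MemOrderRelease
      | MemOrderSeqCst
      deriving (Eq, Ord, Show)

    -- The <stdatomic.h> name for each ordering.
    c11Name :: MemoryOrdering -> String
    c11Name MemOrderRelaxed = "memory_order_relaxed"
    c11Name MemOrderAcquire = "memory_order_acquire"
    c11Name MemOrderRelease = "memory_order_release"
    c11Name MemOrderSeqCst  = "memory_order_seq_cst"

    main :: IO ()
    main = mapM_ (putStrLn . c11Name)
                 [MemOrderRelaxed, MemOrderAcquire, MemOrderRelease, MemOrderSeqCst]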


=====================================
compiler/GHC/Cmm/Parser.y
=====================================
@@ -194,6 +194,27 @@ convention. Note if a field is longer than a word (e.g. a D_ on
 a 32-bit machine) then the call will push as many words as
 necessary to the stack to accommodate it (e.g. 2).
 
+Memory ordering
+---------------
+
+Cmm respects the C11 memory model and distinguishes between non-atomic and
+atomic memory accesses. In C11 fashion, atomic accesses can provide a number of
+memory ordering guarantees. These are supported in Cmm syntax as follows:
+
+    W_[ptr] = ...;            // a non-atomic store
+    %relaxed W_[ptr] = ...;   // an atomic store with relaxed ordering semantics
+    %release W_[ptr] = ...;   // an atomic store with release ordering semantics
+
+    x = W_[ptr];              // a non-atomic load
+    x = %relaxed W_[ptr];     // an atomic load with relaxed ordering
+    x = %acquire W_[ptr];     // an atomic load with acquire ordering
+    // or equivalently...
+    x = prim %load_acquire64(ptr);
+
+Here we used W_ as an example, but these operations can be used on all Cmm
+types.
+
+See Note [Heap memory barriers] in SMP.h for details.
 
 ----------------------------------------------------------------------------- -}
 
@@ -313,6 +334,10 @@ import qualified Data.ByteString.Char8 as BS8
         'True'  { L _ (CmmT_True ) }
         'False' { L _ (CmmT_False) }
         'likely'{ L _ (CmmT_likely)}
+        'relaxed'{ L _ (CmmT_Relaxed)}
+        'acquire'{ L _ (CmmT_Acquire)}
+        'release'{ L _ (CmmT_Release)}
+        'seq_cst'{ L _ (CmmT_SeqCst)}
 
         'CLOSURE'       { L _ (CmmT_CLOSURE) }
         'INFO_TABLE'    { L _ (CmmT_INFO_TABLE) }
@@ -627,8 +652,23 @@ stmt    :: { CmmParse () }
 
         | lreg '=' expr ';'
                 { do reg <- $1; e <- $3; withSourceNote $2 $4 (emitAssign reg e) }
+
+        -- Use lreg instead of local_reg to avoid ambiguity
+        | lreg '=' mem_ordering type '[' expr ']' ';'
+                { do reg <- $1;
+                     let lreg = case reg of
+                                  { CmmLocal r -> r
+                                  ; _other -> pprPanic "CmmParse:" (ppr reg <+> text "not a local register")
+                                  } ;
+                     mord <- $3;
+                     let { ty = $4; w = typeWidth ty };
+                     e <- $6;
+                     let op = MO_AtomicRead w mord;
+                     withSourceNote $2 $7 $ code (emitPrimCall [lreg] op [e]) }
+        | mem_ordering type '[' expr ']' '=' expr ';'
+                { do mord <- $1; withSourceNote $3 $8 (doStore (Just mord) $2 $4 $7) }
         | type '[' expr ']' '=' expr ';'
-                { withSourceNote $2 $7 (doStore $1 $3 $6) }
+                { withSourceNote $2 $7 (doStore Nothing $1 $3 $6) }
 
         -- Gah! We really want to say "foreign_results" but that causes
         -- a shift/reduce conflict with assignment.  We either
@@ -678,6 +718,14 @@ unwind_regs
         | GLOBALREG '=' expr_or_unknown
                 { do e <- $3; return [($1, e)] }
 
+-- | A memory ordering
+mem_ordering :: { CmmParse MemoryOrdering }
+mem_ordering
+        : 'relaxed' { do return MemOrderRelaxed }
+        | 'release' { do return MemOrderRelease }
+        | 'acquire' { do return MemOrderAcquire }
+        | 'seq_cst' { do return MemOrderSeqCst }
+
 -- | Used by unwind to indicate unknown unwinding values.
 expr_or_unknown
         :: { CmmParse (Maybe CmmExpr) }
@@ -953,6 +1001,7 @@ exprMacros profile align_check = listToUFM [
     platform = profilePlatform profile
 
 -- we understand a subset of C-- primitives:
+machOps :: UniqFM FastString (Width -> MachOp)
 machOps = listToUFM $
         map (\(x, y) -> (mkFastString x, y)) [
         ( "add",        MO_Add ),
@@ -1074,37 +1123,32 @@ callishMachOps platform = listToUFM $
         ( "suspendThread", (MO_SuspendThread,) ),
         ( "resumeThread",  (MO_ResumeThread,) ),
 
-        ("prefetch0", (MO_Prefetch_Data 0,)),
-        ("prefetch1", (MO_Prefetch_Data 1,)),
-        ("prefetch2", (MO_Prefetch_Data 2,)),
-        ("prefetch3", (MO_Prefetch_Data 3,)),
-
-        ( "popcnt8",  (MO_PopCnt W8,)),
-        ( "popcnt16", (MO_PopCnt W16,)),
-        ( "popcnt32", (MO_PopCnt W32,)),
-        ( "popcnt64", (MO_PopCnt W64,)),
-
-        ( "pdep8",  (MO_Pdep W8,)),
-        ( "pdep16", (MO_Pdep W16,)),
-        ( "pdep32", (MO_Pdep W32,)),
-        ( "pdep64", (MO_Pdep W64,)),
-
-        ( "pext8",  (MO_Pext W8,)),
-        ( "pext16", (MO_Pext W16,)),
-        ( "pext32", (MO_Pext W32,)),
-        ( "pext64", (MO_Pext W64,)),
-
-        ( "cmpxchg8",  (MO_Cmpxchg W8,)),
-        ( "cmpxchg16", (MO_Cmpxchg W16,)),
-        ( "cmpxchg32", (MO_Cmpxchg W32,)),
-        ( "cmpxchg64", (MO_Cmpxchg W64,)),
-
-        ( "xchg8",  (MO_Xchg W8,)),
-        ( "xchg16", (MO_Xchg W16,)),
-        ( "xchg32", (MO_Xchg W32,)),
-        ( "xchg64", (MO_Xchg W64,))
+        ( "prefetch0", (MO_Prefetch_Data 0,)),
+        ( "prefetch1", (MO_Prefetch_Data 1,)),
+        ( "prefetch2", (MO_Prefetch_Data 2,)),
+        ( "prefetch3", (MO_Prefetch_Data 3,))
+    ] ++ concat
+    [ allWidths "popcnt" MO_PopCnt
+    , allWidths "pdep" MO_Pdep
+    , allWidths "pext" MO_Pext
+    , allWidths "cmpxchg" MO_Cmpxchg
+    , allWidths "xchg" MO_Xchg
+    , allWidths "load_relaxed" (\w -> MO_AtomicRead w MemOrderAcquire)
+    , allWidths "load_acquire" (\w -> MO_AtomicRead w MemOrderAcquire)
+    , allWidths "load_seqcst" (\w -> MO_AtomicRead w MemOrderSeqCst)
+    , allWidths "store_release" (\w -> MO_AtomicWrite w MemOrderRelease)
+    , allWidths "store_seqcst" (\w -> MO_AtomicWrite w MemOrderSeqCst)
     ]
   where
+    allWidths
+        :: String
+        -> (Width -> CallishMachOp)
+        -> [(FastString, a -> (CallishMachOp, a))]
+    allWidths name f =
+        [ (mkFastString $ name ++ show (widthInBits w), (f w,))
+        | w <- [W8, W16, W32, W64]
+        ]
+
     memcpyLikeTweakArgs :: (Int -> CallishMachOp) -> [CmmExpr] -> (CallishMachOp, [CmmExpr])
     memcpyLikeTweakArgs op [] = pgmError "memcpy-like function requires at least one argument"
     memcpyLikeTweakArgs op args@(_:_) =
@@ -1348,8 +1392,12 @@ primCall results_code name args_code
                 let (p, args') = f args
                 code (emitPrimCall (map fst results) p args')
 
-doStore :: CmmType -> CmmParse CmmExpr  -> CmmParse CmmExpr -> CmmParse ()
-doStore rep addr_code val_code
+doStore :: Maybe MemoryOrdering
+        -> CmmType
+        -> CmmParse CmmExpr   -- ^ address
+        -> CmmParse CmmExpr   -- ^ value
+        -> CmmParse ()
+doStore mem_ord rep addr_code val_code
   = do platform <- getPlatform
        addr <- addr_code
        val <- val_code
@@ -1363,7 +1411,7 @@ doStore rep addr_code val_code
        let coerce_val
                 | val_width /= rep_width = CmmMachOp (MO_UU_Conv val_width rep_width) [val]
                 | otherwise              = val
-       emitStore addr coerce_val
+       emitStore mem_ord addr coerce_val
 
 -- -----------------------------------------------------------------------------
 -- If-then-else and boolean expressions
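
A note on the callishMachOps refactor above: allWidths builds one table entry
per width by suffixing the operation name with its width in bits, which is how
the new load_acquire8/16/32/64 and friends get their names. A runnable sketch
of just the naming scheme (allWidthNames is a hypothetical stand-in; the real
helper also pairs each name with its CallishMachOp):

    -- Standalone sketch of the allWidths naming scheme.
    allWidthNames :: String -> [String]
    allWidthNames name = [ name ++ show w | w <- [8, 16, 32, 64 :: Int] ]

    main :: IO ()
    main = print (allWidthNames "load_acquire")
    -- ["load_acquire8","load_acquire16","load_acquire32","load_acquire64"]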


=====================================
compiler/GHC/Cmm/Pipeline.hs
=====================================
@@ -19,6 +19,7 @@ import GHC.Cmm.LayoutStack
 import GHC.Cmm.ProcPoint
 import GHC.Cmm.Sink
 import GHC.Cmm.Switch.Implement
+import GHC.Cmm.ThreadSanitizer
 
 import GHC.Types.Unique.Supply
 
@@ -98,6 +99,13 @@ cpsTop logger platform cfg proc =
              else pure g
       dump Opt_D_dump_cmm_switch "Post switch plan" g
 
+      ----------- ThreadSanitizer instrumentation -----------------------------
+      g <- {-# SCC "annotateTSAN" #-} return $
+          if cmmOptThreadSanitizer cfg
+          then annotateTSAN platform g
+          else g
+      dump Opt_D_dump_cmm_thread_sanitizer "ThreadSanitizer instrumentation" g
+
       ----------- Proc points -------------------------------------------------
       let
         call_pps :: ProcPointSet -- LabelMap


=====================================
compiler/GHC/Cmm/ThreadSanitizer.hs
=====================================
@@ -0,0 +1,254 @@
+{-# LANGUAGE GADTs #-}
+{-# LANGUAGE RankNTypes #-}
+
+-- | Annotate a CmmGraph with ThreadSanitizer instrumentation calls.
+module GHC.Cmm.ThreadSanitizer (annotateTSAN) where
+
+import GHC.Prelude
+
+import GHC.StgToCmm.Utils (get_GlobalReg_addr)
+import GHC.Platform
+import GHC.Platform.Regs (activeStgRegs, callerSaves)
+import GHC.Cmm
+import GHC.Cmm.Utils
+import GHC.Cmm.CLabel
+import GHC.Cmm.Dataflow
+import GHC.Cmm.Dataflow.Block
+import GHC.Cmm.Dataflow.Graph
+import GHC.Data.FastString
+import GHC.Types.Basic
+import GHC.Types.ForeignCall
+
+import Data.Maybe (fromMaybe)
+
+annotateTSAN :: Platform -> CmmGraph -> CmmGraph
+annotateTSAN platform graph =
+    modifyGraph (mapGraphBlocks (annotateBlock platform)) graph
+
+mapBlockList :: (forall e' x'. n e' x' -> Block n e' x')
+             -> Block n e x -> Block n e x
+mapBlockList f (BlockCO n rest  ) = f n `blockAppend` mapBlockList f rest
+mapBlockList f (BlockCC n rest m) = f n `blockAppend` mapBlockList f rest `blockAppend` f m
+mapBlockList f (BlockOC   rest m) = mapBlockList f rest `blockAppend` f m
+mapBlockList _ BNil = BNil
+mapBlockList f (BMiddle blk) = f blk
+mapBlockList f (BCat a b) = mapBlockList f a `blockAppend` mapBlockList f b
+mapBlockList f (BSnoc a n) = mapBlockList f a `blockAppend` f n
+mapBlockList f (BCons n a) = f n `blockAppend` mapBlockList f a
+
+annotateBlock :: Platform -> Block CmmNode e x -> Block CmmNode e x
+annotateBlock platform = mapBlockList (annotateNode platform)
+
+annotateNode :: Platform -> CmmNode e x -> Block CmmNode e x
+annotateNode platform node =
+    case node of
+      CmmEntry{}              -> BlockCO node BNil
+      CmmComment{}            -> BMiddle node
+      CmmTick{}               -> BMiddle node
+      CmmUnwind{}             -> BMiddle node
+      CmmAssign{}             -> annotateNodeOO platform node
+      CmmStore lhs rhs align  ->
+          let ty = cmmExprType platform rhs
+              rhs_nodes = annotateLoads platform (collectExprLoads rhs)
+              lhs_nodes = annotateLoads platform (collectExprLoads lhs)
+              st        = tsanStore platform align ty lhs
+          in rhs_nodes `blockAppend` lhs_nodes `blockAppend` st `blockSnoc` node
+      CmmUnsafeForeignCall (PrimTarget op) formals args ->
+          let node' = fromMaybe (BMiddle node) (annotatePrim platform op formals args)
+              arg_nodes = blockConcat $ map (annotateExpr platform) args
+          in arg_nodes `blockAppend` node'
+      CmmUnsafeForeignCall{}  -> annotateNodeOO platform node
+      CmmBranch{}             -> annotateNodeOC platform node
+      CmmCondBranch{}         -> annotateNodeOC platform node
+      CmmSwitch{}             -> annotateNodeOC platform node
+      CmmCall{}               -> annotateNodeOC platform node
+      CmmForeignCall{}        -> annotateNodeOC platform node
+
+annotateNodeOO :: Platform -> CmmNode O O -> Block CmmNode O O
+annotateNodeOO platform node =
+    annotateLoads platform (collectLoadsNode node) `blockSnoc` node
+
+annotateNodeOC :: Platform -> CmmNode O C -> Block CmmNode O C
+annotateNodeOC platform node =
+    annotateLoads platform (collectLoadsNode node) `blockJoinTail` node
+
+annotateExpr :: Platform -> CmmExpr -> Block CmmNode O O
+annotateExpr platform expr =
+    annotateLoads platform (collectExprLoads expr)
+
+data Load = Load CmmType AlignmentSpec CmmExpr
+
+annotateLoads :: Platform -> [Load] -> Block CmmNode O O
+annotateLoads platform loads =
+    blockConcat
+    [ tsanLoad platform align ty addr
+    | Load ty align addr <- loads
+    ]
+
+collectLoadsNode :: CmmNode e x -> [Load]
+collectLoadsNode node =
+    foldExp (\exp rest -> collectExprLoads exp ++ rest) node []
+
+-- | Collect all of the memory locations loaded from by a 'CmmExpr'.
+collectExprLoads :: CmmExpr -> [Load]
+collectExprLoads (CmmLit _)           = []
+collectExprLoads (CmmLoad e ty align) = [Load ty align e]
+collectExprLoads (CmmReg _)           = []
+collectExprLoads (CmmMachOp _op args) = foldMap collectExprLoads args
+collectExprLoads (CmmStackSlot _ _)   = []
+collectExprLoads (CmmRegOff _ _)      = []
+
+-- | Generate TSAN instrumentation for a 'CallishMachOp' occurrence.
+annotatePrim :: Platform -> CallishMachOp -> [CmmFormal] -> [CmmActual] -> Maybe (Block CmmNode O O)
+annotatePrim plat (MO_AtomicRMW w aop)    [dest]   [addr, val] = Just $ tsanAtomicRMW plat MemOrderSeqCst aop w addr val dest
+annotatePrim plat (MO_AtomicRead w mord)  [dest]   [addr]      = Just $ tsanAtomicLoad plat mord w addr dest
+annotatePrim plat (MO_AtomicWrite w mord) []       [addr, val] = Just $ tsanAtomicStore plat mord w val addr
+annotatePrim plat (MO_Xchg w)             [dest]   [addr, val] = Just $ tsanAtomicExchange plat MemOrderSeqCst w val addr dest
+annotatePrim plat (MO_Cmpxchg w)          [dest]   [addr, expected, new]
+                                                               = Just $ tsanAtomicCas plat MemOrderSeqCst MemOrderSeqCst w addr expected new dest
+annotatePrim _    _                       _        _           = Nothing
+
+mkUnsafeCall :: Platform
+             -> ForeignTarget  -- ^ function
+             -> [CmmFormal]    -- ^ results
+             -> [CmmActual]    -- ^ arguments
+             -> Block CmmNode O O
+mkUnsafeCall platform ftgt formals args =
+    -- We are rather conservative here and just save/restore all GlobalRegs.
+    let (save, restore) = saveRestoreCallerRegs platform
+        call = CmmUnsafeForeignCall ftgt formals args
+    in save `blockSnoc` call `blockAppend` restore
+
+saveRestoreCallerRegs :: Platform
+                      -> (Block CmmNode O O, Block CmmNode O O)
+saveRestoreCallerRegs platform =
+    (save, restore)
+  where
+    regs = filter (callerSaves platform) (activeStgRegs platform)
+
+    save = blockFromList (map saveReg regs)
+    saveReg reg =
+      CmmStore (get_GlobalReg_addr platform reg)
+               (CmmReg (CmmGlobal reg))
+               NaturallyAligned
+
+    restore = blockFromList (map restoreReg regs)
+    restoreReg reg =
+      CmmAssign (CmmGlobal reg)
+                (CmmLoad (get_GlobalReg_addr platform reg)
+                         (globalRegType platform reg)
+                         NaturallyAligned)
+
+-- | Mirrors __tsan_memory_order
+-- <https://github.com/llvm-mirror/compiler-rt/blob/master/include/sanitizer/tsan_interface_atomic.h#L32>
+memoryOrderToTsanMemoryOrder :: MemoryOrdering -> Int
+memoryOrderToTsanMemoryOrder mord =
+    case mord of
+      MemOrderRelaxed -> 0
+      MemOrderAcquire -> 2
+      MemOrderRelease -> 3
+      MemOrderSeqCst  -> 5
+
+tsanTarget :: FastString     -- ^ function name
+           -> [ForeignHint]  -- ^ formals
+           -> [ForeignHint]  -- ^ arguments
+           -> ForeignTarget
+tsanTarget fn formals args =
+    ForeignTarget (CmmLit (CmmLabel lbl)) conv
+  where
+    conv = ForeignConvention CCallConv args formals CmmMayReturn
+    lbl = mkForeignLabel fn Nothing ForeignLabelInExternalPackage IsFunction
+
+tsanStore :: Platform
+          -> AlignmentSpec -> CmmType -> CmmExpr
+          -> Block CmmNode O O
+tsanStore platform align ty addr =
+    mkUnsafeCall platform ftarget [] [addr]
+  where
+    ftarget = tsanTarget fn [] [AddrHint]
+    w = widthInBytes (typeWidth ty)
+    fn = case align of
+           Unaligned
+             | w > 1    -> fsLit $ "__tsan_unaligned_write" ++ show w
+           _            -> fsLit $ "__tsan_write" ++ show w
+
+tsanLoad :: Platform
+         -> AlignmentSpec -> CmmType -> CmmExpr
+         -> Block CmmNode O O
+tsanLoad platform align ty addr =
+    mkUnsafeCall platform ftarget [] [addr]
+  where
+    ftarget = tsanTarget fn [] [AddrHint]
+    w = widthInBytes (typeWidth ty)
+    fn = case align of
+           Unaligned
+             | w > 1    -> fsLit $ "__tsan_unaligned_read" ++ show w
+           _            -> fsLit $ "__tsan_read" ++ show w
+
+tsanAtomicStore :: Platform
+                -> MemoryOrdering -> Width -> CmmExpr -> CmmExpr
+                -> Block CmmNode O O
+tsanAtomicStore platform mord w val addr =
+    mkUnsafeCall platform ftarget [] [addr, val, mord']
+  where
+    mord' = mkIntExpr platform $ memoryOrderToTsanMemoryOrder mord
+    ftarget = tsanTarget fn [] [AddrHint, NoHint, NoHint]
+    fn = fsLit $ "__tsan_atomic" ++ show (widthInBits w) ++ "_store"
+
+tsanAtomicLoad :: Platform
+               -> MemoryOrdering -> Width -> CmmExpr -> LocalReg
+               -> Block CmmNode O O
+tsanAtomicLoad platform mord w addr dest =
+    mkUnsafeCall platform ftarget [dest] [addr, mord']
+  where
+    mord' = mkIntExpr platform $ memoryOrderToTsanMemoryOrder mord
+    ftarget = tsanTarget fn [NoHint] [AddrHint, NoHint]
+    fn = fsLit $ "__tsan_atomic" ++ show (widthInBits w) ++ "_load"
+
+tsanAtomicExchange :: Platform
+                   -> MemoryOrdering -> Width -> CmmExpr -> CmmExpr -> LocalReg
+                   -> Block CmmNode O O
+tsanAtomicExchange platform mord w val addr dest =
+    mkUnsafeCall platform ftarget [dest] [addr, val, mord']
+  where
+    mord' = mkIntExpr platform $ memoryOrderToTsanMemoryOrder mord
+    ftarget = tsanTarget fn [NoHint] [AddrHint, NoHint, NoHint]
+    fn = fsLit $ "__tsan_atomic" ++ show (widthInBits w) ++ "_exchange"
+
+-- N.B. C11 CAS returns a boolean (to avoid the ABA problem) whereas Cmm's CAS
+-- returns the value previously found at the address. We define a shim in the
+-- RTS to provide Cmm's semantics in terms of the TSAN C11 primitive.
+tsanAtomicCas :: Platform
+              -> MemoryOrdering  -- ^ success ordering
+              -> MemoryOrdering  -- ^ failure ordering
+              -> Width
+              -> CmmExpr         -- ^ address
+              -> CmmExpr         -- ^ expected value
+              -> CmmExpr         -- ^ new value
+              -> LocalReg        -- ^ result destination
+              -> Block CmmNode O O
+tsanAtomicCas platform mord_success mord_failure w addr expected new dest =
+    mkUnsafeCall platform ftarget [dest] [addr, expected, new, mord_success', mord_failure']
+  where
+    mord_success' = mkIntExpr platform $ memoryOrderToTsanMemoryOrder mord_success
+    mord_failure' = mkIntExpr platform $ memoryOrderToTsanMemoryOrder mord_failure
+    ftarget = tsanTarget fn [NoHint] [AddrHint, NoHint, NoHint, NoHint, NoHint]
+    fn = fsLit $ "ghc_tsan_atomic" ++ show (widthInBits w) ++ "_compare_exchange"
+
+tsanAtomicRMW :: Platform
+              -> MemoryOrdering -> AtomicMachOp -> Width -> CmmExpr -> CmmExpr -> LocalReg
+              -> Block CmmNode O O
+tsanAtomicRMW platform mord op w addr val dest =
+    mkUnsafeCall platform ftarget [dest] [addr, val, mord']
+  where
+    mord' = mkIntExpr platform $ memoryOrderToTsanMemoryOrder mord
+    ftarget = tsanTarget fn [NoHint] [AddrHint, NoHint, NoHint]
+    op' = case op of
+           AMO_Add  -> "fetch_add"
+           AMO_Sub  -> "fetch_sub"
+           AMO_And  -> "fetch_and"
+           AMO_Nand -> "fetch_nand"
+           AMO_Or   -> "fetch_or"
+           AMO_Xor  -> "fetch_xor"
+    fn = fsLit $ "__tsan_atomic" ++ show (widthInBits w) ++ "_" ++ op'
+
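
A note on the symbol selection in tsanLoad/tsanStore above: unaligned
multi-byte accesses are reported through the __tsan_unaligned_* entry points,
and everything else through the plain __tsan_read*/__tsan_write* ones. A
runnable sketch of that selection logic (Access and tsanAccessSymbol are
illustrative names, not GHC definitions):

    data Access = AccessRead | AccessWrite

    -- Select the TSAN entry point for a plain (non-atomic) access, given
    -- the access kind, whether it is naturally aligned, and its width in
    -- bytes.
    tsanAccessSymbol :: Access -> Bool -> Int -> String
    tsanAccessSymbol acc aligned w
      | not aligned && w > 1 = "__tsan_unaligned_" ++ base ++ show w
      | otherwise            = "__tsan_" ++ base ++ show w
      where
        base = case acc of { AccessRead -> "read"; AccessWrite -> "write" }

    main :: IO ()
    main = do
      putStrLn (tsanAccessSymbol AccessRead  True  8)  -- __tsan_read8
      putStrLn (tsanAccessSymbol AccessWrite False 4) -- __tsan_unaligned_write4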


=====================================
compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
=====================================
@@ -1537,8 +1537,8 @@ genCCall target dest_regs arg_regs bid = do
 
         -- -- Atomic read-modify-write.
         MO_AtomicRMW w amop -> mkCCall (atomicRMWLabel w amop)
-        MO_AtomicRead w     -> mkCCall (atomicReadLabel w)
-        MO_AtomicWrite w    -> mkCCall (atomicWriteLabel w)
+        MO_AtomicRead w _   -> mkCCall (atomicReadLabel w)
+        MO_AtomicWrite w _  -> mkCCall (atomicWriteLabel w)
         MO_Cmpxchg w        -> mkCCall (cmpxchgLabel w)
         -- -- Should be an AtomicRMW variant eventually.
         -- -- Sequential consistent.


=====================================
compiler/GHC/CmmToAsm/PPC/CodeGen.hs
=====================================
@@ -1173,7 +1173,7 @@ genCCall (PrimTarget (MO_AtomicRMW width amop)) [dst] [addr, n]
                           (n_reg, n_code) <- getSomeReg n
                           return  (op dst dst (RIReg n_reg), n_code)
 
-genCCall (PrimTarget (MO_AtomicRead width)) [dst] [addr]
+genCCall (PrimTarget (MO_AtomicRead width _)) [dst] [addr]
  = do let fmt      = intFormat width
           reg_dst  = getLocalRegReg dst
           form     = if widthInBits width == 64 then DS else D
@@ -1200,7 +1200,7 @@ genCCall (PrimTarget (MO_AtomicRead width)) [dst] [addr]
 -- This is also what gcc does.
 
 
-genCCall (PrimTarget (MO_AtomicWrite width)) [] [addr, val] = do
+genCCall (PrimTarget (MO_AtomicWrite width _)) [] [addr, val] = do
     code <- assignMem_IntCode (intFormat width) addr val
     return $ unitOL HWSYNC `appOL` code
 
@@ -2067,8 +2067,8 @@ genCCall' config gcp target dest_regs args
                     MO_AtomicRMW {} -> unsupported
                     MO_Cmpxchg w -> (cmpxchgLabel w, False)
                     MO_Xchg w    -> (xchgLabel w, False)
-                    MO_AtomicRead _  -> unsupported
-                    MO_AtomicWrite _ -> unsupported
+                    MO_AtomicRead _ _  -> unsupported
+                    MO_AtomicWrite _ _ -> unsupported
 
                     MO_S_Mul2    {}  -> unsupported
                     MO_S_QuotRem {}  -> unsupported


=====================================
compiler/GHC/CmmToAsm/Wasm/FromCmm.hs
=====================================
@@ -1293,7 +1293,7 @@ lower_CallishMachOp lbl (MO_AtomicRMW w0 op) rs xs =
     CmmMayReturn
     rs
     xs
-lower_CallishMachOp lbl (MO_AtomicRead w0) [reg] [ptr] = do
+lower_CallishMachOp lbl (MO_AtomicRead w0 _) [reg] [ptr] = do
   SomeWasmExpr ty (WasmExpr ret_instr) <-
     lower_CmmLoad
       lbl
@@ -1302,7 +1302,7 @@ lower_CallishMachOp lbl (MO_AtomicRead w0) [reg] [ptr] = do
       NaturallyAligned
   ri <- onCmmLocalReg_Typed ty reg
   pure $ WasmStatements $ ret_instr `WasmConcat` WasmLocalSet ty ri
-lower_CallishMachOp lbl (MO_AtomicWrite _) [] [ptr, val] =
+lower_CallishMachOp lbl (MO_AtomicWrite _ _) [] [ptr, val] =
   lower_CmmStore lbl ptr val NaturallyAligned
 lower_CallishMachOp lbl (MO_Cmpxchg w0) rs xs = lower_MO_Cmpxchg lbl w0 rs xs
 lower_CallishMachOp lbl (MO_Xchg w0) rs xs =


=====================================
compiler/GHC/CmmToAsm/X86/CodeGen.hs
=====================================
@@ -2203,8 +2203,8 @@ genSimplePrim bid (MO_Pdep width)      [dst]   [src,mask]     = genPdep bid widt
 genSimplePrim bid (MO_Pext width)      [dst]   [src,mask]     = genPext bid width dst src mask
 genSimplePrim bid (MO_Clz width)       [dst]   [src]          = genClz bid width dst src
 genSimplePrim bid (MO_UF_Conv width)   [dst]   [src]          = genWordToFloat bid width dst src
-genSimplePrim _   (MO_AtomicRead w)    [dst]   [addr]         = genAtomicRead w dst addr
-genSimplePrim _   (MO_AtomicWrite w)   []      [addr,val]     = genAtomicWrite w addr val
+genSimplePrim _   (MO_AtomicRead w mo)  [dst]  [addr]         = genAtomicRead w mo dst addr
+genSimplePrim _   (MO_AtomicWrite w mo) []     [addr,val]     = genAtomicWrite w mo addr val
 genSimplePrim bid (MO_Cmpxchg width)   [dst]   [addr,old,new] = genCmpXchg bid width dst addr old new
 genSimplePrim _   (MO_Xchg width)      [dst]   [addr, value]  = genXchg width dst addr value
 genSimplePrim _   (MO_AddWordC w)      [r,c]   [x,y]          = genAddSubRetCarry w ADD_CC (const Nothing) CARRY r c x y
@@ -3962,15 +3962,20 @@ genWordToFloat bid width dst src =
   -- TODO: generate assembly instead
   genPrimCCall bid (word2FloatLabel width) [dst] [src]
 
-genAtomicRead :: Width -> LocalReg -> CmmExpr -> NatM InstrBlock
-genAtomicRead width dst addr = do
+genAtomicRead :: Width -> MemoryOrdering -> LocalReg -> CmmExpr -> NatM InstrBlock
+genAtomicRead width _mord dst addr = do
   load_code <- intLoadCode (MOV (intFormat width)) addr
   return (load_code (getLocalRegReg dst))
 
-genAtomicWrite :: Width -> CmmExpr -> CmmExpr -> NatM InstrBlock
-genAtomicWrite width addr val = do
+genAtomicWrite :: Width -> MemoryOrdering -> CmmExpr -> CmmExpr -> NatM InstrBlock
+genAtomicWrite width mord addr val = do
   code <- assignMem_IntCode (intFormat width) addr val
-  return $ code `snocOL` MFENCE
+  let needs_fence = case mord of
+        MemOrderSeqCst  -> True
+        MemOrderRelease -> True
+        MemOrderAcquire -> pprPanic "genAtomicWrite: acquire ordering on write" empty
+        MemOrderRelaxed -> False
+  return $ if needs_fence then code `snocOL` MFENCE else code
 
 genCmpXchg
   :: BlockId


=====================================
compiler/GHC/CmmToC.hs
=====================================
@@ -949,8 +949,9 @@ pprCallishMachOp_for_C mop
         MO_AtomicRMW w amop -> ftext (atomicRMWLabel w amop)
         MO_Cmpxchg w        -> ftext (cmpxchgLabel w)
         MO_Xchg w           -> ftext (xchgLabel w)
-        MO_AtomicRead w     -> ftext (atomicReadLabel w)
-        MO_AtomicWrite w    -> ftext (atomicWriteLabel w)
+        -- TODO: handle orderings
+        MO_AtomicRead w _   -> ftext (atomicReadLabel w)
+        MO_AtomicWrite w _  -> ftext (atomicWriteLabel w)
         MO_UF_Conv w        -> ftext (word2FloatLabel w)
 
         MO_S_Mul2     {} -> unsupported


=====================================
compiler/GHC/CmmToLlvm/CodeGen.hs
=====================================
@@ -45,7 +45,7 @@ import qualified Data.Semigroup as Semigroup
 import Data.List ( nub )
 import Data.Maybe ( catMaybes )
 
-type Atomic = Bool
+type Atomic = Maybe MemoryOrdering
 type LlvmStatements = OrdList LlvmStatement
 
 data Signage = Signed | Unsigned deriving (Eq, Show)
@@ -265,9 +265,9 @@ genCall (PrimTarget (MO_AtomicRMW width amop)) [dst] [addr, n] = runStmtsDecls $
     retVar <- doExprW targetTy $ AtomicRMW op ptrVar nVar SyncSeqCst
     statement $ Store retVar dstVar Nothing
 
-genCall (PrimTarget (MO_AtomicRead _)) [dst] [addr] = runStmtsDecls $ do
+genCall (PrimTarget (MO_AtomicRead _ mem_ord)) [dst] [addr] = runStmtsDecls $ do
     dstV <- getCmmRegW (CmmLocal dst)
-    v1 <- genLoadW True addr (localRegType dst) NaturallyAligned
+    v1 <- genLoadW (Just mem_ord) addr (localRegType dst) NaturallyAligned
     statement $ Store v1 dstV Nothing
 
 genCall (PrimTarget (MO_Cmpxchg _width))
@@ -294,13 +294,14 @@ genCall (PrimTarget (MO_Xchg _width)) [dst] [addr, val] = runStmtsDecls $ do
     resVar <- doExprW (getVarType valVar) (AtomicRMW LAO_Xchg ptrVar valVar SyncSeqCst)
     statement $ Store resVar dstV Nothing
 
-genCall (PrimTarget (MO_AtomicWrite _width)) [] [addr, val] = runStmtsDecls $ do
+genCall (PrimTarget (MO_AtomicWrite _width mem_ord)) [] [addr, val] = runStmtsDecls $ do
     addrVar <- exprToVarW addr
     valVar <- exprToVarW val
     let ptrTy = pLift $ getVarType valVar
         ptrExpr = Cast LM_Inttoptr addrVar ptrTy
     ptrVar <- doExprW ptrTy ptrExpr
-    statement $ Expr $ AtomicRMW LAO_Xchg ptrVar valVar SyncSeqCst
+    let ordering = convertMemoryOrdering mem_ord
+    statement $ Expr $ AtomicRMW LAO_Xchg ptrVar valVar ordering
 
 -- Handle memcpy function specifically since llvm's intrinsic version takes
 -- some extra parameters.
@@ -1012,11 +1013,11 @@ cmmPrimOpFunctions mop = do
     MO_Touch         -> unsupported
     MO_UF_Conv _     -> unsupported
 
-    MO_AtomicRead _  -> unsupported
-    MO_AtomicRMW _ _ -> unsupported
-    MO_AtomicWrite _ -> unsupported
-    MO_Cmpxchg _     -> unsupported
-    MO_Xchg _        -> unsupported
+    MO_AtomicRead _ _  -> unsupported
+    MO_AtomicRMW _ _   -> unsupported
+    MO_AtomicWrite _ _ -> unsupported
+    MO_Cmpxchg _       -> unsupported
+    MO_Xchg _          -> unsupported
 
     MO_I64_ToI       -> dontReach64
     MO_I64_FromI     -> dontReach64
@@ -1368,7 +1369,7 @@ exprToVarOpt opt e = case e of
         -> genLit opt lit
 
     CmmLoad e' ty align
-        -> genLoad False e' ty align
+        -> genLoad Nothing e' ty align
 
     -- Cmmreg in expression is the value, so must load. If you want actual
     -- reg pointer, call getCmmReg directly.
@@ -1893,7 +1894,8 @@ case we will need a more granular way of specifying alignment.
 
 mkLoad :: Atomic -> LlvmVar -> AlignmentSpec -> LlvmExpression
 mkLoad atomic vptr alignment
-  | atomic      = ALoad SyncSeqCst False vptr
+  | Just mem_ord <- atomic
+                = ALoad (convertMemoryOrdering mem_ord) False vptr
   | otherwise   = Load vptr align
   where
     ty = pLower (getVarType vptr)
@@ -2030,6 +2032,12 @@ genLit _ CmmHighStackMark
 -- * Misc
 --
 
+convertMemoryOrdering :: MemoryOrdering -> LlvmSyncOrdering
+convertMemoryOrdering MemOrderRelaxed = SyncUnord
+convertMemoryOrdering MemOrderAcquire = SyncAcquire
+convertMemoryOrdering MemOrderRelease = SyncRelease
+convertMemoryOrdering MemOrderSeqCst  = SyncSeqCst
+
 -- | Find CmmRegs that get assigned and allocate them on the stack
 --
 -- Any register that gets written needs to be allocated on the


=====================================
compiler/GHC/Driver/Config/Cmm.hs
=====================================
@@ -18,6 +18,7 @@ initCmmConfig dflags = CmmConfig
   , cmmDoLinting           = gopt Opt_DoCmmLinting        dflags
   , cmmOptElimCommonBlks   = gopt Opt_CmmElimCommonBlocks dflags
   , cmmOptSink             = gopt Opt_CmmSink             dflags
+  , cmmOptThreadSanitizer  = gopt Opt_CmmThreadSanitizer dflags
   , cmmGenStackUnwindInstr = debugLevel dflags > 0
   , cmmExternalDynamicRefs = gopt Opt_ExternalDynamicRefs dflags
   , cmmDoCmmSwitchPlans    = not (backendHasNativeSwitch (backend dflags))


=====================================
compiler/GHC/Driver/Flags.hs
=====================================
@@ -66,6 +66,7 @@ data DumpFlag
    | Opt_D_dump_cmm_split
    | Opt_D_dump_cmm_info
    | Opt_D_dump_cmm_cps
+   | Opt_D_dump_cmm_thread_sanitizer
    -- end cmm subflags
    | Opt_D_dump_cfg_weights -- ^ Dump the cfg used for block layout.
    | Opt_D_dump_asm
@@ -352,6 +353,7 @@ data GeneralFlag
    | Opt_Ticky_Dyn_Thunk
    | Opt_Ticky_Tag
    | Opt_Ticky_AP                    -- ^ Use regular thunks even when we could use std ap thunks in order to get entry counts
+   | Opt_CmmThreadSanitizer
    | Opt_RPath
    | Opt_RelativeDynlibPaths
    | Opt_CompactUnwind               -- ^ @-fcompact-unwind@


=====================================
compiler/GHC/Driver/Session.hs
=====================================
@@ -2436,6 +2436,8 @@ dynamic_flags_deps = [
         (setDumpFlag Opt_D_dump_cmm_cps)
   , make_ord_flag defGhcFlag "ddump-cmm-opt"
         (setDumpFlag Opt_D_dump_opt_cmm)
+  , make_ord_flag defGhcFlag "ddump-cmm-thread-sanitizer"
+        (setDumpFlag Opt_D_dump_cmm_thread_sanitizer)
   , make_ord_flag defGhcFlag "ddump-cfg-weights"
         (setDumpFlag Opt_D_dump_cfg_weights)
   , make_ord_flag defGhcFlag "ddump-core-stats"
@@ -3509,8 +3511,8 @@ fFlagsDeps = [
         unless (platformOS (targetPlatform dflags) == OSDarwin && turn_on)
                (addWarn "-compact-unwind is only implemented by the darwin platform. Ignoring.")
         return dflags)),
-  flagSpec "show-error-context"               Opt_ShowErrorContext
-
+  flagSpec "show-error-context"               Opt_ShowErrorContext,
+  flagSpec "cmm-thread-sanitizer"             Opt_CmmThreadSanitizer
   ]
   ++ fHoleFlags
 


=====================================
compiler/GHC/StgToCmm/ExtCode.hs
=====================================
@@ -231,8 +231,12 @@ emitLabel = code . F.emitLabel
 emitAssign :: CmmReg  -> CmmExpr -> CmmParse ()
 emitAssign l r = code (F.emitAssign l r)
 
-emitStore :: CmmExpr  -> CmmExpr -> CmmParse ()
-emitStore l r = code (F.emitStore l r)
+emitStore :: Maybe MemoryOrdering -> CmmExpr  -> CmmExpr -> CmmParse ()
+emitStore (Just mem_ord) l r = do
+  platform <- getPlatform
+  let w = typeWidth $ cmmExprType platform r
+  emit $ mkUnsafeCall (PrimTarget $ MO_AtomicWrite w mem_ord) [] [l,r]
+emitStore Nothing l r = code (F.emitStore l r)
 
 getCode :: CmmParse a -> CmmParse CmmAGraph
 getCode (EC ec) = EC $ \c e s -> do


=====================================
compiler/GHC/StgToCmm/Prim.hs
=====================================
@@ -3011,7 +3011,7 @@ doAtomicReadAddr
 doAtomicReadAddr res addr ty =
     emitPrimCall
         [ res ]
-        (MO_AtomicRead (typeWidth ty))
+        (MO_AtomicRead (typeWidth ty) MemOrderSeqCst)
         [ addr ]
 
 -- | Emit an atomic write to a byte array that acts as a memory barrier.
@@ -3039,7 +3039,7 @@ doAtomicWriteAddr
 doAtomicWriteAddr addr ty val =
     emitPrimCall
         [ {- no results -} ]
-        (MO_AtomicWrite (typeWidth ty))
+        (MO_AtomicWrite (typeWidth ty) MemOrderSeqCst)
         [ addr, val ]
 
 doCasByteArray


=====================================
compiler/GHC/Tc/Module.hs
=====================================
@@ -1096,6 +1096,7 @@ checkBootTyCon is_boot tc1 tc2
        -- Order of pattern matching matters.
        subDM _ Nothing _ = True
        subDM _ _ Nothing = False
+
        -- If the hsig wrote:
        --
        --   f :: a -> a
@@ -1103,11 +1104,14 @@ checkBootTyCon is_boot tc1 tc2
        --
        -- this should be validly implementable using an old-fashioned
        -- vanilla default method.
-       subDM t1 (Just (_, GenericDM t2)) (Just (_, VanillaDM))
-        = eqTypeX env t1 t2
+       subDM t1 (Just (_, GenericDM gdm_t1)) (Just (_, VanillaDM))
+        = eqType t1 gdm_t1   -- Take care (#22476).  Both t1 and gdm_t1 come
+                             -- from tc1, so use eqType, and /not/ eqTypeX
+
        -- This case can occur when merging signatures
        subDM t1 (Just (_, VanillaDM)) (Just (_, GenericDM t2))
         = eqTypeX env t1 t2
+
        subDM _ (Just (_, VanillaDM)) (Just (_, VanillaDM)) = True
        subDM _ (Just (_, GenericDM t1)) (Just (_, GenericDM t2))
         = eqTypeX env t1 t2


=====================================
compiler/ghc.cabal.in
=====================================
@@ -213,6 +213,7 @@ Library
         GHC.Cmm.Sink
         GHC.Cmm.Switch
         GHC.Cmm.Switch.Implement
+        GHC.Cmm.ThreadSanitizer
         GHC.CmmToAsm
         GHC.Cmm.LRegSet
         GHC.CmmToAsm.AArch64


=====================================
docs/users_guide/debugging.rst
=====================================
@@ -551,6 +551,13 @@ These flags dump various phases of GHC's C-\\- pipeline.
 
     Dump the results of the C-\\- control flow optimisation pass.
 
+.. ghc-flag:: -ddump-cmm-thread-sanitizer
+    :shortdesc: Dump the results of the C-\\- ThreadSanitizer instrumentation pass.
+    :type: dynamic
+
+    Dump the results of the C-\\- pass responsible for adding the
+    instrumentation enabled by :ghc-flag:`-fcmm-thread-sanitizer`.
+
 .. ghc-flag:: -ddump-cmm-cbe
     :shortdesc: Dump the results of common block elimination
     :type: dynamic
@@ -1059,6 +1066,15 @@ Checking for consistency
     Note that this is only intended to be used as a debugging measure, not as
     the primary means of catching out-of-bounds accesses.
 
+.. ghc-flag:: -fcmm-thread-sanitizer
+    :shortdesc: Enable ThreadSanitizer instrumentation of memory accesses.
+    :type: dynamic
+
+    This enables generation of `ThreadSanitizer
+    <https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual>`
+    instrumentation of memory accesses. Requires use of ``-fsanitize=thread``
+    or similar when compiling and linking.
+
 .. _checking-determinism:
 
 Checking for determinism


=====================================
hadrian/src/Flavour.hs
=====================================
@@ -197,7 +197,7 @@ splitSections = splitSectionsIf (/=ghc)
 -- there is little benefit.
 
 enableThreadSanitizer :: Flavour -> Flavour
-enableThreadSanitizer = addArgs $ mconcat
+enableThreadSanitizer = addArgs $ notStage0 ? mconcat
     [ builder (Ghc CompileHs) ? arg "-optc-fsanitize=thread"
     , builder (Ghc CompileCWithGhc) ? (arg "-optc-fsanitize=thread" <> arg "-DTSAN_ENABLED")
     , builder (Ghc LinkHs) ? arg "-optl-fsanitize=thread"


=====================================
libraries/base/Data/IORef.hs
=====================================
@@ -46,7 +46,9 @@ mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
 mkWeakIORef r@(IORef (STRef r#)) (IO finalizer) = IO $ \s ->
     case mkWeak# r# r finalizer s of (# s1, w #) -> (# s1, Weak w #)
 
--- |Mutate the contents of an 'IORef'.
+-- |Mutate the contents of an 'IORef', combining 'readIORef' and 'writeIORef'.
+-- This is not an atomic update; consider using 'atomicModifyIORef' when
+-- operating in a multithreaded environment.
 --
 -- Be warned that 'modifyIORef' does not apply the function strictly.  This
 -- means if the program calls 'modifyIORef' many times, but seldom uses the
@@ -62,7 +64,9 @@ mkWeakIORef r@(IORef (STRef r#)) (IO finalizer) = IO $ \s ->
 modifyIORef :: IORef a -> (a -> a) -> IO ()
 modifyIORef ref f = readIORef ref >>= writeIORef ref . f
 
--- |Strict version of 'modifyIORef'
+-- |Strict version of 'modifyIORef'.
+-- This is not an atomic update; consider using 'atomicModifyIORef'' when
+-- operating in a multithreaded environment.
 --
 -- @since 4.6.0.0
 modifyIORef' :: IORef a -> (a -> a) -> IO ()
@@ -90,13 +94,18 @@ modifyIORef' ref f = do
 --
 -- Use 'atomicModifyIORef'' or 'atomicWriteIORef' to avoid this problem.
 --
+-- This function imposes a memory barrier, preventing reordering;
+-- see "Data.IORef#memmodel" for details.
+--
 atomicModifyIORef :: IORef a -> (a -> (a,b)) -> IO b
 atomicModifyIORef ref f = do
   (_old, ~(_new, res)) <- atomicModifyIORef2 ref f
   pure res
 
--- | Variant of 'writeIORef' with the \"barrier to reordering\" property that
--- 'atomicModifyIORef' has.
+-- | Variant of 'writeIORef'. The prefix "atomic" refers to the fact that
+-- it imposes a reordering barrier, similar to 'atomicModifyIORef'.
+-- Such a write will not be reordered with other reads
+-- or writes even on CPUs with a weak memory model.
 --
 -- @since 4.6.0.0
 atomicWriteIORef :: IORef a -> a -> IO ()
@@ -105,11 +114,15 @@ atomicWriteIORef ref a = do
   pure ()
 
 {- $memmodel
+  #memmodel#
+
+  Most modern CPU architectures (e.g. x86/64, ARM) have a memory model which allows
+  threads to reorder reads with earlier writes to different locations,
+  e.g. see <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html the x86/64 architecture manual>,
+  8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations.
 
-  In a concurrent program, 'IORef' operations may appear out-of-order
-  to another thread, depending on the memory model of the underlying
-  processor architecture.  For example, on x86, loads can move ahead
-  of stores, so in the following example:
+  Because of that, in a concurrent program, 'IORef' operations may appear out-of-order
+  to another thread. In the following example:
 
   > import Data.IORef
   > import Control.Monad (unless)
@@ -131,20 +144,23 @@ atomicWriteIORef ref a = do
 
   it is possible that the string @"critical section"@ is printed
   twice, even though there is no interleaving of the operations of the
-  two threads that allows that outcome.  The memory model of x86
+  two threads that allows that outcome.  The memory model of x86/64
   allows 'readIORef' to happen before the earlier 'writeIORef'.
 
+  The ARM memory model is typically even weaker than that of x86/64, allowing
+  any reordering of reads and writes as long as they are independent
+  from the point of view of the current thread.
+
   The implementation is required to ensure that reordering of memory
   operations cannot cause type-correct code to go wrong.  In
   particular, when inspecting the value read from an 'IORef', the
   memory writes that created that value must have occurred from the
   point of view of the current thread.
 
-  'atomicModifyIORef' acts as a barrier to reordering.  Multiple
-  'atomicModifyIORef' operations occur in strict program order.  An
-  'atomicModifyIORef' is never observed to take place ahead of any
+  'atomicWriteIORef', 'atomicModifyIORef' and 'atomicModifyIORef'' act
+  as a barrier to reordering. Multiple calls to these functions
+  occur in strict program order, never taking place ahead of any
   earlier (in program order) 'IORef' operations, or after any later
   'IORef' operations.
 
 -}
-
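
The hunk above elides most of the two-thread example the new text refers to.
For reference, a complete, runnable variant of the program described there
(names are illustrative; build with ghc -threaded and run on a multicore
machine to give the reordering a chance to be observed):

    import Data.IORef
    import Control.Monad (unless)
    import Control.Concurrent (forkIO, threadDelay)

    -- Each thread writes its own flag, then reads the other's. If the CPU
    -- reorders the read ahead of the write, both threads can observe False,
    -- and "critical section" is printed twice.
    maybePrint :: IORef Bool -> IORef Bool -> IO ()
    maybePrint myRef yourRef = do
      writeIORef myRef True         -- swapping in atomicWriteIORef here
      yourVal <- readIORef yourRef  -- rules out the reordering
      unless yourVal $ putStrLn "critical section"

    main :: IO ()
    main = do
      r1 <- newIORef False
      r2 <- newIORef False
      _ <- forkIO $ maybePrint r1 r2
      _ <- forkIO $ maybePrint r2 r1
      threadDelay 100000  -- crude: give both threads time to finish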


=====================================
libraries/base/GHC/IORef.hs
=====================================
@@ -32,7 +32,27 @@ import GHC.IO
 -- ---------------------------------------------------------------------------
 -- IORefs
 
--- |A mutable variable in the 'IO' monad
+-- |A mutable variable in the 'IO' monad.
+--
+-- >>> import Data.IORef
+-- >>> r <- newIORef 0
+-- >>> readIORef r
+-- 0
+-- >>> writeIORef r 1
+-- >>> readIORef r
+-- 1
+-- >>> atomicWriteIORef r 2
+-- >>> readIORef r
+-- 2
+-- >>> modifyIORef' r (+ 1)
+-- >>> readIORef r
+-- 3
+-- >>> atomicModifyIORef' r (\a -> (a + 1, ()))
+-- >>> readIORef r
+-- 4
+--
+-- See also 'Data.STRef.STRef' and 'Control.Concurrent.MVar.MVar'.
+--
 newtype IORef a = IORef (STRef RealWorld a)
   deriving Eq
   -- ^ Pointer equality.
@@ -43,11 +63,19 @@ newtype IORef a = IORef (STRef RealWorld a)
 newIORef    :: a -> IO (IORef a)
 newIORef v = stToIO (newSTRef v) >>= \ var -> return (IORef var)
 
--- |Read the value of an 'IORef'
+-- |Read the value of an 'IORef'.
+--
+-- Beware that the CPU executing a thread can reorder reads or writes
+-- to independent locations. See "Data.IORef#memmodel" for more details.
 readIORef   :: IORef a -> IO a
 readIORef  (IORef var) = stToIO (readSTRef var)
 
--- |Write a new value into an 'IORef'
+-- |Write a new value into an 'IORef'.
+--
+-- This function does not create a memory barrier and can be reordered
+-- with other independent reads and writes within a thread, which may cause issues
+-- for multithreaded execution. In these cases, consider using 'Data.IORef.atomicWriteIORef'
+-- instead. See "Data.IORef#memmodel" for more details.
 writeIORef  :: IORef a -> a -> IO ()
 writeIORef (IORef var) v = stToIO (writeSTRef var v)
 
@@ -116,6 +144,9 @@ data Box a = Box a
 -- will increment the 'IORef' and then throw an exception in the calling
 -- thread.
 --
+-- This function imposes a memory barrier, preventing reordering;
+-- see "Data.IORef#memmodel" for details.
+--
 -- @since 4.6.0.0
 atomicModifyIORef' :: IORef a -> (a -> (a,b)) -> IO b
 -- See Note [atomicModifyIORef' definition]


=====================================
libraries/base/System/Posix/Internals.hs
=====================================
@@ -379,6 +379,8 @@ foreign import ccall unsafe "HsBase.h __hscore_open"
 -- it's expensive (NFS, FUSE, etc.), and we especially
 -- need to be able to interrupt a blocking open call.
 -- See #17912.
+--
+-- @since 4.16.0.0
 c_interruptible_open :: CFilePath -> CInt -> CMode -> IO CInt
 c_interruptible_open filepath oflags mode =
   getMaskingState >>= \case
@@ -413,13 +415,21 @@ c_interruptible_open filepath oflags mode =
             interruptible (IO $ \s -> (# yield# s, () #))
       pure open_res
 
+-- |
+--
+-- @since 4.16.0.0
 foreign import ccall interruptible "HsBase.h __hscore_open"
    c_interruptible_open_ :: CFilePath -> CInt -> CMode -> IO CInt
 
 -- | Consult the RTS to find whether it is threaded.
+--
+-- @since 4.16.0.0
 hostIsThreaded :: Bool
 hostIsThreaded = rtsIsThreaded_ /= 0
 
+-- |
+--
+-- @since 4.16.0.0
 foreign import ccall unsafe "rts_isThreaded" rtsIsThreaded_ :: Int
 
 c_safe_open :: CFilePath -> CInt -> CMode -> IO CInt


=====================================
rts/TSANUtils.c
=====================================
@@ -0,0 +1,37 @@
+#include <Rts.h>
+
+#if defined(TSAN_ENABLED)
+
+uint64_t ghc_tsan_atomic64_compare_exchange(uint64_t *ptr, uint64_t expected, uint64_t new_value, int success_memorder, int failure_memorder)
+{
+    __tsan_atomic64_compare_exchange_strong(
+            ptr, &expected, new_value,
+            success_memorder, failure_memorder);
+    return expected;
+}
+
+uint32_t ghc_tsan_atomic32_compare_exchange(uint32_t *ptr, uint32_t expected, uint32_t new_value, int success_memorder, int failure_memorder)
+{
+    __tsan_atomic32_compare_exchange_strong(
+            ptr, &expected, new_value,
+            success_memorder, failure_memorder);
+    return expected;
+}
+
+uint16_t ghc_tsan_atomic16_compare_exchange(uint16_t *ptr, uint16_t expected, uint16_t new_value, int success_memorder, int failure_memorder)
+{
+    __tsan_atomic16_compare_exchange_strong(
+            ptr, &expected, new_value,
+            success_memorder, failure_memorder);
+    return expected;
+}
+
+uint8_t ghc_tsan_atomic8_compare_exchange(uint8_t *ptr, uint8_t expected, uint8_t new_value, int success_memorder, int failure_memorder)
+{
+    __tsan_atomic8_compare_exchange_strong(
+            ptr, &expected, new_value,
+            success_memorder, failure_memorder);
+    return expected;
+}
+
+#endif
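
These shims exist because C11 compare-exchange reports success as a boolean
and updates its 'expected' argument in place, while Cmm's cmpxchg returns the
value observed at the address. A pure Haskell model of the two contracts
(c11Cas and cmmCas are illustrative names, operating on a modelled cell
rather than real memory):

    -- C11-style CAS: returns a success flag and the cell's new contents.
    c11Cas :: Eq a => a -> a -> a -> (Bool, a)
    c11Cas current expected new
      | current == expected = (True,  new)
      | otherwise           = (False, current)

    -- Cmm-style CAS: returns the value observed at the address (success
    -- iff it equals 'expected'), plus the cell's new contents.
    cmmCas :: Eq a => a -> a -> a -> (a, a)
    cmmCas current expected new = (current, cell)
      where (_, cell) = c11Cas current expected new

    main :: IO ()
    main = do
      print (cmmCas (0 :: Int) 0 1)  -- (0,1): matched, cell updated
      print (cmmCas (5 :: Int) 0 1)  -- (5,5): mismatch, cell unchanged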


=====================================
rts/include/rts/TSANUtils.h
=====================================
@@ -65,3 +65,10 @@ void AnnotateBenignRaceSized(const char *file,
 
 #define TSAN_ANNOTATE_BENIGN_RACE(addr,desc)                            \
     TSAN_ANNOTATE_BENIGN_RACE_SIZED((void*)(addr), sizeof(*addr), desc)
+
+
+uint64_t ghc_tsan_atomic64_compare_exchange(uint64_t *ptr, uint64_t expected, uint64_t new_value, int success_memorder, int failure_memorder);
+uint32_t ghc_tsan_atomic32_compare_exchange(uint32_t *ptr, uint32_t expected, uint32_t new_value, int success_memorder, int failure_memorder);
+uint16_t ghc_tsan_atomic16_compare_exchange(uint16_t *ptr, uint16_t expected, uint16_t new_value, int success_memorder, int failure_memorder);
+uint8_t ghc_tsan_atomic8_compare_exchange(uint8_t *ptr, uint8_t expected, uint8_t new_value, int success_memorder, int failure_memorder);
+


=====================================
rts/rts.cabal.in
=====================================
@@ -542,6 +542,7 @@ library
                Trace.c
                TraverseHeap.c
                TraverseHeapTest.c
+               TSANUtils.c
                WSDeque.c
                Weak.c
                eventlog/EventLog.c



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/69567ec07c1cf0eef430ec247dd4ca3308c73dea...9f545dc8dfb0dbeea19a27a2e4af823dc5831cb4
