[Git][ghc/ghc][wip/9.2.6-backports] 16 commits: Handle shadowing in DmdAnal (#22718)

Zubin (@wz1000) gitlab at gitlab.haskell.org
Tue Jan 31 12:38:23 UTC 2023



Zubin pushed to branch wip/9.2.6-backports at Glasgow Haskell Compiler / GHC


Commits:
48780a49 by Sebastian Graf at 2023-01-26T18:26:47+05:30
Handle shadowing in DmdAnal (#22718)

Previously, when we had a shadowing situation like
```hs
f x = ... -- demand signature <1L><1L>

main = ... \f -> f 1 ...
```
we'd happily use the shadowed demand signature at the call site inside the
lambda. Of course, that's wrong, and the solution is simply to remove the
demand signature from the `AnalEnv` when we enter the lambda.
This patch does so for all binding constructs in Core.
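
A minimal model of the fix, illustrative only (the real change is
`addInScopeAnalEnv` in the DmdAnal diff below; the `Sig`/`AnalEnv` stand-ins
here are hypothetical):
```hs
import qualified Data.Map as M

type Sig     = String              -- stand-in for a demand signature
type AnalEnv = M.Map String Sig    -- stand-in for the ae_sigs field

-- What the patch does at every binding construct: entering a binding
-- for x deletes any outer signature for x, so occurrences under the
-- binder can never consult a stale, ill-scoped signature.
addInScope :: AnalEnv -> String -> AnalEnv
addInScope env x = M.delete x env

demo :: (Maybe Sig, Maybe Sig)
demo = ( M.lookup "f" env                     -- outer call site: signature used
       , M.lookup "f" (addInScope env "f") )  -- under \f -> ...: Nothing
  where env = M.fromList [("f", "<1L><1L>")]
```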

In #22718 the issue was caused by LetUp not shadowing away the existing demand
signature for the let binder in the let body. The resulting absent error is
tricky to reproduce; hence there is no reproduction test case. #17478 would help.

Fixes #22718.

It appears that TcPlugin_Rewrite regresses by ~40% on Darwin. It is likely that
DmdAnal was exploiting ill-scoped analysis results.

Metric increase ['bytes allocated'] (test_env=x86_64-darwin-validate):
    TcPlugin_Rewrite

(cherry picked from commit e3fff7512bbf989386faaa1dccafdad1deabde84)

- - - - -
2bfd76eb by Zubin Duggal at 2023-01-31T18:05:07+05:30
Bump bytestring

- - - - -
395e639c by Ben Gamari at 2023-01-31T18:05:13+05:30
compiler: Ensure that MutVar operations have necessary barriers

Here we add acquire and release barriers in readMutVar# and
writeMutVar#, which are necessary for soundness.

Fixes #22468.
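
As an illustration (not part of the patch), the barriers protect the classic
message-passing pattern; `writeIORef`/`readIORef` are implemented with
writeMutVar#/readMutVar#, so without the release/acquire pair the reader
could observe the flag set but a stale payload:
```hs
import Control.Concurrent (forkIO, yield)
import Data.IORef

main :: IO ()
main = do
  payload <- newIORef (0 :: Int)
  flag    <- newIORef False
  _ <- forkIO $ do
    writeIORef payload 42   -- the release barrier keeps this write...
    writeIORef flag True    -- ...ordered before the flag write
  let spin = do
        ready <- readIORef flag              -- acquire barrier on the read
        if ready
          then readIORef payload >>= print   -- must therefore see 42
          else yield >> spin
  spin
```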

(cherry picked from commit a9834736a90aefdd32cfc15be507e22b57eedc07)

- - - - -
267e23ee by Ben Gamari at 2023-01-31T18:05:13+05:30
cmm: Introduce blockConcat

(cherry picked from commit da7b51d8598400ed8073afe1b311c73a04e2230d)
(cherry picked from commit aa422c7127efd5a9da40a4bd53806942f678a5c2)

- - - - -
27216dc1 by Ben Gamari at 2023-01-31T18:05:13+05:30
cmm: Introduce MemoryOrderings

(cherry picked from commit 34f6b09c8e985017c4b18896aeac0c20baf4433d)
(cherry picked from commit 54ad2942efc1373d5be2bdcf82e4fd38d5b3d996)

- - - - -
eddd5a37 by Ben Gamari at 2023-01-31T18:05:13+05:30
llvm: Respect memory specified orderings

(cherry picked from commit 43beaa7baf02d75946c37974fbe46d2857920a53)

- - - - -
117fd472 by Zubin Duggal at 2023-01-31T18:05:13+05:30
Codegen/x86: Eliminate barrier for relaxed accesses

(cherry picked from commit 8faf74fcbd9d9da81ce8a901d694711c5d7ad406)

- - - - -
df505285 by Ben Gamari at 2023-01-31T18:05:13+05:30
cmm/Parser: Reduce some repetition

(cherry picked from commit 6cc3944a06cc5be302bb023a43c0537838b50861)

- - - - -
fcf0b7a9 by Ben Gamari at 2023-01-31T18:05:14+05:30
cmm/Parser: Add syntax for ordered loads and stores

(cherry picked from commit 6c9862c4fee395345dbbcd8ad58ae3f08753219e)

- - - - -
60c4fc12 by Ben Gamari at 2023-01-31T18:05:14+05:30
cmm/Parser: Atomic load syntax

Originally I had thought I would just use the `prim` call syntax instead
of introducing new syntax for atomic loads. However, it turns out that
`prim` call syntax tends to make things quite unreadable. This new
syntax seems quite natural.

(cherry picked from commit 748490d2ff51d6c6fa44aad587908b271c801fa9)

- - - - -
4fa749c2 by Ben Gamari at 2023-01-31T18:05:14+05:30
rts: Ensure that global regs are never passed as fun call args

This is in general unsafe as they may be clobbered if they are mapped to
caller-saved machine registers. See Note [Register parameter passing].

(cherry picked from commit 9372329008143104b0ae5e8e792e957090dfa743)
(cherry picked from commit bb4d238b675e5138d1a6ea46552528136e4142d8)

- - - - -
70c99590 by Andreas Klebinger at 2023-01-31T18:05:14+05:30
Only gc sparks locally when we can ensure marking is done.

When performing GC without work stealing there was no guarantee that
spark pruning happened after the sparks had been marked. This could
cause us to GC live sparks under certain circumstances.

Fixes #22528.

(cherry picked from commit a1491c8791c57a64d94bc08d639d585815c8d4e2)

- - - - -
c1f733f8 by Ian-Woo Kim at 2023-01-31T18:05:14+05:30
Truncate eventlog event for large payload (#20221)

RTS eventlog events for postCapsetVecEvent are now truncated if the
payload is larger than EVENT_PAYLOAD_SIZE_MAX.
Previously, postCapsetVecEvent recorded an eventlog event with a
variable-size payload larger than EVENT_PAYLOAD_SIZE_MAX (2^16) without
any validation, resulting in corrupted data.
For example, this happens when a Haskell binary is invoked with very
long command line arguments exceeding 2^16 bytes (see #20221).
Now we check the size of the accumulated payload incrementally and
truncate the message just before the payload size would exceed
EVENT_PAYLOAD_SIZE_MAX. The RTS warns the user with a message showing
how many arguments were truncated.
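
A Haskell sketch of the accumulation logic (the real implementation is the C
loop in rts/eventlog/EventLog.c below; `truncateArgs` is a hypothetical name
used only for illustration):
```hs
-- Returns the arguments that fit within the cap, plus the number dropped.
-- Each argument costs its length plus one byte for the trailing '\0'
-- separator; we stop just before the cap would be exceeded.
truncateArgs :: Int -> [String] -> ([String], Int)
truncateArgs cap = go 0 []
  where
    go _    acc []     = (reverse acc, 0)
    go size acc (a:as)
      | size + inc > cap = (reverse acc, 1 + length as)  -- truncate here
      | otherwise        = go (size + inc) (a:acc) as
      where inc = 1 + length a

-- e.g. truncateArgs 10 ["abc", "defg", "hij"] == (["abc", "defg"], 1)
```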

(cherry picked from commit 2057c77d08cb8422857d182a3691f98dccd0c7d6)

- - - - -
fe6da313 by Simon Peyton Jones at 2023-01-31T18:05:14+05:30
Make FloatIn robust to shadowing

This MR fixes #22622. See the new
  Note [Shadowing and name capture]

I did a bit of refactoring in sepBindsByDropPoint too.

The bug doesn't manifest in HEAD, but it did show up in 9.4,
so we should backport this patch to 9.4.

(cherry picked from commit 6206cb9287f3f6e70c669660a646a65274870d2b)

- - - - -
2353497f by Ben Gamari at 2023-01-31T18:06:50+05:30
Bump ghc-tarballs to fix #22497

It turns out that gmp 6.2.1 uses the platform-reserved `x18` register on
AArch64/Darwin. This was fixed in upstream changeset 18164:5f32dbc41afc,
which was merged in 2020. Here I backport this patch although I do hope
that a new release is forthcoming soon.

Bumps gmp-tarballs submodule.

Fixes #22497.

(cherry picked from commit f891a442046d8a5ebf4d4777847880ce06752b18)

- - - - -
84254767 by Ben Gamari at 2023-01-31T18:06:54+05:30
Bump gmp-tarballs submodule

This backports the upstream fix for CVE-2021-43618, fixing #22789.

(cherry picked from commit b13c6ea5d4b64841164f8cc58d6c6f3de390f2ed)

- - - - -


23 changed files:

- compiler/GHC/Cmm/ContFlowOpt.hs
- compiler/GHC/Cmm/Dataflow/Block.hs
- compiler/GHC/Cmm/Lexer.x
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/PPC/CodeGen.hs
- compiler/GHC/CmmToAsm/SPARC/CodeGen.hs
- compiler/GHC/CmmToAsm/X86/CodeGen.hs
- compiler/GHC/CmmToC.hs
- compiler/GHC/CmmToLlvm/CodeGen.hs
- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/FloatIn.hs
- compiler/GHC/StgToCmm/ExtCode.hs
- compiler/GHC/StgToCmm/Prim.hs
- libraries/bytestring
- libraries/ghc-bignum/gmp/gmp-tarballs
- rts/HeapStackCheck.cmm
- rts/Sparks.c
- rts/eventlog/EventLog.c
- rts/sm/GC.c
- + testsuite/tests/simplCore/should_compile/T22662.hs
- testsuite/tests/simplCore/should_compile/all.T


Changes:

=====================================
compiler/GHC/Cmm/ContFlowOpt.hs
=====================================
@@ -12,7 +12,7 @@ where
 
 import GHC.Prelude hiding (succ, unzip, zip)
 
-import GHC.Cmm.Dataflow.Block
+import GHC.Cmm.Dataflow.Block hiding (blockConcat)
 import GHC.Cmm.Dataflow.Collections
 import GHC.Cmm.Dataflow.Graph
 import GHC.Cmm.Dataflow.Label


=====================================
compiler/GHC/Cmm/Dataflow/Block.hs
=====================================
@@ -14,6 +14,7 @@ module GHC.Cmm.Dataflow.Block
     , IndexedCO
     , Block(..)
     , blockAppend
+    , blockConcat
     , blockCons
     , blockFromList
     , blockJoin
@@ -136,6 +137,8 @@ blockJoin f b t = BlockCC f b t
 blockAppend :: Block n e O -> Block n O x -> Block n e x
 blockAppend = cat
 
+blockConcat :: [Block n O O] -> Block n O O
+blockConcat = foldr blockAppend emptyBlock
 
 -- Taking apart
 


=====================================
compiler/GHC/Cmm/Lexer.x
=====================================
@@ -92,6 +92,10 @@ $white_no_nl+           ;
   "!="                  { kw CmmT_Ne }
   "&&"                  { kw CmmT_BoolAnd }
   "||"                  { kw CmmT_BoolOr }
+  "%relaxed"            { kw CmmT_Relaxed }
+  "%acquire"            { kw CmmT_Acquire }
+  "%release"            { kw CmmT_Release }
+  "%seq_cst"            { kw CmmT_SeqCst  }
 
   "True"                { kw CmmT_True  }
   "False"               { kw CmmT_False }
@@ -181,6 +185,10 @@ data CmmToken
   | CmmT_False
   | CmmT_True
   | CmmT_likely
+  | CmmT_Relaxed
+  | CmmT_Acquire
+  | CmmT_Release
+  | CmmT_SeqCst
   deriving (Show)
 
 -- -----------------------------------------------------------------------------


=====================================
compiler/GHC/Cmm/MachOp.hs
=====================================
@@ -24,6 +24,7 @@ module GHC.Cmm.MachOp
     , machOpMemcpyishAlign
 
     -- Atomic read-modify-write
+    , MemoryOrdering(..)
     , AtomicMachOp(..)
    )
 where
@@ -667,16 +668,26 @@ data CallishMachOp
   | MO_BSwap Width
   | MO_BRev Width
 
-  -- Atomic read-modify-write.
+  -- | Atomic read-modify-write. Arguments are @[dest, n]@.
   | MO_AtomicRMW Width AtomicMachOp
-  | MO_AtomicRead Width
-  | MO_AtomicWrite Width
+  -- | Atomic read. Arguments are @[addr]@.
+  | MO_AtomicRead Width MemoryOrdering
+  -- | Atomic write. Arguments are @[addr, value]@.
+  | MO_AtomicWrite Width MemoryOrdering
   | MO_Cmpxchg Width
   -- Should be an AtomicRMW variant eventually.
   -- Sequential consistent.
   | MO_Xchg Width
   deriving (Eq, Show)
 
+-- | C11 memory ordering semantics.
+data MemoryOrdering
+  = MemOrderRelaxed  -- ^ relaxed ordering
+  | MemOrderAcquire  -- ^ acquire ordering
+  | MemOrderRelease  -- ^ release ordering
+  | MemOrderSeqCst   -- ^ sequentially consistent
+  deriving (Eq, Ord, Show)
+
 -- | The operation to perform atomically.
 data AtomicMachOp =
       AMO_Add


=====================================
compiler/GHC/Cmm/Parser.y
=====================================
@@ -194,6 +194,27 @@ convention. Note if a field is longer than a word (e.g. a D_ on
 a 32-bit machine) then the call will push as many words as
 necessary to the stack to accommodate it (e.g. 2).
 
+Memory ordering
+---------------
+
+Cmm respects the C11 memory model and distinguishes between non-atomic and
+atomic memory accesses. In C11 fashion, atomic accesses can provide a number of
+memory ordering guarantees. These are supported in Cmm syntax as follows:
+
+    W_[ptr] = ...;            // a non-atomic store
+    %relaxed W_[ptr] = ...;   // an atomic store with relaxed ordering semantics
+    %release W_[ptr] = ...;   // an atomic store with release ordering semantics
+
+    x = W_[ptr];              // a non-atomic load
+    x = %relaxed W_[ptr];     // an atomic load with relaxed ordering
+    x = %acquire W_[ptr];     // an atomic load with acquire ordering
+    // or equivalently...
+    x = prim %load_acquire64(ptr);
+
+Here we used W_ as an example but these operations can be used on all Cmm
+types.
+
+See Note [Heap memory barriers] in SMP.h for details.
 
 ----------------------------------------------------------------------------- -}
 
@@ -315,6 +336,10 @@ import qualified Data.ByteString.Char8 as BS8
         'True'  { L _ (CmmT_True ) }
         'False' { L _ (CmmT_False) }
         'likely'{ L _ (CmmT_likely)}
+        'relaxed'{ L _ (CmmT_Relaxed)}
+        'acquire'{ L _ (CmmT_Acquire)}
+        'release'{ L _ (CmmT_Release)}
+        'seq_cst'{ L _ (CmmT_SeqCst)}
 
         'CLOSURE'       { L _ (CmmT_CLOSURE) }
         'INFO_TABLE'    { L _ (CmmT_INFO_TABLE) }
@@ -630,8 +655,23 @@ stmt    :: { CmmParse () }
 
         | lreg '=' expr ';'
                 { do reg <- $1; e <- $3; withSourceNote $2 $4 (emitAssign reg e) }
+
+        -- Use lreg instead of local_reg to avoid ambiguity
+        | lreg '=' mem_ordering type '[' expr ']' ';'
+                { do reg <- $1;
+                     let lreg = case reg of
+                                  { CmmLocal r -> r
+                                  ; other -> pprPanic "CmmParse:" (ppr reg <+> text "not a local register")
+                                  } ;
+                     mord <- $3;
+                     let { ty = $4; w = typeWidth ty };
+                     e <- $6;
+                     let op = MO_AtomicRead w mord;
+                     withSourceNote $2 $7 $ code (emitPrimCall [lreg] op [e]) }
+        | mem_ordering type '[' expr ']' '=' expr ';'
+                { do mord <- $1; withSourceNote $3 $8 (doStore (Just mord) $2 $4 $7) }
         | type '[' expr ']' '=' expr ';'
-                { withSourceNote $2 $7 (doStore $1 $3 $6) }
+                { withSourceNote $2 $7 (doStore Nothing $1 $3 $6) }
 
         -- Gah! We really want to say "foreign_results" but that causes
         -- a shift/reduce conflict with assignment.  We either
@@ -681,6 +721,14 @@ unwind_regs
         | GLOBALREG '=' expr_or_unknown
                 { do e <- $3; return [($1, e)] }
 
+-- | A memory ordering
+mem_ordering :: { CmmParse MemoryOrdering }
+mem_ordering
+        : 'relaxed' { do return MemOrderRelaxed }
+        | 'release' { do return MemOrderRelease }
+        | 'acquire' { do return MemOrderAcquire }
+        | 'seq_cst' { do return MemOrderSeqCst }
+
 -- | Used by unwind to indicate unknown unwinding values.
 expr_or_unknown
         :: { CmmParse (Maybe CmmExpr) }
@@ -955,6 +1003,7 @@ exprMacros ptr_opts = listToUFM [
     platform = profilePlatform profile
 
 -- we understand a subset of C-- primitives:
+machOps :: UniqFM FastString (Width -> MachOp)
 machOps = listToUFM $
         map (\(x, y) -> (mkFastString x, y)) [
         ( "add",        MO_Add ),
@@ -1073,37 +1122,32 @@ callishMachOps platform = listToUFM $
         ( "memmove", memcpyLikeTweakArgs MO_Memmove ),
         ( "memcmp", memcpyLikeTweakArgs MO_Memcmp ),
 
-        ("prefetch0", (MO_Prefetch_Data 0,)),
-        ("prefetch1", (MO_Prefetch_Data 1,)),
-        ("prefetch2", (MO_Prefetch_Data 2,)),
-        ("prefetch3", (MO_Prefetch_Data 3,)),
-
-        ( "popcnt8",  (MO_PopCnt W8,)),
-        ( "popcnt16", (MO_PopCnt W16,)),
-        ( "popcnt32", (MO_PopCnt W32,)),
-        ( "popcnt64", (MO_PopCnt W64,)),
-
-        ( "pdep8",  (MO_Pdep W8,)),
-        ( "pdep16", (MO_Pdep W16,)),
-        ( "pdep32", (MO_Pdep W32,)),
-        ( "pdep64", (MO_Pdep W64,)),
-
-        ( "pext8",  (MO_Pext W8,)),
-        ( "pext16", (MO_Pext W16,)),
-        ( "pext32", (MO_Pext W32,)),
-        ( "pext64", (MO_Pext W64,)),
-
-        ( "cmpxchg8",  (MO_Cmpxchg W8,)),
-        ( "cmpxchg16", (MO_Cmpxchg W16,)),
-        ( "cmpxchg32", (MO_Cmpxchg W32,)),
-        ( "cmpxchg64", (MO_Cmpxchg W64,)),
-
-        ( "xchg8",  (MO_Xchg W8,)),
-        ( "xchg16", (MO_Xchg W16,)),
-        ( "xchg32", (MO_Xchg W32,)),
-        ( "xchg64", (MO_Xchg W64,))
+        ( "prefetch0", (MO_Prefetch_Data 0,)),
+        ( "prefetch1", (MO_Prefetch_Data 1,)),
+        ( "prefetch2", (MO_Prefetch_Data 2,)),
+        ( "prefetch3", (MO_Prefetch_Data 3,))
+    ] ++ concat
+    [ allWidths "popcnt" MO_PopCnt
+    , allWidths "pdep" MO_Pdep
+    , allWidths "pext" MO_Pext
+    , allWidths "cmpxchg" MO_Cmpxchg
+    , allWidths "xchg" MO_Xchg
+    , allWidths "load_relaxed" (\w -> MO_AtomicRead w MemOrderAcquire)
+    , allWidths "load_acquire" (\w -> MO_AtomicRead w MemOrderAcquire)
+    , allWidths "load_seqcst" (\w -> MO_AtomicRead w MemOrderSeqCst)
+    , allWidths "store_release" (\w -> MO_AtomicWrite w MemOrderRelease)
+    , allWidths "store_seqcst" (\w -> MO_AtomicWrite w MemOrderSeqCst)
     ]
   where
+    allWidths
+        :: String
+        -> (Width -> CallishMachOp)
+        -> [(FastString, a -> (CallishMachOp, a))]
+    allWidths name f =
+        [ (mkFastString $ name ++ show (widthInBits w), (f w,))
+        | w <- [W8, W16, W32, W64]
+        ]
+
     memcpyLikeTweakArgs :: (Int -> CallishMachOp) -> [CmmExpr] -> (CallishMachOp, [CmmExpr])
     memcpyLikeTweakArgs op [] = pgmError "memcpy-like function requires at least one argument"
     memcpyLikeTweakArgs op args@(_:_) =
@@ -1344,8 +1388,12 @@ primCall results_code name args_code
                 let (p, args') = f args
                 code (emitPrimCall (map fst results) p args')
 
-doStore :: CmmType -> CmmParse CmmExpr  -> CmmParse CmmExpr -> CmmParse ()
-doStore rep addr_code val_code
+doStore :: Maybe MemoryOrdering
+        -> CmmType
+        -> CmmParse CmmExpr   -- ^ address
+        -> CmmParse CmmExpr   -- ^ value
+        -> CmmParse ()
+doStore mem_ord rep addr_code val_code
   = do platform <- getPlatform
        addr <- addr_code
        val <- val_code
@@ -1359,7 +1407,7 @@ doStore rep addr_code val_code
        let coerce_val
                 | val_width /= rep_width = CmmMachOp (MO_UU_Conv val_width rep_width) [val]
                 | otherwise              = val
-       emitStore addr coerce_val
+       emitStore mem_ord addr coerce_val
 
 -- -----------------------------------------------------------------------------
 -- If-then-else and boolean expressions


=====================================
compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
=====================================
@@ -1534,8 +1534,8 @@ genCCall target dest_regs arg_regs bid = do
 
         -- -- Atomic read-modify-write.
         MO_AtomicRMW w amop -> mkCCall (atomicRMWLabel w amop)
-        MO_AtomicRead w     -> mkCCall (atomicReadLabel w)
-        MO_AtomicWrite w    -> mkCCall (atomicWriteLabel w)
+        MO_AtomicRead w _   -> mkCCall (atomicReadLabel w)
+        MO_AtomicWrite w _  -> mkCCall (atomicWriteLabel w)
         MO_Cmpxchg w        -> mkCCall (cmpxchgLabel w)
         -- -- Should be an AtomicRMW variant eventually.
         -- -- Sequential consistent.


=====================================
compiler/GHC/CmmToAsm/PPC/CodeGen.hs
=====================================
@@ -1195,7 +1195,7 @@ genCCall (PrimTarget (MO_AtomicRMW width amop)) [dst] [addr, n]
                           (n_reg, n_code) <- getSomeReg n
                           return  (op dst dst (RIReg n_reg), n_code)
 
-genCCall (PrimTarget (MO_AtomicRead width)) [dst] [addr]
+genCCall (PrimTarget (MO_AtomicRead width _)) [dst] [addr]
  = do platform <- getPlatform
       let fmt      = intFormat width
           reg_dst  = getRegisterReg platform (CmmLocal dst)
@@ -1222,7 +1222,7 @@ genCCall (PrimTarget (MO_AtomicRead width)) [dst] [addr]
 -- This is also what gcc does.
 
 
-genCCall (PrimTarget (MO_AtomicWrite width)) [] [addr, val] = do
+genCCall (PrimTarget (MO_AtomicWrite width _)) [] [addr, val] = do
     code <- assignMem_IntCode (intFormat width) addr val
     return $ unitOL(HWSYNC) `appOL` code
 
@@ -2059,8 +2059,8 @@ genCCall' config gcp target dest_regs args
                     MO_AtomicRMW {} -> unsupported
                     MO_Cmpxchg w -> (fsLit $ cmpxchgLabel w, False)
                     MO_Xchg w    -> (fsLit $ xchgLabel w, False)
-                    MO_AtomicRead _  -> unsupported
-                    MO_AtomicWrite _ -> unsupported
+                    MO_AtomicRead _  _ -> unsupported
+                    MO_AtomicWrite _ _ -> unsupported
 
                     MO_S_Mul2    {}  -> unsupported
                     MO_S_QuotRem {}  -> unsupported


=====================================
compiler/GHC/CmmToAsm/SPARC/CodeGen.hs
=====================================
@@ -707,8 +707,9 @@ outOfLineMachOp_table mop
         MO_AtomicRMW w amop -> fsLit $ atomicRMWLabel w amop
         MO_Cmpxchg w -> fsLit $ cmpxchgLabel w
         MO_Xchg w -> fsLit $ xchgLabel w
-        MO_AtomicRead w -> fsLit $ atomicReadLabel w
-        MO_AtomicWrite w -> fsLit $ atomicWriteLabel w
+        -- TODO: handle orderings
+        MO_AtomicRead w _ -> fsLit $ atomicReadLabel w
+        MO_AtomicWrite w _ -> fsLit $ atomicWriteLabel w
 
         MO_S_Mul2    {}  -> unsupported
         MO_S_QuotRem {}  -> unsupported


=====================================
compiler/GHC/CmmToAsm/X86/CodeGen.hs
=====================================
@@ -2597,15 +2597,20 @@ genCCall' config is32Bit (PrimTarget (MO_UF_Conv width)) dest_regs args bid = do
   where
     lbl = mkCmmCodeLabel primUnitId (fsLit (word2FloatLabel width))
 
-genCCall' _ _ (PrimTarget (MO_AtomicRead width)) [dst] [addr] _ = do
+genCCall' _ _ (PrimTarget (MO_AtomicRead width _mord)) [dst] [addr] _ = do
   load_code <- intLoadCode (MOV (intFormat width)) addr
   platform <- ncgPlatform <$> getConfig
 
   return (load_code (getRegisterReg platform  (CmmLocal dst)))
 
-genCCall' _ _ (PrimTarget (MO_AtomicWrite width)) [] [addr, val] _ = do
+genCCall' _ _ (PrimTarget (MO_AtomicWrite width mord)) [] [addr, val] _ = do
     code <- assignMem_IntCode (intFormat width) addr val
-    return $ code `snocOL` MFENCE
+    let needs_fence = case mord of
+          MemOrderSeqCst  -> True
+          MemOrderRelease -> True
+          MemOrderAcquire -> pprPanic "genAtomicWrite: acquire ordering on write" empty
+          MemOrderRelaxed -> False
+    return $ if needs_fence then code `snocOL` MFENCE else code
 
 genCCall' _ is32Bit (PrimTarget (MO_Cmpxchg width)) [dst] [addr, old, new] _ = do
     -- On x86 we don't have enough registers to use cmpxchg with a
@@ -3448,8 +3453,9 @@ outOfLineCmmOp bid mop res args
               MO_Pext w    -> fsLit $ pextLabel w
 
               MO_AtomicRMW _ _ -> fsLit "atomicrmw"
-              MO_AtomicRead _  -> fsLit "atomicread"
-              MO_AtomicWrite _ -> fsLit "atomicwrite"
+              -- TODO: handle orderings
+              MO_AtomicRead _ _ -> fsLit "atomicread"
+              MO_AtomicWrite _ _ -> fsLit "atomicwrite"
               MO_Cmpxchg _     -> fsLit "cmpxchg"
               MO_Xchg _        -> should_be_inline
 


=====================================
compiler/GHC/CmmToC.hs
=====================================
@@ -940,8 +940,9 @@ pprCallishMachOp_for_C mop
         (MO_AtomicRMW w amop) -> ptext (sLit $ atomicRMWLabel w amop)
         (MO_Cmpxchg w)  -> ptext (sLit $ cmpxchgLabel w)
         (MO_Xchg w)     -> ptext (sLit $ xchgLabel w)
-        (MO_AtomicRead w)  -> ptext (sLit $ atomicReadLabel w)
-        (MO_AtomicWrite w) -> ptext (sLit $ atomicWriteLabel w)
+        -- TODO: handle orderings
+        (MO_AtomicRead w _)  -> ptext (sLit $ atomicReadLabel w)
+        (MO_AtomicWrite w _) -> ptext (sLit $ atomicWriteLabel w)
         (MO_UF_Conv w)  -> ptext (sLit $ word2FloatLabel w)
 
         MO_S_Mul2     {} -> unsupported


=====================================
compiler/GHC/CmmToLlvm/CodeGen.hs
=====================================
@@ -47,7 +47,7 @@ import qualified Data.Semigroup as Semigroup
 import Data.List ( nub )
 import Data.Maybe ( catMaybes )
 
-type Atomic = Bool
+type Atomic = Maybe MemoryOrdering
 type LlvmStatements = OrdList LlvmStatement
 
 data Signage = Signed | Unsigned deriving (Eq, Show)
@@ -267,9 +267,9 @@ genCall (PrimTarget (MO_AtomicRMW width amop)) [dst] [addr, n] = runStmtsDecls $
     retVar <- doExprW targetTy $ AtomicRMW op ptrVar nVar SyncSeqCst
     statement $ Store retVar dstVar Nothing
 
-genCall (PrimTarget (MO_AtomicRead _)) [dst] [addr] = runStmtsDecls $ do
+genCall (PrimTarget (MO_AtomicRead _ mem_ord)) [dst] [addr] = runStmtsDecls $ do
     dstV <- getCmmRegW (CmmLocal dst)
-    v1 <- genLoadW True addr (localRegType dst) NaturallyAligned
+    v1 <- genLoadW (Just mem_ord) addr (localRegType dst) NaturallyAligned
     statement $ Store v1 dstV Nothing
 
 genCall (PrimTarget (MO_Cmpxchg _width))
@@ -296,13 +296,14 @@ genCall (PrimTarget (MO_Xchg _width)) [dst] [addr, val] = runStmtsDecls $ do
     resVar <- doExprW (getVarType valVar) (AtomicRMW LAO_Xchg ptrVar valVar SyncSeqCst)
     statement $ Store resVar dstV Nothing
 
-genCall (PrimTarget (MO_AtomicWrite _width)) [] [addr, val] = runStmtsDecls $ do
+genCall (PrimTarget (MO_AtomicWrite _width mem_ord)) [] [addr, val] = runStmtsDecls $ do
     addrVar <- exprToVarW addr
     valVar <- exprToVarW val
     let ptrTy = pLift $ getVarType valVar
         ptrExpr = Cast LM_Inttoptr addrVar ptrTy
     ptrVar <- doExprW ptrTy ptrExpr
-    statement $ Expr $ AtomicRMW LAO_Xchg ptrVar valVar SyncSeqCst
+    let ordering = convertMemoryOrdering mem_ord
+    statement $ Expr $ AtomicRMW LAO_Xchg ptrVar valVar ordering
 
 -- Handle memcpy function specifically since llvm's intrinsic version takes
 -- some extra parameters.
@@ -904,11 +905,11 @@ cmmPrimOpFunctions mop = do
     MO_Touch         -> unsupported
     MO_UF_Conv _     -> unsupported
 
-    MO_AtomicRead _  -> unsupported
-    MO_AtomicRMW _ _ -> unsupported
-    MO_AtomicWrite _ -> unsupported
-    MO_Cmpxchg _     -> unsupported
-    MO_Xchg _        -> unsupported
+    MO_AtomicRead _ _  -> unsupported
+    MO_AtomicRMW _ _   -> unsupported
+    MO_AtomicWrite _ _ -> unsupported
+    MO_Cmpxchg _       -> unsupported
+    MO_Xchg _          -> unsupported
 
     MO_I64_ToI       -> dontReach64
     MO_I64_FromI     -> dontReach64
@@ -1263,7 +1264,7 @@ exprToVarOpt opt e = case e of
         -> genLit opt lit
 
     CmmLoad e' ty align
-        -> genLoad False e' ty align
+        -> genLoad Nothing e' ty align
 
     -- Cmmreg in expression is the value, so must load. If you want actual
     -- reg pointer, call getCmmReg directly.
@@ -1798,7 +1799,8 @@ case we will need a more granular way of specifying alignment.
 
 mkLoad :: Atomic -> LlvmVar -> AlignmentSpec -> LlvmExpression
 mkLoad atomic vptr alignment
-  | atomic      = ALoad SyncSeqCst False vptr
+  | Just mem_ord <- atomic
+                = ALoad (convertMemoryOrdering mem_ord) False vptr
   | otherwise   = Load vptr align
   where
     ty = pLower (getVarType vptr)
@@ -1935,6 +1937,12 @@ genLit _ CmmHighStackMark
 -- * Misc
 --
 
+convertMemoryOrdering :: MemoryOrdering -> LlvmSyncOrdering
+convertMemoryOrdering MemOrderRelaxed = SyncUnord
+convertMemoryOrdering MemOrderAcquire = SyncAcquire
+convertMemoryOrdering MemOrderRelease = SyncRelease
+convertMemoryOrdering MemOrderSeqCst  = SyncSeqCst
+
 -- | Find CmmRegs that get assigned and allocate them on the stack
 --
 -- Any register that gets written needs to be allocated on the


=====================================
compiler/GHC/Core/Opt/DmdAnal.hs
=====================================
@@ -275,7 +275,7 @@ dmdAnalBindLetUp :: TopLevelFlag
                  -> WithDmdType (DmdResult CoreBind a)
 dmdAnalBindLetUp top_lvl env id rhs anal_body = WithDmdType final_ty (R (NonRec id' rhs') (body'))
   where
-    WithDmdType body_ty body'   = anal_body env
+    WithDmdType body_ty body'   = anal_body (addInScopeAnalEnv env id)
     WithDmdType body_ty' id_dmd = findBndrDmd env notArgOfDfun body_ty id
     !id'                = setBindIdDemandInfo top_lvl id id_dmd
     (rhs_ty, rhs')     = dmdAnalStar env (dmdTransformThunkDmd rhs id_dmd) rhs
@@ -405,7 +405,8 @@ dmdAnal' env dmd (App fun arg)
 dmdAnal' env dmd (Lam var body)
   | isTyVar var
   = let
-        WithDmdType body_ty body' = dmdAnal env dmd body
+        WithDmdType body_ty body' = dmdAnal (addInScopeAnalEnv env var) dmd body
+        -- See Note [Bringing a new variable into scope]
     in
     WithDmdType body_ty (Lam var body')
 
@@ -413,7 +414,8 @@ dmdAnal' env dmd (Lam var body)
   = let (n, body_dmd)    = peelCallDmd dmd
           -- body_dmd: a demand to analyze the body
 
-        WithDmdType body_ty body' = dmdAnal env body_dmd body
+        WithDmdType body_ty body' = dmdAnal (addInScopeAnalEnv env var) body_dmd body
+        -- See Note [Bringing a new variable into scope]
         WithDmdType lam_ty var'   = annotateLamIdBndr env notArgOfDfun body_ty var
         new_dmd_type = multDmdType n lam_ty
     in
@@ -424,7 +426,9 @@ dmdAnal' env dmd (Case scrut case_bndr ty [Alt alt bndrs rhs])
   -- If it's a DataAlt, it should be the only constructor of the type.
   | is_single_data_alt alt
   = let
-        WithDmdType rhs_ty rhs'           = dmdAnal env dmd rhs
+        rhs_env = addInScopeAnalEnvs env (case_bndr:bndrs)
+        -- See Note [Bringing a new variable into scope]
+        WithDmdType rhs_ty rhs'           = dmdAnal rhs_env dmd rhs
         WithDmdType alt_ty1 dmds          = findBndrsDmds env rhs_ty bndrs
         WithDmdType alt_ty2 case_bndr_dmd = findBndrDmd env False alt_ty1 case_bndr
         -- Evaluation cardinality on the case binder is irrelevant and a no-op.
@@ -547,7 +551,9 @@ forcesRealWorld fam_envs ty
 
 dmdAnalSumAlt :: AnalEnv -> SubDemand -> Id -> Alt Var -> WithDmdType (Alt Var)
 dmdAnalSumAlt env dmd case_bndr (Alt con bndrs rhs)
-  | WithDmdType rhs_ty rhs' <- dmdAnal env dmd rhs
+  | let rhs_env = addInScopeAnalEnvs env (case_bndr:bndrs)
+    -- See Note [Bringing a new variable into scope]
+  , WithDmdType rhs_ty rhs' <- dmdAnal rhs_env dmd rhs
   , WithDmdType alt_ty dmds <- findBndrsDmds env rhs_ty bndrs
   , let (_ :* case_bndr_sd) = findIdDemand alt_ty case_bndr
         -- See Note [Demand on scrutinee of a product case]
@@ -1437,7 +1443,7 @@ emptyAnalEnv opts fam_envs
 emptySigEnv :: SigEnv
 emptySigEnv = emptyVarEnv
 
--- | Extend an environment with the strictness IDs attached to the id
+-- | Extend an environment with the strictness sigs attached to the Ids
 extendAnalEnvs :: TopLevelFlag -> AnalEnv -> [Id] -> AnalEnv
 extendAnalEnvs top_lvl env vars
   = env { ae_sigs = extendSigEnvs top_lvl (ae_sigs env) vars }
@@ -1456,6 +1462,12 @@ extendSigEnv top_lvl sigs var sig = extendVarEnv sigs var (sig, top_lvl)
 lookupSigEnv :: AnalEnv -> Id -> Maybe (StrictSig, TopLevelFlag)
 lookupSigEnv env id = lookupVarEnv (ae_sigs env) id
 
+addInScopeAnalEnv :: AnalEnv -> Var -> AnalEnv
+addInScopeAnalEnv env id = env { ae_sigs = delVarEnv (ae_sigs env) id }
+
+addInScopeAnalEnvs :: AnalEnv -> [Var] -> AnalEnv
+addInScopeAnalEnvs env ids = env { ae_sigs = delVarEnvList (ae_sigs env) ids }
+
 nonVirgin :: AnalEnv -> AnalEnv
 nonVirgin env = env { ae_virgin = False }
 
@@ -1496,8 +1508,20 @@ findBndrDmd env arg_of_dfun dmd_ty id
 
     fam_envs = ae_fam_envs env
 
-{- Note [Initialising strictness]
+{- Note [Bringing a new variable into scope]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+   f x = blah
+   g = ...(\f. ...f...)...
+
+In the body of the '\f', any occurrence of `f` refers to the lambda-bound `f`,
+not the top-level `f` (which will be in `ae_sigs`).  So it's very important
+to delete `f` from `ae_sigs` when we pass a lambda/case/let-up binding of `f`.
+Otherwise chaos results (#22718).
+
+Note [Initialising strictness]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 See section 9.2 (Finding fixpoints) of the paper.
 
 Our basic plan is to initialise the strictness of each Id in a


=====================================
compiler/GHC/Core/Opt/FloatIn.hs
=====================================
@@ -43,6 +43,10 @@ import GHC.Unit.Module.ModGuts
 import GHC.Utils.Misc
 import GHC.Utils.Panic
 
+import GHC.Utils.Outputable
+
+import Data.List        ( mapAccumL )
+
 {-
 Top-level interface function, @floatInwards at .  Note that we do not
 actually float any bindings downwards from the top-level.
@@ -132,7 +136,7 @@ the closure for a is not built.
 ************************************************************************
 -}
 
-type FreeVarSet  = DIdSet
+type FreeVarSet  = DVarSet
 type BoundVarSet = DIdSet
 
 data FloatInBind = FB BoundVarSet FreeVarSet FloatBind
@@ -140,11 +144,17 @@ data FloatInBind = FB BoundVarSet FreeVarSet FloatBind
         -- of recursive bindings, the set doesn't include the bound
         -- variables.
 
-type FloatInBinds = [FloatInBind]
-        -- In reverse dependency order (innermost binder first)
+type FloatInBinds    = [FloatInBind] -- In normal dependency order
+                                     --    (outermost binder first)
+type RevFloatInBinds = [FloatInBind] -- In reverse dependency order
+                                     --    (innermost binder first)
+
+instance Outputable FloatInBind where
+  ppr (FB bvs fvs _) = text "FB" <> braces (sep [ text "bndrs =" <+> ppr bvs
+                                                , text "fvs =" <+> ppr fvs ])
 
 fiExpr :: Platform
-       -> FloatInBinds      -- Binds we're trying to drop
+       -> RevFloatInBinds   -- Binds we're trying to drop
                             -- as far "inwards" as possible
        -> CoreExprWithFVs   -- Input expr
        -> CoreExpr          -- Result
@@ -155,13 +165,12 @@ fiExpr _ to_drop (_, AnnType ty)     = ASSERT( null to_drop ) Type ty
 fiExpr _ to_drop (_, AnnVar v)       = wrapFloats to_drop (Var v)
 fiExpr _ to_drop (_, AnnCoercion co) = wrapFloats to_drop (Coercion co)
 fiExpr platform to_drop (_, AnnCast expr (co_ann, co))
-  = wrapFloats (drop_here ++ co_drop) $
+  = wrapFloats drop_here $
     Cast (fiExpr platform e_drop expr) co
   where
-    [drop_here, e_drop, co_drop]
-      = sepBindsByDropPoint platform False
-          [freeVarsOf expr, freeVarsOfAnn co_ann]
-          to_drop
+    (drop_here, [e_drop])
+      = sepBindsByDropPoint platform False to_drop
+          (freeVarsOfAnn co_ann) [freeVarsOf expr]
 
 {-
 Applications: we do float inside applications, mainly because we
@@ -170,7 +179,7 @@ pull out any silly ones.
 -}
 
 fiExpr platform to_drop ann_expr@(_,AnnApp {})
-  = wrapFloats drop_here $ wrapFloats extra_drop $
+  = wrapFloats drop_here $
     mkTicks ticks $
     mkApps (fiExpr platform fun_drop ann_fun)
            (zipWithEqual "fiExpr" (fiExpr platform) arg_drops ann_args)
@@ -180,19 +189,18 @@ fiExpr platform to_drop ann_expr@(_,AnnApp {})
     (ann_fun, ann_args, ticks) = collectAnnArgsTicks tickishFloatable ann_expr
     fun_ty  = exprType (deAnnotate ann_fun)
     fun_fvs = freeVarsOf ann_fun
-    arg_fvs = map freeVarsOf ann_args
 
-    (drop_here : extra_drop : fun_drop : arg_drops)
-       = sepBindsByDropPoint platform False
-                             (extra_fvs : fun_fvs : arg_fvs)
-                             to_drop
+    (drop_here, fun_drop : arg_drops)
+       = sepBindsByDropPoint platform False to_drop
+                             here_fvs (fun_fvs : arg_fvs)
+
          -- Shortcut behaviour: if to_drop is empty,
          -- sepBindsByDropPoint returns a suitable bunch of empty
          -- lists without evaluating extra_fvs, and hence without
          -- peering into each argument
 
-    (_, extra_fvs) = foldl' add_arg (fun_ty, extra_fvs0) ann_args
-    extra_fvs0 = case ann_fun of
+    ((_,here_fvs), arg_fvs) = mapAccumL add_arg (fun_ty,here_fvs0) ann_args
+    here_fvs0 = case ann_fun of
                    (_, AnnVar _) -> fun_fvs
                    _             -> emptyDVarSet
           -- Don't float the binding for f into f x y z; see Note [Join points]
@@ -200,15 +208,13 @@ fiExpr platform to_drop ann_expr@(_,AnnApp {})
           -- join point, floating it in isn't especially harmful but it's
           -- useless since the simplifier will immediately float it back out.)
 
-    add_arg :: (Type,FreeVarSet) -> CoreExprWithFVs -> (Type,FreeVarSet)
-    add_arg (fun_ty, extra_fvs) (_, AnnType ty)
-      = (piResultTy fun_ty ty, extra_fvs)
-
-    add_arg (fun_ty, extra_fvs) (arg_fvs, arg)
-      | noFloatIntoArg arg arg_ty
-      = (res_ty, extra_fvs `unionDVarSet` arg_fvs)
-      | otherwise
-      = (res_ty, extra_fvs)
+    add_arg :: (Type,FreeVarSet) -> CoreExprWithFVs -> ((Type,FreeVarSet),FreeVarSet)
+    add_arg (fun_ty, here_fvs) (arg_fvs, AnnType ty)
+      = ((piResultTy fun_ty ty, here_fvs), arg_fvs)
+    -- We can't float into some arguments, so put them into the here_fvs
+    add_arg (fun_ty, here_fvs) (arg_fvs, arg)
+      | noFloatIntoArg arg arg_ty = ((res_ty,here_fvs `unionDVarSet` arg_fvs), emptyDVarSet)
+      | otherwise          = ((res_ty,here_fvs), arg_fvs)
       where
        (_, arg_ty, res_ty) = splitFunTy fun_ty
 
@@ -292,7 +298,6 @@ it's non-recursive, so we float only into non-recursive join points.)
 Urk! if all are tyvars, and we don't float in, we may miss an
       opportunity to float inside a nested case branch
 
-
 Note [Floating coercions]
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 We could, in principle, have a coercion binding like
@@ -312,6 +317,36 @@ of the types of all the drop points involved. If any of the floaters
 bind a coercion variable mentioned in any of the types, that binder must
 be dropped right away.
 
+Note [Shadowing and name capture]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we have
+    let x = y+1 in
+    case p of
+       (y:ys) -> ...x...
+       [] -> blah
+It is obviously bogus for FloatIn to transform to
+    case p of
+       (y:ys) -> ...(let x = y+1 in x)...
+       [] -> blah
+because the y is captured.  This doesn't happen much, because shadowing is
+rare, but it did happen in #22662.
+
+One solution would be to clone as we go.  But a simpler one is this:
+
+  at a binding site (like that for (y:ys) above), abandon float-in for
+  any floating bindings that mention the binders (y, ys in this case)
+
+We achieve that by calling sepBindsByDropPoint with the binders in
+the "used-here" set:
+
+* In fiExpr (AnnLam ...).  For the body there is no need to delete
+  the lambda-binders from the body_fvs, because any bindings that
+  mention these binders will be dropped here anyway.
+
+* In fiExpr (AnnCase ...). Remember to include the case_bndr in the
+  binders.  Again, no need to delete the alt binders from the rhs
+  free vars, because any bindings mentioning them will be dropped
+  here unconditionally.
 -}
 
 fiExpr platform to_drop lam@(_, AnnLam _ _)
@@ -320,10 +355,17 @@ fiExpr platform to_drop lam@(_, AnnLam _ _)
   = wrapFloats to_drop (mkLams bndrs (fiExpr platform [] body))
 
   | otherwise           -- Float inside
-  = mkLams bndrs (fiExpr platform to_drop body)
+  = wrapFloats drop_here $
+    mkLams bndrs (fiExpr platform body_drop body)
 
   where
     (bndrs, body) = collectAnnBndrs lam
+    body_fvs      = freeVarsOf body
+
+    -- Why sepBindsByDropPoint? Because of potential capture
+    -- See Note [Shadowing and name capture]
+    (drop_here, [body_drop]) = sepBindsByDropPoint platform False to_drop
+                                  (mkDVarSet bndrs) [body_fvs]
 
 {-
 We don't float lets inwards past an SCC.
@@ -462,16 +504,16 @@ fiExpr platform to_drop (_, AnnCase scrut case_bndr _ [AnnAlt con alt_bndrs rhs]
   = wrapFloats shared_binds $
     fiExpr platform (case_float : rhs_binds) rhs
   where
-    case_float = FB (mkDVarSet (case_bndr : alt_bndrs)) scrut_fvs
+    case_float = FB all_bndrs scrut_fvs
                     (FloatCase scrut' case_bndr con alt_bndrs)
     scrut'     = fiExpr platform scrut_binds scrut
-    rhs_fvs    = freeVarsOf rhs `delDVarSetList` (case_bndr : alt_bndrs)
-    scrut_fvs  = freeVarsOf scrut
+    rhs_fvs    = freeVarsOf rhs    -- No need to delete alt_bndrs
+    scrut_fvs  = freeVarsOf scrut  -- See Note [Shadowing and name capture]
+    all_bndrs  = mkDVarSet alt_bndrs `extendDVarSet` case_bndr
 
-    [shared_binds, scrut_binds, rhs_binds]
-       = sepBindsByDropPoint platform False
-           [scrut_fvs, rhs_fvs]
-           to_drop
+    (shared_binds, [scrut_binds, rhs_binds])
+       = sepBindsByDropPoint platform False to_drop
+                     all_bndrs [scrut_fvs, rhs_fvs]
 
 fiExpr platform to_drop (_, AnnCase scrut case_bndr ty alts)
   = wrapFloats drop_here1 $
@@ -481,38 +523,42 @@ fiExpr platform to_drop (_, AnnCase scrut case_bndr ty alts)
          -- use zipWithEqual, we should have length alts_drops_s = length alts
   where
         -- Float into the scrut and alts-considered-together just like App
-    [drop_here1, scrut_drops, alts_drops]
-       = sepBindsByDropPoint platform False
-           [scrut_fvs, all_alts_fvs]
-           to_drop
+    (drop_here1, [scrut_drops, alts_drops])
+       = sepBindsByDropPoint platform False to_drop
+             all_alt_bndrs [scrut_fvs, all_alt_fvs]
+             -- all_alt_bndrs: see Note [Shadowing and name capture]
 
         -- Float into the alts with the is_case flag set
-    (drop_here2 : alts_drops_s)
-      | [ _ ] <- alts = [] : [alts_drops]
-      | otherwise     = sepBindsByDropPoint platform True alts_fvs alts_drops
-
-    scrut_fvs    = freeVarsOf scrut
-    alts_fvs     = map alt_fvs alts
-    all_alts_fvs = unionDVarSets alts_fvs
-    alt_fvs (AnnAlt _con args rhs)
-      = foldl' delDVarSet (freeVarsOf rhs) (case_bndr:args)
-           -- Delete case_bndr and args from free vars of rhs
-           -- to get free vars of alt
+    (drop_here2, alts_drops_s)
+       = sepBindsByDropPoint platform True alts_drops emptyDVarSet alts_fvs
+
+    scrut_fvs = freeVarsOf scrut
+
+    all_alt_bndrs = foldr (unionDVarSet . ann_alt_bndrs) (unitDVarSet case_bndr) alts
+    ann_alt_bndrs (AnnAlt _ bndrs _) = mkDVarSet bndrs
+
+    alts_fvs :: [DVarSet]
+    alts_fvs = [freeVarsOf rhs | AnnAlt _ _ rhs <- alts]
+               -- No need to delete binders
+               -- See Note [Shadowing and name capture]
+
+    all_alt_fvs :: DVarSet
+    all_alt_fvs = foldr unionDVarSet (unitDVarSet case_bndr) alts_fvs
 
     fi_alt to_drop (AnnAlt con args rhs) = Alt con args (fiExpr platform to_drop rhs)
 
 ------------------
 fiBind :: Platform
-       -> FloatInBinds      -- Binds we're trying to drop
-                            -- as far "inwards" as possible
-       -> CoreBindWithFVs   -- Input binding
-       -> DVarSet           -- Free in scope of binding
-       -> ( FloatInBinds    -- Land these before
-          , FloatInBind     -- The binding itself
-          , FloatInBinds)   -- Land these after
+       -> RevFloatInBinds    -- Binds we're trying to drop
+                             -- as far "inwards" as possible
+       -> CoreBindWithFVs    -- Input binding
+       -> DVarSet            -- Free in scope of binding
+       -> ( RevFloatInBinds  -- Land these before
+          , FloatInBind      -- The binding itself
+          , RevFloatInBinds) -- Land these after
 
 fiBind platform to_drop (AnnNonRec id ann_rhs@(rhs_fvs, rhs)) body_fvs
-  = ( extra_binds ++ shared_binds          -- Land these before
+  = ( shared_binds          -- Land these before
                                            -- See Note [extra_fvs (1,2)]
     , FB (unitDVarSet id) rhs_fvs'         -- The new binding itself
           (FloatLet (NonRec id rhs'))
@@ -531,10 +577,9 @@ fiBind platform to_drop (AnnNonRec id ann_rhs@(rhs_fvs, rhs)) body_fvs
         -- We *can't* float into ok-for-speculation unlifted RHSs
         -- But do float into join points
 
-    [shared_binds, extra_binds, rhs_binds, body_binds]
-        = sepBindsByDropPoint platform False
-            [extra_fvs, rhs_fvs, body_fvs2]
-            to_drop
+    (shared_binds, [rhs_binds, body_binds])
+        = sepBindsByDropPoint platform False to_drop
+                      extra_fvs [rhs_fvs, body_fvs2]
 
         -- Push rhs_binds into the right hand side of the binding
     rhs'     = fiRhs platform rhs_binds id ann_rhs
@@ -542,7 +587,7 @@ fiBind platform to_drop (AnnNonRec id ann_rhs@(rhs_fvs, rhs)) body_fvs
                         -- Don't forget the rule_fvs; the binding mentions them!
 
 fiBind platform to_drop (AnnRec bindings) body_fvs
-  = ( extra_binds ++ shared_binds
+  = ( shared_binds
     , FB (mkDVarSet ids) rhs_fvs'
          (FloatLet (Rec (fi_bind rhss_binds bindings)))
     , body_binds )
@@ -556,17 +601,16 @@ fiBind platform to_drop (AnnRec bindings) body_fvs
                 unionDVarSets [ rhs_fvs | (bndr, (rhs_fvs, rhs)) <- bindings
                               , noFloatIntoRhs Recursive bndr rhs ]
 
-    (shared_binds:extra_binds:body_binds:rhss_binds)
-        = sepBindsByDropPoint platform False
-            (extra_fvs:body_fvs:rhss_fvs)
-            to_drop
+    (shared_binds, body_binds:rhss_binds)
+        = sepBindsByDropPoint platform False to_drop
+                       extra_fvs (body_fvs:rhss_fvs)
 
     rhs_fvs' = unionDVarSets rhss_fvs `unionDVarSet`
                unionDVarSets (map floatedBindsFVs rhss_binds) `unionDVarSet`
                rule_fvs         -- Don't forget the rule variables!
 
     -- Push rhs_binds into the right hand side of the binding
-    fi_bind :: [FloatInBinds]       -- one per "drop pt" conjured w/ fvs_of_rhss
+    fi_bind :: [RevFloatInBinds]   -- One per "drop pt" conjured w/ fvs_of_rhss
             -> [(Id, CoreExprWithFVs)]
             -> [(Id, CoreExpr)]
 
@@ -575,7 +619,7 @@ fiBind platform to_drop (AnnRec bindings) body_fvs
         | ((binder, rhs), to_drop) <- zipEqual "fi_bind" pairs to_drops ]
 
 ------------------
-fiRhs :: Platform -> FloatInBinds -> CoreBndr -> CoreExprWithFVs -> CoreExpr
+fiRhs :: Platform -> RevFloatInBinds -> CoreBndr -> CoreExprWithFVs -> CoreExpr
 fiRhs platform to_drop bndr rhs
   | Just join_arity <- isJoinId_maybe bndr
   , let (bndrs, body) = collectNAnnBndrs join_arity rhs
@@ -675,68 +719,84 @@ point.
 We have to maintain the order on these drop-point-related lists.
 -}
 
--- pprFIB :: FloatInBinds -> SDoc
+-- pprFIB :: RevFloatInBinds -> SDoc
 -- pprFIB fibs = text "FIB:" <+> ppr [b | FB _ _ b <- fibs]
 
 sepBindsByDropPoint
     :: Platform
-    -> Bool                -- True <=> is case expression
-    -> [FreeVarSet]        -- One set of FVs per drop point
-                           -- Always at least two long!
-    -> FloatInBinds        -- Candidate floaters
-    -> [FloatInBinds]      -- FIRST one is bindings which must not be floated
-                           -- inside any drop point; the rest correspond
-                           -- one-to-one with the input list of FV sets
+    -> Bool                  -- True <=> is case expression
+    -> RevFloatInBinds       -- Candidate floaters
+    -> FreeVarSet            -- here_fvs: if these vars are free in a binding,
+                             --   don't float that binding inside any drop point
+    -> [FreeVarSet]          -- fork_fvs: one set of FVs per drop point
+    -> ( RevFloatInBinds     -- Bindings which must not be floated inside
+       , [RevFloatInBinds] ) -- Corresponds 1-1 with the input list of FV sets
 
 -- Every input floater is returned somewhere in the result;
 -- none are dropped, not even ones which don't seem to be
 -- free in *any* of the drop-point fvs.  Why?  Because, for example,
 -- a binding (let x = E in B) might have a specialised version of
 -- x (say x') stored inside x, but x' isn't free in E or B.
+--
+-- The here_fvs argument is used for two things:
+-- * Avoid shadowing bugs: see Note [Shadowing and name capture]
+-- * Drop some of the bindings at the top, e.g. of an application
 
 type DropBox = (FreeVarSet, FloatInBinds)
 
-sepBindsByDropPoint platform is_case drop_pts floaters
+dropBoxFloats :: DropBox -> RevFloatInBinds
+dropBoxFloats (_, floats) = reverse floats
+
+usedInDropBox :: DIdSet -> DropBox -> Bool
+usedInDropBox bndrs (db_fvs, _) = db_fvs `intersectsDVarSet` bndrs
+
+initDropBox :: DVarSet -> DropBox
+initDropBox fvs = (fvs, [])
+
+sepBindsByDropPoint platform is_case floaters here_fvs fork_fvs
   | null floaters  -- Shortcut common case
-  = [] : [[] | _ <- drop_pts]
+  = ([], [[] | _ <- fork_fvs])
 
   | otherwise
-  = ASSERT( drop_pts `lengthAtLeast` 2 )
-    go floaters (map (\fvs -> (fvs, [])) (emptyDVarSet : drop_pts))
+  = go floaters (initDropBox here_fvs) (map initDropBox fork_fvs)
   where
-    n_alts = length drop_pts
+    n_alts = length fork_fvs
 
-    go :: FloatInBinds -> [DropBox] -> [FloatInBinds]
-        -- The *first* one in the argument list is the drop_here set
-        -- The FloatInBinds in the lists are in the reverse of
-        -- the normal FloatInBinds order; that is, they are the right way round!
+    go :: RevFloatInBinds -> DropBox -> [DropBox]
+       -> (RevFloatInBinds, [RevFloatInBinds])
+        -- The *first* one in the pair is the drop_here set
 
-    go [] drop_boxes = map (reverse . snd) drop_boxes
+    go [] here_box fork_boxes
+        = (dropBoxFloats here_box, map dropBoxFloats fork_boxes)
 
-    go (bind_w_fvs@(FB bndrs bind_fvs bind) : binds) drop_boxes@(here_box : fork_boxes)
-        = go binds new_boxes
+    go (bind_w_fvs@(FB bndrs bind_fvs bind) : binds) here_box fork_boxes
+        | drop_here = go binds (insert here_box) fork_boxes
+        | otherwise = go binds here_box          new_fork_boxes
         where
           -- "here" means the group of bindings dropped at the top of the fork
 
-          (used_here : used_in_flags) = [ fvs `intersectsDVarSet` bndrs
-                                        | (fvs, _) <- drop_boxes]
+          used_here     = bndrs `usedInDropBox` here_box
+          used_in_flags = case fork_boxes of
+                            []  -> []
+                            [_] -> [True]  -- Push all bindings into a single branch
+                                           -- No need to look at its free vars
+                            _   -> map (bndrs `usedInDropBox`) fork_boxes
+               -- Short-cut for the singleton case;
+               -- used for lambdas and singleton cases
 
           drop_here = used_here || cant_push
 
           n_used_alts = count id used_in_flags -- returns number of Trues in list.
 
           cant_push
-            | is_case   = n_used_alts == n_alts   -- Used in all, don't push
-                                                  -- Remember n_alts > 1
+            | is_case   = (n_alts > 1 && n_used_alts == n_alts)
+                             -- Used in all, multiple branches, don't push
                           || (n_used_alts > 1 && not (floatIsDupable platform bind))
                              -- floatIsDupable: see Note [Duplicating floats]
 
             | otherwise = floatIsCase bind || n_used_alts > 1
                              -- floatIsCase: see Note [Floating primops]
 
-          new_boxes | drop_here = (insert here_box : fork_boxes)
-                    | otherwise = (here_box : new_fork_boxes)
-
           new_fork_boxes = zipWithEqual "FloatIn.sepBinds" insert_maybe
                                         fork_boxes used_in_flags
 
@@ -746,8 +806,6 @@ sepBindsByDropPoint platform is_case drop_pts floaters
           insert_maybe box True  = insert box
           insert_maybe box False = box
 
-    go _ _ = panic "sepBindsByDropPoint/go"
-
 
 {- Note [Duplicating floats]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -765,14 +823,14 @@ If the thing is used in all RHSs there is nothing gained,
 so we don't duplicate then.
 -}
 
-floatedBindsFVs :: FloatInBinds -> FreeVarSet
+floatedBindsFVs :: RevFloatInBinds -> FreeVarSet
 floatedBindsFVs binds = mapUnionDVarSet fbFVs binds
 
 fbFVs :: FloatInBind -> DVarSet
 fbFVs (FB _ fvs _) = fvs
 
-wrapFloats :: FloatInBinds -> CoreExpr -> CoreExpr
--- Remember FloatInBinds is in *reverse* dependency order
+wrapFloats :: RevFloatInBinds -> CoreExpr -> CoreExpr
+-- Remember RevFloatInBinds is in *reverse* dependency order
 wrapFloats []               e = e
 wrapFloats (FB _ _ fl : bs) e = wrapFloats bs (wrapFloat fl e)
 


=====================================
compiler/GHC/StgToCmm/ExtCode.hs
=====================================
@@ -236,8 +236,12 @@ emitLabel = code . F.emitLabel
 emitAssign :: CmmReg  -> CmmExpr -> CmmParse ()
 emitAssign l r = code (F.emitAssign l r)
 
-emitStore :: CmmExpr  -> CmmExpr -> CmmParse ()
-emitStore l r = code (F.emitStore l r)
+emitStore :: Maybe MemoryOrdering -> CmmExpr  -> CmmExpr -> CmmParse ()
+emitStore (Just mem_ord) l r = do
+  platform <- getPlatform
+  let w = typeWidth $ cmmExprType platform r
+  emit $ mkUnsafeCall (PrimTarget $ MO_AtomicWrite w mem_ord) [] [l,r]
+emitStore Nothing l r = code (F.emitStore l r)
 
 getCode :: CmmParse a -> CmmParse CmmAGraph
 getCode (EC ec) = EC $ \c e s -> do


=====================================
compiler/GHC/StgToCmm/Prim.hs
=====================================
@@ -293,9 +293,10 @@ emitPrimOp dflags primop = case primop of
     emitAssign (CmmLocal res) currentTSOExpr
 
   ReadMutVarOp -> \[mutv] -> opIntoRegs $ \[res] ->
-    emitAssign (CmmLocal res) (cmmLoadIndexW platform mutv (fixedHdrSizeW profile) (gcWord platform))
+    emitPrimCall [res] (MO_AtomicRead (wordWidth platform) MemOrderAcquire)
+        [ cmmOffsetW platform mutv (fixedHdrSizeW profile) ]
 
-  WriteMutVarOp -> \[mutv, var] -> opIntoRegs $ \res@[] -> do
+  WriteMutVarOp -> \[mutv, var] -> opIntoRegs $ \[] -> do
     old_val <- CmmLocal <$> newTemp (cmmExprType platform var)
     emitAssign old_val (cmmLoadIndexW platform mutv (fixedHdrSizeW profile) (gcWord platform))
 
@@ -304,8 +305,9 @@ emitPrimOp dflags primop = case primop of
     -- Note that this also must come after we read the old value to ensure
     -- that the read of old_val comes before another core's write to the
     -- MutVar's value.
-    emitPrimCall res MO_WriteBarrier []
-    emitStore (cmmOffsetW platform mutv (fixedHdrSizeW profile)) var
+    emitPrimCall [] (MO_AtomicWrite (wordWidth platform) MemOrderRelease)
+        [ cmmOffsetW platform mutv (fixedHdrSizeW profile), var ]
+
     emitCCall
             [{-no results-}]
             (CmmLit (CmmLabel mkDirty_MUT_VAR_Label))
@@ -3134,7 +3136,7 @@ doAtomicReadAddr
 doAtomicReadAddr res addr ty =
     emitPrimCall
         [ res ]
-        (MO_AtomicRead (typeWidth ty))
+        (MO_AtomicRead (typeWidth ty) MemOrderSeqCst)
         [ addr ]
 
 -- | Emit an atomic write to a byte array that acts as a memory barrier.
@@ -3162,7 +3164,7 @@ doAtomicWriteAddr
 doAtomicWriteAddr addr ty val =
     emitPrimCall
         [ {- no results -} ]
-        (MO_AtomicWrite (typeWidth ty))
+        (MO_AtomicWrite (typeWidth ty) MemOrderSeqCst)
         [ addr, val ]
 
 doCasByteArray


=====================================
libraries/bytestring
=====================================
@@ -1 +1 @@
-Subproject commit 1543e054a314865d89a259065921d5acba03d966
+Subproject commit 9cab76dc861f651c3940e873ce921d9e09733cc8


=====================================
libraries/ghc-bignum/gmp/gmp-tarballs
=====================================
@@ -1 +1 @@
-Subproject commit 31f9909680ba8fe00d27fd8a6f5d198a0a96c1ac
+Subproject commit 4f26049af40afb380eaf033ab91404cd2e214919


=====================================
rts/HeapStackCheck.cmm
=====================================
@@ -518,7 +518,7 @@ stg_block_takemvar_finally
     W_ r1, r3;
     r1 = R1;
     r3 = R3;
-    unlockClosure(R3, stg_MVAR_DIRTY_info);
+    unlockClosure(r3, stg_MVAR_DIRTY_info);
     R1 = r1;
     R3 = r3;
     jump StgReturn [R1];
@@ -546,7 +546,7 @@ stg_block_readmvar_finally
     W_ r1, r3;
     r1 = R1;
     r3 = R3;
-    unlockClosure(R3, stg_MVAR_DIRTY_info);
+    unlockClosure(r3, stg_MVAR_DIRTY_info);
     R1 = r1;
     R3 = r3;
     jump StgReturn [R1];
@@ -574,7 +574,7 @@ stg_block_putmvar_finally
     W_ r1, r3;
     r1 = R1;
     r3 = R3;
-    unlockClosure(R3, stg_MVAR_DIRTY_info);
+    unlockClosure(r3, stg_MVAR_DIRTY_info);
     R1 = r1;
     R3 = r3;
     jump StgReturn [R1];


=====================================
rts/Sparks.c
=====================================
@@ -79,6 +79,34 @@ newSpark (StgRegTable *reg, StgClosure *p)
     return 1;
 }
 
+/* Note [Pruning the spark pool]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+pruneSparkQueue checks if closures have been evacuated to know whether or
+not a spark can be GCed. If it was evacuated it's live and we keep the spark
+alive. If it hasn't been evacuated at the end of GC we can assume it's dead and
+remove the spark from the pool.
+
+To make this sound we must ensure GC has finished evacuating live objects before
+we prune the spark pool. Otherwise we might GC a spark before it has been evacuated.
+
+* If we run sequential GC then the GC Lead simply prunes after
+everything has been evacuated.
+
+* If we run parallel GC without work stealing then GC workers are not synchronized
+at any point before the worker returns. So we leave it to the GC Lead to prune
+sparks once evacuation has finished and all workers have returned.
+
+* If work stealing is enabled all GC threads will be running
+scavenge_until_all_done until regular heap marking is done, after which
+all workers will converge on a synchronization point. This means
+we can perform spark pruning inside the GC workers at this point.
+The only wart is that if we prune sparks locally we might
+miss sparks reachable via weak pointers, as these are marked in the main
+thread concurrently with the calls to pruneSparkQueue. To fix this problem would
+require a GC barrier, but that seems too high a price to pay.
+*/
+
+
 /* --------------------------------------------------------------------------
  * Remove all sparks from the spark queues which should not spark any
  * more.  Called after GC. We assume exclusive access to the structure
@@ -181,7 +209,7 @@ pruneSparkQueue (bool nonmovingMarkFinished, Capability *cap)
           cap->spark_stats.fizzled++;
           traceEventSparkFizzle(cap);
       } else {
-          info = spark->header.info;
+          info = RELAXED_LOAD(&spark->header.info);
           load_load_barrier();
           if (IS_FORWARDING_PTR(info)) {
               tmp = (StgClosure*)UN_FORWARDING_PTR(info);


=====================================
rts/eventlog/EventLog.c
=====================================
@@ -1074,7 +1074,17 @@ void postCapsetVecEvent (EventTypeNum tag,
 
     for (int i = 0; i < argc; i++) {
         // 1 + strlen to account for the trailing \0, used as separator
-        size += 1 + strlen(argv[i]);
+        int increment = 1 + strlen(argv[i]);
+        if (size + increment > EVENT_PAYLOAD_SIZE_MAX) {
+            errorBelch("Event size exceeds EVENT_PAYLOAD_SIZE_MAX, record only %"
+                       FMT_Int " out of %" FMT_Int " args",
+                       (long long) i,
+                       (long long) argc);
+            argc = i;
+            break;
+        } else {
+            size += increment;
+        }
     }
 
     ACQUIRE_LOCK(&eventBufMutex);


=====================================
rts/sm/GC.c
=====================================
@@ -291,6 +291,7 @@ GarbageCollect (uint32_t collect_gen,
       any_work, scav_find_work, max_n_todo_overflow;
 #if defined(THREADED_RTS)
   gc_thread *saved_gct;
+  bool gc_sparks_all_caps;
 #endif
   uint32_t g, n;
   // The time we should report our heap census as occurring at, if necessary.
@@ -555,6 +556,9 @@ GarbageCollect (uint32_t collect_gen,
   StgTSO *resurrected_threads = END_TSO_QUEUE;
   // must be last...  invariant is that everything is fully
   // scavenged at this point.
+#if defined(THREADED_RTS)
+  gc_sparks_all_caps = !work_stealing || !is_par_gc();
+#endif
   work_stealing = false;
   while (traverseWeakPtrList(&dead_weak_ptr_list, &resurrected_threads))
   {
@@ -567,6 +571,7 @@ GarbageCollect (uint32_t collect_gen,
   gcStableNameTable();
 
 #if defined(THREADED_RTS)
+  // See Note [Pruning the spark pool]
   if (!is_par_gc()) {
       for (n = 0; n < n_capabilities; n++) {
           pruneSparkQueue(false, capabilities[n]);
@@ -1371,7 +1376,6 @@ void
 gcWorkerThread (Capability *cap)
 {
     gc_thread *saved_gct;
-
     // necessary if we stole a callee-saves register for gct:
     saved_gct = gct;
 
@@ -1402,13 +1406,10 @@ gcWorkerThread (Capability *cap)
     scavenge_until_all_done();
 
 #if defined(THREADED_RTS)
-    // Now that the whole heap is marked, we discard any sparks that
-    // were found to be unreachable.  The main GC thread is currently
-    // marking heap reachable via weak pointers, so it is
-    // non-deterministic whether a spark will be retained if it is
-    // only reachable via weak pointers.  To fix this problem would
-    // require another GC barrier, which is too high a price.
-    pruneSparkQueue(false, cap);
+    // See Note [Pruning the spark pool]
+    if(work_stealing && is_par_gc()) {
+        pruneSparkQueue(false, cap);
+    }
 #endif
 
     // Wait until we're told to continue


=====================================
testsuite/tests/simplCore/should_compile/T22662.hs
=====================================
@@ -0,0 +1,6 @@
+module T22662 where
+
+import Data.Set
+
+foo x = sequence_ [ f y | y <- x ]
+  where f _ = return (fromList [0])


=====================================
testsuite/tests/simplCore/should_compile/all.T
=====================================
@@ -366,3 +366,4 @@ test('T20200', normal, compile, [''])
 # which (before the fix) lost crucial dependencies
 test('T20820',  normal, compile, ['-O0'])
 test('T22491', normal, compile, ['-O2'])
+test('T22662', normal, compile, [''])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/5e7d2493aa70d2424e0c183d132adb5d93d31ea8...842547673c4b0a2989a4174fd66cc657fe99fb07
