[Git][ghc/ghc][wip/T24251a] 8 commits: rts: fix clang compilation on aarch64

Simon Peyton Jones (@simonpj) gitlab at gitlab.haskell.org
Wed Mar 27 10:39:17 UTC 2024



Simon Peyton Jones pushed to branch wip/T24251a at Glasgow Haskell Compiler / GHC


Commits:
7db8c992 by Cheng Shao at 2024-03-25T13:45:46-04:00
rts: fix clang compilation on aarch64

This patch fixes the function prototypes in ARMOutlineAtomicsSymbols.h
that caused "error: address argument to atomic operation must be a
pointer to _Atomic type" when compiling with clang on aarch64.

- - - - -
237194ce by Sylvain Henry at 2024-03-25T13:46:27-04:00
Lexer: fix imports for Alex 3.5.1 (#24583)

- - - - -
810660b7 by Cheng Shao at 2024-03-25T22:19:16-04:00
libffi-tarballs: bump libffi-tarballs submodule to libffi 3.4.6

This commit bumps the libffi-tarballs submodule to libffi 3.4.6, which
includes numerous upstream libffi fixes, especially
https://github.com/libffi/libffi/issues/760.

- - - - -
d2ba41e8 by Alan Zimmerman at 2024-03-25T22:19:51-04:00
EPA: do not duplicate comments in signature RHS

- - - - -
32a8103f by Rodrigo Mesquita at 2024-03-26T21:16:12-04:00
configure: Use LDFLAGS when trying linkers

A user may configure `LDFLAGS` but not `LD`. When choosing a linker, we
will prefer `ld.lld`, then `ld.gold`, then `ld.bfd` -- however, we have
to check for a working linker. If any of these fails, we try the next in
line.

However, we were not considering `$LDFLAGS` when checking whether these
linkers worked. So we could pick a linker that does not support the
current `$LDFLAGS` and fail further down the line when that linker was
used with those flags.

Fixes #24565, where `LDFLAGS=-Wl,-z,pack-relative-relocs` is not
supported by `ld.gold`, yet `ld.gold` was still being picked.

- - - - -
bf65a7c3 by Rodrigo Mesquita at 2024-03-26T21:16:48-04:00
bindist: Clean xattrs of bin and lib at configure time

For issue #21506, we started cleaning the extended attributes of
binaries and libraries from the bindist *after* they were installed, as
part of `make install`, to work around notarisation (#17418).

However, the `ghc-toolchain` binary that is now shipped with the bindist
must be run at `./configure` time. Since we only cleaned the extended
attributes of the binaries and libraries after they were installed, in
some situations users were unable to run `ghc-toolchain` from the
bindist, failing at configure time (#24554).

In this commit we move the xattr cleaning logic to the configure script.

Fixes #24554

- - - - -
cfeb70d3 by Rodrigo Mesquita at 2024-03-26T21:17:24-04:00
Revert "NCG: Fix a bug in jump shortcutting."

This reverts commit 5bd8ed53dcefe10b72acb5729789e19ceb22df66.

Fixes #24586

- - - - -
c5424f66 by Simon Peyton Jones at 2024-03-27T10:38:51+00:00
Eliminate redundant cases in CSE

Addresses programs like this

  f xs = xs `seq`
         (let t = reverse $ reverse $ reverse $ reverse $ reverse $ reverse xs in
          case xs of
            [] ->  (t,True)
            (_:_) -> (t,False))

Also including the case where t is a join point.

Relates to #24251.  Test in T24251a.

(And see Simon's GHC Log 13 March.)

Also added a perf test for #21741
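
(Editor's sketch, not part of the commit message: roughly what the new
`caseElim` rewrite does to the example above, in Core-like pseudo-Haskell
with illustrative names.)

  -- Before: the outer eval of xs is redundant, because xs is
  -- scrutinised again "soon" -- only a let intervenes
  case xs of xs' { __DEFAULT ->
    let t = <thunk> in
    case xs' of { [] -> (t,True); (_:_) -> (t,False) } }

  -- After: the outer case becomes a let (with the case binder's
  -- unfolding zapped, see wrinkle (ERC1) in the Note below); the
  -- Simplifier then substitutes it away
  let xs' = xs in
  let t = <thunk> in
  case xs' of { [] -> (t,True); (_:_) -> (t,False) }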

- - - - -


30 changed files:

- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/BlockLayout.hs
- compiler/GHC/CmmToAsm/Instr.hs
- compiler/GHC/CmmToAsm/PPC/Instr.hs
- compiler/GHC/CmmToAsm/Reg/Graph/SpillClean.hs
- compiler/GHC/CmmToAsm/Reg/Linear/JoinToTargets.hs
- compiler/GHC/CmmToAsm/Reg/Liveness.hs
- compiler/GHC/CmmToAsm/X86/Instr.hs
- compiler/GHC/Core/Opt/CSE.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Hs/Utils.hs
- compiler/GHC/Parser/Lexer.x
- compiler/GHC/Types/Id.hs
- distrib/configure.ac.in
- hadrian/bindist/Makefile
- libffi-tarballs
- m4/fp_cc_linker_flag_try.m4
- rts/ARMOutlineAtomicsSymbols.h
- − testsuite/tests/codeGen/should_run/T24507.hs
- − testsuite/tests/codeGen/should_run/T24507.stdout
- − testsuite/tests/codeGen/should_run/T24507_cmm.cmm
- testsuite/tests/codeGen/should_run/all.T
- + testsuite/tests/perf/should_run/T21741.hs
- + testsuite/tests/perf/should_run/T21741.stdout
- testsuite/tests/perf/should_run/all.T
- + testsuite/tests/simplCore/should_compile/T24251a.hs
- + testsuite/tests/simplCore/should_compile/T24251a.stderr
- testsuite/tests/simplCore/should_compile/all.T


Changes:

=====================================
compiler/GHC/CmmToAsm/AArch64/Instr.hs
=====================================
@@ -301,20 +301,15 @@ isJumpishInstr instr = case instr of
 -- | Checks whether this instruction is a jump/branch instruction.
 -- One that can change the flow of control in a way that the
 -- register allocator needs to worry about.
-jumpDestsOfInstr :: Instr -> [Maybe BlockId]
+jumpDestsOfInstr :: Instr -> [BlockId]
 jumpDestsOfInstr (ANN _ i) = jumpDestsOfInstr i
-jumpDestsOfInstr i = case i of
-    (CBZ _ t) -> [ mkDest t ]
-    (CBNZ _ t) -> [ mkDest t ]
-    (J t) -> [ mkDest t ]
-    (B t) -> [ mkDest t ]
-    (BL t _ _) -> [ mkDest t ]
-    (BCOND _ t) -> [ mkDest t ]
-    _ -> []
-  where
-    mkDest (TBlock id) = Just id
-    mkDest TLabel{} = Nothing
-    mkDest TReg{} = Nothing
+jumpDestsOfInstr (CBZ _ t) = [ id | TBlock id <- [t]]
+jumpDestsOfInstr (CBNZ _ t) = [ id | TBlock id <- [t]]
+jumpDestsOfInstr (J t) = [id | TBlock id <- [t]]
+jumpDestsOfInstr (B t) = [id | TBlock id <- [t]]
+jumpDestsOfInstr (BL t _ _) = [ id | TBlock id <- [t]]
+jumpDestsOfInstr (BCOND _ t) = [ id | TBlock id <- [t]]
+jumpDestsOfInstr _ = []
 
 -- | Change the destination of this jump instruction.
 -- Used in the linear allocator when adding fixup blocks for join
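
(Editor's aside on the idiom in the new right-hand sides above: a
pattern to the left of `<-` in a list comprehension filters, so
`[ id | TBlock id <- [t] ]` is `[id]` when `t` is a `TBlock` and `[]`
otherwise. A minimal standalone sketch, with a hypothetical `Target`
type standing in for the NCG's jump targets:)

  data Target = TBlock Int | TLabel String | TReg String

  -- Yield the destination block, if any; non-block targets give []
  destBlocks :: Target -> [Int]
  destBlocks t = [ bid | TBlock bid <- [t] ]

  -- destBlocks (TBlock 3)  == [3]
  -- destBlocks (TReg "x0") == []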


=====================================
compiler/GHC/CmmToAsm/BlockLayout.hs
=====================================
@@ -771,7 +771,7 @@ dropJumps :: forall a i. Instruction i => LabelMap a -> [GenBasicBlock i]
 dropJumps _    [] = []
 dropJumps info (BasicBlock lbl ins:todo)
     | Just ins <- nonEmpty ins --This can happen because of shortcutting
-    , [Just dest] <- jumpDestsOfInstr (NE.last ins)
+    , [dest] <- jumpDestsOfInstr (NE.last ins)
     , BasicBlock nextLbl _ : _ <- todo
     , not (mapMember dest info)
     , nextLbl == dest
@@ -870,7 +870,7 @@ mkNode edgeWeights block@(BasicBlock id instrs) =
               | length successors > 2 || edgeWeight info <= 0 -> []
               | otherwise -> [target]
           | Just instr <- lastMaybe instrs
-          , [one] <- jumpBlockDestsOfInstr instr
+          , [one] <- jumpDestsOfInstr instr
           = [one]
           | otherwise = []
 


=====================================
compiler/GHC/CmmToAsm/Instr.hs
=====================================
@@ -17,8 +17,6 @@ import GHC.Cmm.BlockId
 import GHC.CmmToAsm.Config
 import GHC.Data.FastString
 
-import Data.Maybe (catMaybes)
-
 -- | Holds a list of source and destination registers used by a
 --      particular instruction.
 --
@@ -75,17 +73,9 @@ class Instruction instr where
 
         -- | Give the possible destinations of this jump instruction.
         --      Must be defined for all jumpish instructions.
-        --      Returns Nothing for non BlockId destinations.
         jumpDestsOfInstr
-                :: instr -> [Maybe BlockId]
-
-        -- | Give the possible block destinations of this jump instruction.
-        --      Must be defined for all jumpish instructions.
-        jumpBlockDestsOfInstr
                 :: instr -> [BlockId]
 
-        jumpBlockDestsOfInstr = catMaybes . jumpDestsOfInstr
-
 
         -- | Change the destination of this jump instruction.
         --      Used in the linear allocator when adding fixup blocks for join


=====================================
compiler/GHC/CmmToAsm/PPC/Instr.hs
=====================================
@@ -513,15 +513,12 @@ isJumpishInstr instr
 -- | Checks whether this instruction is a jump/branch instruction.
 -- One that can change the flow of control in a way that the
 -- register allocator needs to worry about.
-jumpDestsOfInstr :: Instr -> [Maybe BlockId]
+jumpDestsOfInstr :: Instr -> [BlockId]
 jumpDestsOfInstr insn
   = case insn of
-        BCC _ id _       -> [Just id]
-        BCCFAR _ id _    -> [Just id]
-        BCTR targets _ _ -> targets
-        BCTRL{}          -> [Nothing]
-        BL{}             -> [Nothing]
-        JMP{}            -> [Nothing]
+        BCC _ id _       -> [id]
+        BCCFAR _ id _    -> [id]
+        BCTR targets _ _ -> [id | Just id <- targets]
         _                -> []
 
 


=====================================
compiler/GHC/CmmToAsm/Reg/Graph/SpillClean.hs
=====================================
@@ -207,7 +207,7 @@ cleanForward platform blockId assoc acc (li : instrs)
 
         -- Remember the association over a jump.
         | LiveInstr instr _     <- li
-        , targets               <- jumpBlockDestsOfInstr instr
+        , targets               <- jumpDestsOfInstr instr
         , not $ null targets
         = do    mapM_ (accJumpValid assoc) targets
                 cleanForward platform blockId assoc (li : acc) instrs
@@ -386,7 +386,7 @@ cleanBackward' liveSlotsOnEntry reloadedBy noReloads acc (li : instrs)
         --       it always does, but if those reloads are cleaned the slot
         --       liveness map doesn't get updated.
         | LiveInstr instr _     <- li
-        , targets               <- jumpBlockDestsOfInstr instr
+        , targets               <- jumpDestsOfInstr instr
         = do
                 let slotsReloadedByTargets
                         = IntSet.unions


=====================================
compiler/GHC/CmmToAsm/Reg/Linear/JoinToTargets.hs
=====================================
@@ -57,7 +57,7 @@ joinToTargets block_live id instr
         = return ([], instr)
 
         | otherwise
-        = joinToTargets' block_live [] id instr (jumpBlockDestsOfInstr instr)
+        = joinToTargets' block_live [] id instr (jumpDestsOfInstr instr)
 
 -----
 joinToTargets'


=====================================
compiler/GHC/CmmToAsm/Reg/Liveness.hs
=====================================
@@ -468,7 +468,7 @@ slurpReloadCoalesce live
 
                 -- if we hit a jump, remember the current slotMap
                 | LiveInstr (Instr instr) _     <- li
-                , targets                       <- jumpBlockDestsOfInstr instr
+                , targets                       <- jumpDestsOfInstr instr
                 , not $ null targets
                 = do    mapM_   (accSlotMap slotMap) targets
                         return  (slotMap, Nothing)
@@ -760,7 +760,7 @@ sccBlocks blocks entries mcfg = map (fmap node_payload) sccs
         sccs = stronglyConnCompG g2
 
         getOutEdges :: Instruction instr => [instr] -> [BlockId]
-        getOutEdges instrs = concatMap jumpBlockDestsOfInstr instrs
+        getOutEdges instrs = concatMap jumpDestsOfInstr instrs
 
         -- This is truly ugly, but I don't see a good alternative.
         -- Digraph just has the wrong API.  We want to identify nodes
@@ -837,7 +837,7 @@ checkIsReverseDependent sccs'
 
         slurpJumpDestsOfBlock (BasicBlock _ instrs)
                 = unionManyUniqSets
-                $ map (mkUniqSet . jumpBlockDestsOfInstr)
+                $ map (mkUniqSet . jumpDestsOfInstr)
                         [ i | LiveInstr i _ <- instrs]
 
 
@@ -1047,7 +1047,7 @@ liveness1 platform liveregs blockmap (LiveInstr instr _)
 
             -- union in the live regs from all the jump destinations of this
             -- instruction.
-            targets      = jumpBlockDestsOfInstr instr -- where we go from here
+            targets      = jumpDestsOfInstr instr -- where we go from here
             not_a_branch = null targets
 
             targetLiveRegs target


=====================================
compiler/GHC/CmmToAsm/X86/Instr.hs
=====================================
@@ -672,16 +672,13 @@ isJumpishInstr instr
 
 jumpDestsOfInstr
         :: Instr
-        -> [Maybe BlockId]
+        -> [BlockId]
 
 jumpDestsOfInstr insn
   = case insn of
-        JXX _ id        -> [Just id]
-        JMP_TBL _ ids _ _ -> [(mkDest dest) | Just dest <- ids]
+        JXX _ id        -> [id]
+        JMP_TBL _ ids _ _ -> [id | Just (DestBlockId id) <- ids]
         _               -> []
-    where
-      mkDest (DestBlockId id) = Just id
-      mkDest _ = Nothing
 
 
 patchJumpInstr


=====================================
compiler/GHC/Core/Opt/CSE.hs
=====================================
@@ -14,13 +14,15 @@ import GHC.Types.Var.Env ( mkInScopeSet )
 import GHC.Types.Id     ( Id, idType, idHasRules, zapStableUnfolding
                         , idInlineActivation, setInlineActivation
                         , zapIdOccInfo, zapIdUsageInfo, idInlinePragma
-                        , isJoinId, idJoinPointHood, idUnfolding )
-import GHC.Core.Utils   ( mkAltExpr
-                        , exprIsTickedString
+                        , isJoinId, idJoinPointHood, idUnfolding
+                        , zapIdUnfolding, isDeadBinder )
+import GHC.Core.Utils   ( mkAltExpr, exprIsTickedString
                         , stripTicksE, stripTicksT, mkTicks )
 import GHC.Core.FVs     ( exprFreeVars )
 import GHC.Core.Type    ( tyConAppArgs )
 import GHC.Core
+import GHC.Core.Utils   ( exprIsTrivial )
+import GHC.Core.Opt.OccurAnal( scrutBinderSwap_maybe )
 import GHC.Utils.Outputable
 import GHC.Types.Basic
 import GHC.Types.Tickish
@@ -714,25 +716,33 @@ cseExpr env (Case e bndr ty alts) = cseCase env e bndr ty alts
 
 cseCase :: CSEnv -> InExpr -> InId -> InType -> [InAlt] -> OutExpr
 cseCase env scrut bndr ty alts
-  = Case scrut1 bndr3 ty' $
-    combineAlts (map cse_alt alts)
+  | Just body <- caseElim scrut bndr alts
+  , -- See Note [Eliminating redundant cases]
+    let zapped_bndr = zapIdUnfolding bndr -- Wrinkle (ERC1)
+  = cseExpr env (Let (NonRec zapped_bndr scrut) body)
+
+  | otherwise
+  = Case scrut' bndr' ty' alts'
+
   where
     ty' = substTyUnchecked (csEnvSubst env) ty
-    (cse_done, scrut1) = try_for_cse env scrut
+    (cse_done, scrut') = try_for_cse env scrut
 
     bndr1 = zapIdOccInfo bndr
       -- Zapping the OccInfo is needed because the extendCSEnv
       -- in cse_alt may mean that a dead case binder
       -- becomes alive, and Lint rejects that
     (env1, bndr2)    = addBinder env bndr1
-    (alt_env, bndr3) = extendCSEnvWithBinding env1 bndr bndr2 scrut1 cse_done
+    (alt_env, bndr') = extendCSEnvWithBinding env1 bndr bndr2 scrut' cse_done
          -- extendCSEnvWithBinding: see Note [CSE for case expressions]
 
+    alts' = combineAlts (map cse_alt alts)
+
     con_target :: OutExpr
     con_target = lookupSubst alt_env bndr
 
     arg_tys :: [OutType]
-    arg_tys = tyConAppArgs (idType bndr3)
+    arg_tys = tyConAppArgs (idType bndr')
 
     -- See Note [CSE for case alternatives]
     cse_alt (Alt (DataAlt con) args rhs)
@@ -747,6 +757,45 @@ cseCase env scrut bndr ty alts
         where
           (env', args') = addBinders alt_env args
 
+caseElim :: InExpr -> InId -> [InAlt] -> Maybe InExpr
+-- Can we eliminate the case altogether?  If so return the body.
+--   Note [Eliminating redundant cases]
+caseElim scrut case_bndr alts
+  | [Alt _ bndrs rhs] <- alts
+  , Just (scrut_var, _) <- scrutBinderSwap_maybe scrut
+  , all isDeadBinder bndrs
+  , isEvaldSoon (scrut_var, case_bndr) rhs
+  = Just rhs
+
+  | otherwise
+  = Nothing
+
+isEvaldSoon :: (OutId, OutId) -> OutExpr -> Bool
+-- (isEvaldSoon (v1,v2) e) is True if either v1 or v2 is evaluated "soon" by e
+isEvaldSoon (v1,v2) expr
+  = go expr
+  where
+    hit :: Var -> Bool
+    hit v = v==v1 || v==v2
+
+    go (Var v)    = hit v
+    go (Let _ e)  = go e
+    go (Tick _ e) = go e
+    go (Cast e _) = go e
+
+    go (Case scrut cb _ alts)
+      = go scrut ||
+        (exprIsTrivial scrut &&
+         all go_alt alts &&
+         not (hit cb)    &&  -- Check for
+         all ok_alt alts)    --   shadowing
+         -- ok_alt only runs if things look good
+
+    go _ = False  -- Lit, App, Lam, Coercion, Type
+
+    go_alt (Alt _ _ rhs) = go rhs
+    ok_alt (Alt _ cbs _) = not (any hit cbs)
+
 combineAlts :: [OutAlt] -> [OutAlt]
 -- See Note [Combine case alternatives]
 combineAlts alts
@@ -798,6 +847,64 @@ turning K2 into 'x' increases the number of live variables.  But
 * The next run of the simplifier will turn 'x' back into K2, so we won't
   permanently bloat the free-var count.
 
+Note [Eliminating redundant cases]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+   case x of x' { DEFAULT ->
+   case y of y' { DEFAULT ->
+   let v = <thunk> in
+   case x' of { True -> e1; False -> e2 }
+   body }}}
+
+The initial `seq` of `x` is redundant.  But the Simplifier is generally
+careful not to drop that outer `case x` based on its demand-info: see Note
+[Case-to-let for strictly-used binders] in GHC.Core.Opt.Simplify.Iteration.
+
+Instead we peek at the body of the case, using `isEvaldSoon`, to see if `x` is
+evaluated "soon" in the code path that follows.  If so we transform the
+`case` to a `let`
+   let x' = x in
+   case y of y' ... etc...
+
+The notion of "soon" is a bit squishy, and is implemented by `isEvaldSoon`.
+We allow interchanging evals (as in the `case x` vs `case y` above).  But
+what about
+   case x of x' { DEFAULT ->
+   case (f y) of y' { DEFAULT ->
+   case x' of { True -> e1; False -> e2 }
+If we drop the `seq` on `x` we fall victim to #21741.  There is nothing
+wrong /semantically/ with dropping the `seq`, but in the case of #21741 it
+causes a big space leak.
+
+So the conditions in `isEvaldSoon` are quite narrow: the two evals are
+separated only by lets and other evals on /variables/.
+
+Wrinkle (ERC1):
+   x' will have an (OtherCon []) unfolding on it.  We want to zap that
+   unfolding before turning it into (let x' = x in ...).
+
+Wrinkle (ERC2):
+  You might wonder if case-merging in the Simplifier doesn't cover this.
+  See GHC.Core.Opt.Simplify.Utils.tryCaseMerge and Note [Merge Nested Cases]
+  in that same module.  But no, it is defeated by the `let v = <thunk>` in our
+  example above, and I didn't want to make it more complicated.
+
+  Maybe case-merging should be made simpler, or even moved outright here into CSE.
+
+Wrinkle (ERC3):
+  Why is this done in CSE?  Because the "peeking" is tiresome and potentially a bit
+  expensive (quadratic in deep nests), so we don't want to do it too often.  CSE runs seldom,
+  it is a pretty simple pass, and it's easy to "drop in" this extra optimisation.
+  Also eliminating redundant cases is a bit like commoning up duplicated work.
+
+Wrinkle (ERC4):
+  You might wonder whether we want to do this "optimisation" /at all/. After all, as
+  Note [Case-to-let for strictly-used binders] points out, dropping the eval is
+  not a huge deal, because the inner eval should just be a multi-way jump (no
+  actual eval).  But dropping the eval removes clutter, and I found that not dropping
+  made some functions look a bit bigger, and hence they didn't get inlined.
+
+  This is small beer though: I don't think it's an /important/ transformation.
 
 Note [Combine case alternatives]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


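(Editor's sketch: a toy model of the "evaluated soon" walk above, over a
tiny expression type instead of Core, tracking a single variable and
omitting the shadowing check. It illustrates the narrow condition the
Note describes: the two evals may be separated only by lets and by
evals of variables.)

  data Expr = Var String           -- an eval of a variable
            | Let Expr             -- let _ = <rhs> in body (body only)
            | Case Expr [Expr]     -- case scrut of alts (alt bodies only)
            | Opaque               -- anything else: App, Lam, Lit, ...

  evaldSoon :: String -> Expr -> Bool
  evaldSoon v = go
    where
      go (Var u)           = u == v
      go (Let body)        = go body
      go (Case scrut alts) = go scrut || (isVarE scrut && all go alts)
      go Opaque            = False

      isVarE (Var _) = True    -- only evals of /variables/ may intervene;
      isVarE _       = False   -- `case (f y) of ...` would hit #21741
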
=====================================
compiler/GHC/Core/Opt/Simplify/Env.hs
=====================================
@@ -17,7 +17,7 @@ module GHC.Core.Opt.Simplify.Env (
         seDoEtaReduction, seEtaExpand, seFloatEnable, seInline, seNames,
         seOptCoercionOpts, sePedanticBottoms, sePhase, sePlatform, sePreInline,
         seRuleOpts, seRules, seUnfoldingOpts,
-        mkSimplEnv, extendIdSubst,
+        mkSimplEnv, extendIdSubst, extendCvIdSubst,
         extendTvSubst, extendCvSubst,
         zapSubstEnv, setSubstEnv, bumpCaseDepth,
         getInScope, setInScopeFromE, setInScopeFromF,
@@ -550,6 +550,10 @@ extendCvSubst env@(SimplEnv {seCvSubst = csubst}) var co
   = assert (isCoVar var) $
     env {seCvSubst = extendVarEnv csubst var co}
 
+extendCvIdSubst :: SimplEnv -> Id -> OutExpr -> SimplEnv
+extendCvIdSubst env bndr (Coercion co) = extendCvSubst env bndr co
+extendCvIdSubst env bndr rhs           = extendIdSubst env bndr (DoneEx rhs NotJoinPoint)
+
 ---------------------
 getInScope :: SimplEnv -> InScopeSet
 getInScope env = seInScope env


=====================================
compiler/GHC/Core/Opt/Simplify/Iteration.hs
=====================================
@@ -412,9 +412,7 @@ simplAuxBind env bndr new_rhs
   -- have no NOLINE pragmas, nor RULEs
   | exprIsTrivial new_rhs  -- Short-cut for let x = y in ...
   = return ( emptyFloats env
-           , case new_rhs of
-                Coercion co -> extendCvSubst env bndr co
-                _           -> extendIdSubst env bndr (DoneEx new_rhs NotJoinPoint) )
+           , extendCvIdSubst env bndr new_rhs )  -- bndr can be a CoVar
 
   | otherwise
   = do  { -- ANF-ise the RHS


=====================================
compiler/GHC/Core/Opt/Simplify/Utils.hs
=====================================
@@ -2394,6 +2394,9 @@ the outer case scrutinises the same variable as the outer case. This
 transformation is called Case Merging.  It avoids that the same
 variable is scrutinised multiple times.
 
+See also Note [Eliminating redundant cases] in GHC.Core.Opt.CSE, especially
+wrinkle (ERC2).
+
 Wrinkles
 
 (MC1) `tryCaseMerge` "looks though" an inner single-alternative case-on-variable.


=====================================
compiler/GHC/Hs/Utils.hs
=====================================
@@ -736,12 +736,12 @@ mkBigLHsPatTup = mkChunkified mkLHsPatTup
 
 -- | Convert an 'LHsType' to an 'LHsSigType'.
 hsTypeToHsSigType :: LHsType GhcPs -> LHsSigType GhcPs
-hsTypeToHsSigType lty@(L loc ty) = L loc $ case ty of
+hsTypeToHsSigType lty@(L loc ty) = case ty of
   HsForAllTy { hst_tele = HsForAllInvis { hsf_xinvis = an
                                         , hsf_invis_bndrs = bndrs }
              , hst_body = body }
-    -> mkHsExplicitSigType an bndrs body
-  _ -> mkHsImplicitSigType lty
+    -> L loc $ mkHsExplicitSigType an bndrs body
+  _ -> L (l2l loc) $ mkHsImplicitSigType lty -- The annotations are in lty, erase them from loc
 
 -- | Convert an 'LHsType' to an 'LHsSigWcType'.
 hsTypeToHsSigWcType :: LHsType GhcPs -> LHsSigWcType GhcPs


=====================================
compiler/GHC/Parser/Lexer.x
=====================================
@@ -2870,7 +2870,7 @@ alexInputPrevChar (AI _ buf) = unsafeChr (fromIntegral (adjustChar pc))
   where pc = prevChar buf '\n'
 
 unsafeChr :: Int -> Char
-unsafeChr (I# c) = C# (chr# c)
+unsafeChr (I# c) = GHC.Exts.C# (GHC.Exts.chr# c)
 
 -- backwards compatibility for Alex 2.x
 alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
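
(Editor's note, an assumption about the context since the commit message
is terse: under Alex 3.5.1 the generated module apparently no longer has
`C#` and `chr#` in scope unqualified, hence the qualified names. A
standalone equivalent of the fixed helper:)

  {-# LANGUAGE MagicHash #-}
  import qualified GHC.Exts

  -- Convert an Int code point to a Char via the unchecked chr# primop
  unsafeChr :: Int -> Char
  unsafeChr (GHC.Exts.I# c) = GHC.Exts.C# (GHC.Exts.chr# c)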


=====================================
compiler/GHC/Types/Id.hs
=====================================
@@ -625,7 +625,7 @@ isImplicitId id
 idIsFrom :: Module -> Id -> Bool
 idIsFrom mod id = nameIsLocalOrFrom mod (idName id)
 
-isDeadBinder :: Id -> Bool
+isDeadBinder :: Var -> Bool
 isDeadBinder bndr | isId bndr = isDeadOcc (idOccInfo bndr)
                   | otherwise = False   -- TyVars count as not dead
 


=====================================
distrib/configure.ac.in
=====================================
@@ -105,6 +105,29 @@ if test "$HostOS" = "mingw32" -a "$EnableDistroToolchain" = "NO"; then
   FP_SETUP_WINDOWS_TOOLCHAIN([$hardtop/mingw/], [\$\$topdir/../mingw/])
 fi
 
+
+if test "$HostOS" = "darwin"; then
+    # On darwin, we need to clean the extended attributes of the
+    # ghc-toolchain binary and its dynamic library before we can execute it in the bindist
+    # (this is a workaround for #24554, for the lack of proper notarisation #17418)
+
+    # The following is the work around suggested by @carter in #17418 during
+    # install time. This should help us with code signing issues by removing
+    # extended attributes from all files.
+    XATTR=${XATTR:-/usr/bin/xattr}
+
+    if [ -e "${XATTR}" ]; then
+
+        # Instead of cleaning the attributes of the ghc-toolchain binary only,
+        # we clean them from all files in the bin/ and lib/ directories, as it additionally future
+        # proofs running executables from the bindist besides ghc-toolchain at configure time, and
+        # we can avoid figuring out the path to the ghc-toolchain dynlib specifically.
+        /usr/bin/xattr -rc bin/
+        /usr/bin/xattr -rc lib/
+
+    fi
+fi
+
 dnl ** Which gcc to use?
 dnl --------------------------------------------------------------
 AC_PROG_CC([gcc clang])


=====================================
hadrian/bindist/Makefile
=====================================
@@ -19,13 +19,6 @@ default:
 # TODO : find if a better function exists
 eq=$(and $(findstring $(1),$(2)),$(findstring $(2),$(1)))
 
-# the following is the work around suggested by @carter in #17418 during install
-# time.  This should help us with code signing issues by removing extended
-# attributes from all files.
-ifeq "$(Darwin_Host)" "YES"
-XATTR ?= /usr/bin/xattr
-endif
-
 # patchpackageconf
 #
 # Hacky function to patch up the 'haddock-interfaces' and 'haddock-html'
@@ -157,10 +150,6 @@ install_bin_libdir:
 			$(INSTALL_PROGRAM) "$$i" "$(DESTDIR)$(ActualBinsDir)"; \
 		fi; \
 	done
-	# Work around #17418 on Darwin
-	if [ -e "${XATTR}" ]; then \
-		"${XATTR}" -c -r "$(DESTDIR)$(ActualBinsDir)"; \
-	fi
 
 .PHONY: install_bin_direct
 install_bin_direct:
@@ -195,10 +184,6 @@ install_lib: lib/settings
 		    $(INSTALL_DATA) $$i "$$dest/`dirname $$i`" ;; \
 		esac; \
 	done; \
-	# Work around #17418 on Darwin
-	if [ -e "${XATTR}" ]; then \
-		"${XATTR}" -c -r "$(DESTDIR)$(ActualLibsDir)"; \
-	fi
 
 .PHONY: install_docs
 install_docs:


=====================================
libffi-tarballs
=====================================
@@ -1 +1 @@
-Subproject commit 5624fd5c8bbce8432cd3c0b0ea92d152a1bba047
+Subproject commit 89a9b01c5647c8f0d3899435b99df690f582e9f1


=====================================
m4/fp_cc_linker_flag_try.m4
=====================================
@@ -9,7 +9,7 @@
 AC_DEFUN([FP_CC_LINKER_FLAG_TRY], [
     AC_MSG_CHECKING([whether C compiler supports -fuse-ld=$1])
     echo 'int main(void) {return 0;}' > conftest.c
-    if $CC -o conftest.o -fuse-ld=$1 conftest.c > /dev/null 2>&1
+    if $CC -o conftest.o -fuse-ld=$1 $LDFLAGS conftest.c > /dev/null 2>&1
     then
         $2="-fuse-ld=$1"
         AC_MSG_RESULT([yes])


=====================================
rts/ARMOutlineAtomicsSymbols.h
=====================================
@@ -10,583 +10,583 @@
 #include <stdint.h>
 #include <stdatomic.h>
 
-uint8_t ghc___aarch64_cas1_relax(uint8_t old, uint8_t new, uint8_t* p);
-uint8_t ghc___aarch64_cas1_relax(uint8_t old, uint8_t new, uint8_t* p) {
+uint8_t ghc___aarch64_cas1_relax(uint8_t old, uint8_t new, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_cas1_relax(uint8_t old, uint8_t new, _Atomic uint8_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_relaxed, memory_order_relaxed); return old;
 }
 
-uint8_t ghc___aarch64_cas1_acq(uint8_t old, uint8_t new, uint8_t* p);
-uint8_t ghc___aarch64_cas1_acq(uint8_t old, uint8_t new, uint8_t* p) {
+uint8_t ghc___aarch64_cas1_acq(uint8_t old, uint8_t new, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_cas1_acq(uint8_t old, uint8_t new, _Atomic uint8_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_acquire, memory_order_acquire); return old;
 }
 
-uint8_t ghc___aarch64_cas1_acq_rel(uint8_t old, uint8_t new, uint8_t* p);
-uint8_t ghc___aarch64_cas1_acq_rel(uint8_t old, uint8_t new, uint8_t* p) {
+uint8_t ghc___aarch64_cas1_acq_rel(uint8_t old, uint8_t new, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_cas1_acq_rel(uint8_t old, uint8_t new, _Atomic uint8_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_acq_rel, memory_order_acquire); return old;
 }
 
-uint8_t ghc___aarch64_cas1_sync(uint8_t old, uint8_t new, uint8_t* p);
-uint8_t ghc___aarch64_cas1_sync(uint8_t old, uint8_t new, uint8_t* p) {
+uint8_t ghc___aarch64_cas1_sync(uint8_t old, uint8_t new, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_cas1_sync(uint8_t old, uint8_t new, _Atomic uint8_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_seq_cst, memory_order_seq_cst); return old;
 }
 
-uint16_t ghc___aarch64_cas2_relax(uint16_t old, uint16_t new, uint16_t* p);
-uint16_t ghc___aarch64_cas2_relax(uint16_t old, uint16_t new, uint16_t* p) {
+uint16_t ghc___aarch64_cas2_relax(uint16_t old, uint16_t new, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_cas2_relax(uint16_t old, uint16_t new, _Atomic uint16_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_relaxed, memory_order_relaxed); return old;
 }
 
-uint16_t ghc___aarch64_cas2_acq(uint16_t old, uint16_t new, uint16_t* p);
-uint16_t ghc___aarch64_cas2_acq(uint16_t old, uint16_t new, uint16_t* p) {
+uint16_t ghc___aarch64_cas2_acq(uint16_t old, uint16_t new, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_cas2_acq(uint16_t old, uint16_t new, _Atomic uint16_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_acquire, memory_order_acquire); return old;
 }
 
-uint16_t ghc___aarch64_cas2_acq_rel(uint16_t old, uint16_t new, uint16_t* p);
-uint16_t ghc___aarch64_cas2_acq_rel(uint16_t old, uint16_t new, uint16_t* p) {
+uint16_t ghc___aarch64_cas2_acq_rel(uint16_t old, uint16_t new, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_cas2_acq_rel(uint16_t old, uint16_t new, _Atomic uint16_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_acq_rel, memory_order_acquire); return old;
 }
 
-uint16_t ghc___aarch64_cas2_sync(uint16_t old, uint16_t new, uint16_t* p);
-uint16_t ghc___aarch64_cas2_sync(uint16_t old, uint16_t new, uint16_t* p) {
+uint16_t ghc___aarch64_cas2_sync(uint16_t old, uint16_t new, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_cas2_sync(uint16_t old, uint16_t new, _Atomic uint16_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_seq_cst, memory_order_seq_cst); return old;
 }
 
-uint32_t ghc___aarch64_cas4_relax(uint32_t old, uint32_t new, uint32_t* p);
-uint32_t ghc___aarch64_cas4_relax(uint32_t old, uint32_t new, uint32_t* p) {
+uint32_t ghc___aarch64_cas4_relax(uint32_t old, uint32_t new, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_cas4_relax(uint32_t old, uint32_t new, _Atomic uint32_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_relaxed, memory_order_relaxed); return old;
 }
 
-uint32_t ghc___aarch64_cas4_acq(uint32_t old, uint32_t new, uint32_t* p);
-uint32_t ghc___aarch64_cas4_acq(uint32_t old, uint32_t new, uint32_t* p) {
+uint32_t ghc___aarch64_cas4_acq(uint32_t old, uint32_t new, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_cas4_acq(uint32_t old, uint32_t new, _Atomic uint32_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_acquire, memory_order_acquire); return old;
 }
 
-uint32_t ghc___aarch64_cas4_acq_rel(uint32_t old, uint32_t new, uint32_t* p);
-uint32_t ghc___aarch64_cas4_acq_rel(uint32_t old, uint32_t new, uint32_t* p) {
+uint32_t ghc___aarch64_cas4_acq_rel(uint32_t old, uint32_t new, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_cas4_acq_rel(uint32_t old, uint32_t new, _Atomic uint32_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_acq_rel, memory_order_acquire); return old;
 }
 
-uint32_t ghc___aarch64_cas4_sync(uint32_t old, uint32_t new, uint32_t* p);
-uint32_t ghc___aarch64_cas4_sync(uint32_t old, uint32_t new, uint32_t* p) {
+uint32_t ghc___aarch64_cas4_sync(uint32_t old, uint32_t new, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_cas4_sync(uint32_t old, uint32_t new, _Atomic uint32_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_seq_cst, memory_order_seq_cst); return old;
 }
 
-uint64_t ghc___aarch64_cas8_relax(uint64_t old, uint64_t new, uint64_t* p);
-uint64_t ghc___aarch64_cas8_relax(uint64_t old, uint64_t new, uint64_t* p) {
+uint64_t ghc___aarch64_cas8_relax(uint64_t old, uint64_t new, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_cas8_relax(uint64_t old, uint64_t new, _Atomic uint64_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_relaxed, memory_order_relaxed); return old;
 }
 
-uint64_t ghc___aarch64_cas8_acq(uint64_t old, uint64_t new, uint64_t* p);
-uint64_t ghc___aarch64_cas8_acq(uint64_t old, uint64_t new, uint64_t* p) {
+uint64_t ghc___aarch64_cas8_acq(uint64_t old, uint64_t new, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_cas8_acq(uint64_t old, uint64_t new, _Atomic uint64_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_acquire, memory_order_acquire); return old;
 }
 
-uint64_t ghc___aarch64_cas8_acq_rel(uint64_t old, uint64_t new, uint64_t* p);
-uint64_t ghc___aarch64_cas8_acq_rel(uint64_t old, uint64_t new, uint64_t* p) {
+uint64_t ghc___aarch64_cas8_acq_rel(uint64_t old, uint64_t new, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_cas8_acq_rel(uint64_t old, uint64_t new, _Atomic uint64_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_acq_rel, memory_order_acquire); return old;
 }
 
-uint64_t ghc___aarch64_cas8_sync(uint64_t old, uint64_t new, uint64_t* p);
-uint64_t ghc___aarch64_cas8_sync(uint64_t old, uint64_t new, uint64_t* p) {
+uint64_t ghc___aarch64_cas8_sync(uint64_t old, uint64_t new, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_cas8_sync(uint64_t old, uint64_t new, _Atomic uint64_t* p) {
   atomic_compare_exchange_strong_explicit(p, &old, new, memory_order_seq_cst, memory_order_seq_cst); return old;
 }
 
-uint8_t ghc___aarch64_swp1_relax(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_swp1_relax(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_swp1_relax(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_swp1_relax(uint8_t v, _Atomic uint8_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_relaxed);
 }
 
-uint8_t ghc___aarch64_swp1_acq(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_swp1_acq(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_swp1_acq(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_swp1_acq(uint8_t v, _Atomic uint8_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_acquire);
 }
 
-uint8_t ghc___aarch64_swp1_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_swp1_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_swp1_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_swp1_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_release);
 }
 
-uint8_t ghc___aarch64_swp1_acq_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_swp1_acq_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_swp1_acq_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_swp1_acq_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_acq_rel);
 }
 
-uint8_t ghc___aarch64_swp1_sync(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_swp1_sync(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_swp1_sync(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_swp1_sync(uint8_t v, _Atomic uint8_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_seq_cst);
 }
 
-uint16_t ghc___aarch64_swp2_relax(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_swp2_relax(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_swp2_relax(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_swp2_relax(uint16_t v, _Atomic uint16_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_relaxed);
 }
 
-uint16_t ghc___aarch64_swp2_acq(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_swp2_acq(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_swp2_acq(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_swp2_acq(uint16_t v, _Atomic uint16_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_acquire);
 }
 
-uint16_t ghc___aarch64_swp2_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_swp2_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_swp2_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_swp2_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_release);
 }
 
-uint16_t ghc___aarch64_swp2_acq_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_swp2_acq_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_swp2_acq_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_swp2_acq_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_acq_rel);
 }
 
-uint16_t ghc___aarch64_swp2_sync(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_swp2_sync(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_swp2_sync(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_swp2_sync(uint16_t v, _Atomic uint16_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_seq_cst);
 }
 
-uint32_t ghc___aarch64_swp4_relax(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_swp4_relax(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_swp4_relax(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_swp4_relax(uint32_t v, _Atomic uint32_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_relaxed);
 }
 
-uint32_t ghc___aarch64_swp4_acq(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_swp4_acq(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_swp4_acq(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_swp4_acq(uint32_t v, _Atomic uint32_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_acquire);
 }
 
-uint32_t ghc___aarch64_swp4_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_swp4_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_swp4_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_swp4_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_release);
 }
 
-uint32_t ghc___aarch64_swp4_acq_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_swp4_acq_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_swp4_acq_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_swp4_acq_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_acq_rel);
 }
 
-uint32_t ghc___aarch64_swp4_sync(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_swp4_sync(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_swp4_sync(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_swp4_sync(uint32_t v, _Atomic uint32_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_seq_cst);
 }
 
-uint64_t ghc___aarch64_swp8_relax(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_swp8_relax(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_swp8_relax(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_swp8_relax(uint64_t v, _Atomic uint64_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_relaxed);
 }
 
-uint64_t ghc___aarch64_swp8_acq(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_swp8_acq(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_swp8_acq(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_swp8_acq(uint64_t v, _Atomic uint64_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_acquire);
 }
 
-uint64_t ghc___aarch64_swp8_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_swp8_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_swp8_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_swp8_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_release);
 }
 
-uint64_t ghc___aarch64_swp8_acq_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_swp8_acq_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_swp8_acq_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_swp8_acq_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_acq_rel);
 }
 
-uint64_t ghc___aarch64_swp8_sync(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_swp8_sync(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_swp8_sync(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_swp8_sync(uint64_t v, _Atomic uint64_t* p) {
   return atomic_exchange_explicit(p, v, memory_order_seq_cst);
 }
 
-uint8_t ghc___aarch64_ldadd1_relax(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldadd1_relax(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldadd1_relax(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldadd1_relax(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_relaxed);
 }
 
-uint8_t ghc___aarch64_ldadd1_acq(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldadd1_acq(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldadd1_acq(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldadd1_acq(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_acquire);
 }
 
-uint8_t ghc___aarch64_ldadd1_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldadd1_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldadd1_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldadd1_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_release);
 }
 
-uint8_t ghc___aarch64_ldadd1_acq_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldadd1_acq_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldadd1_acq_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldadd1_acq_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_acq_rel);
 }
 
-uint8_t ghc___aarch64_ldadd1_sync(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldadd1_sync(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldadd1_sync(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldadd1_sync(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
 }
 
-uint16_t ghc___aarch64_ldadd2_relax(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldadd2_relax(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldadd2_relax(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldadd2_relax(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_relaxed);
 }
 
-uint16_t ghc___aarch64_ldadd2_acq(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldadd2_acq(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldadd2_acq(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldadd2_acq(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_acquire);
 }
 
-uint16_t ghc___aarch64_ldadd2_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldadd2_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldadd2_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldadd2_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_release);
 }
 
-uint16_t ghc___aarch64_ldadd2_acq_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldadd2_acq_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldadd2_acq_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldadd2_acq_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_acq_rel);
 }
 
-uint16_t ghc___aarch64_ldadd2_sync(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldadd2_sync(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldadd2_sync(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldadd2_sync(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
 }
 
-uint32_t ghc___aarch64_ldadd4_relax(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldadd4_relax(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldadd4_relax(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldadd4_relax(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_relaxed);
 }
 
-uint32_t ghc___aarch64_ldadd4_acq(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldadd4_acq(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldadd4_acq(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldadd4_acq(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_acquire);
 }
 
-uint32_t ghc___aarch64_ldadd4_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldadd4_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldadd4_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldadd4_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_release);
 }
 
-uint32_t ghc___aarch64_ldadd4_acq_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldadd4_acq_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldadd4_acq_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldadd4_acq_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_acq_rel);
 }
 
-uint32_t ghc___aarch64_ldadd4_sync(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldadd4_sync(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldadd4_sync(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldadd4_sync(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
 }
 
-uint64_t ghc___aarch64_ldadd8_relax(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldadd8_relax(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldadd8_relax(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldadd8_relax(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_relaxed);
 }
 
-uint64_t ghc___aarch64_ldadd8_acq(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldadd8_acq(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldadd8_acq(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldadd8_acq(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_acquire);
 }
 
-uint64_t ghc___aarch64_ldadd8_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldadd8_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldadd8_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldadd8_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_release);
 }
 
-uint64_t ghc___aarch64_ldadd8_acq_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldadd8_acq_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldadd8_acq_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldadd8_acq_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_acq_rel);
 }
 
-uint64_t ghc___aarch64_ldadd8_sync(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldadd8_sync(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldadd8_sync(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldadd8_sync(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
 }
 
-uint8_t ghc___aarch64_ldclr1_relax(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldclr1_relax(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldclr1_relax(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldclr1_relax(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_relaxed);
 }
 
-uint8_t ghc___aarch64_ldclr1_acq(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldclr1_acq(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldclr1_acq(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldclr1_acq(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_acquire);
 }
 
-uint8_t ghc___aarch64_ldclr1_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldclr1_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldclr1_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldclr1_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_release);
 }
 
-uint8_t ghc___aarch64_ldclr1_acq_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldclr1_acq_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldclr1_acq_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldclr1_acq_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_acq_rel);
 }
 
-uint8_t ghc___aarch64_ldclr1_sync(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldclr1_sync(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldclr1_sync(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldclr1_sync(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_seq_cst);
 }
 
-uint16_t ghc___aarch64_ldclr2_relax(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldclr2_relax(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldclr2_relax(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldclr2_relax(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_relaxed);
 }
 
-uint16_t ghc___aarch64_ldclr2_acq(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldclr2_acq(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldclr2_acq(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldclr2_acq(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_acquire);
 }
 
-uint16_t ghc___aarch64_ldclr2_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldclr2_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldclr2_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldclr2_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_release);
 }
 
-uint16_t ghc___aarch64_ldclr2_acq_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldclr2_acq_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldclr2_acq_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldclr2_acq_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_acq_rel);
 }
 
-uint16_t ghc___aarch64_ldclr2_sync(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldclr2_sync(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldclr2_sync(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldclr2_sync(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_seq_cst);
 }
 
-uint32_t ghc___aarch64_ldclr4_relax(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldclr4_relax(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldclr4_relax(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldclr4_relax(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_relaxed);
 }
 
-uint32_t ghc___aarch64_ldclr4_acq(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldclr4_acq(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldclr4_acq(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldclr4_acq(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_acquire);
 }
 
-uint32_t ghc___aarch64_ldclr4_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldclr4_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldclr4_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldclr4_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_release);
 }
 
-uint32_t ghc___aarch64_ldclr4_acq_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldclr4_acq_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldclr4_acq_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldclr4_acq_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_acq_rel);
 }
 
-uint32_t ghc___aarch64_ldclr4_sync(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldclr4_sync(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldclr4_sync(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldclr4_sync(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_seq_cst);
 }
 
-uint64_t ghc___aarch64_ldclr8_relax(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldclr8_relax(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldclr8_relax(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldclr8_relax(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_relaxed);
 }
 
-uint64_t ghc___aarch64_ldclr8_acq(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldclr8_acq(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldclr8_acq(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldclr8_acq(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_acquire);
 }
 
-uint64_t ghc___aarch64_ldclr8_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldclr8_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldclr8_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldclr8_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_release);
 }
 
-uint64_t ghc___aarch64_ldclr8_acq_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldclr8_acq_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldclr8_acq_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldclr8_acq_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_acq_rel);
 }
 
-uint64_t ghc___aarch64_ldclr8_sync(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldclr8_sync(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldclr8_sync(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldclr8_sync(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_and_explicit(p, v, memory_order_seq_cst);
 }
 
-uint8_t ghc___aarch64_ldeor1_relax(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldeor1_relax(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldeor1_relax(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldeor1_relax(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_relaxed);
 }
 
-uint8_t ghc___aarch64_ldeor1_acq(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldeor1_acq(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldeor1_acq(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldeor1_acq(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_acquire);
 }
 
-uint8_t ghc___aarch64_ldeor1_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldeor1_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldeor1_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldeor1_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_release);
 }
 
-uint8_t ghc___aarch64_ldeor1_acq_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldeor1_acq_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldeor1_acq_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldeor1_acq_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_acq_rel);
 }
 
-uint8_t ghc___aarch64_ldeor1_sync(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldeor1_sync(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldeor1_sync(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldeor1_sync(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_seq_cst);
 }
 
-uint16_t ghc___aarch64_ldeor2_relax(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldeor2_relax(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldeor2_relax(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldeor2_relax(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_relaxed);
 }
 
-uint16_t ghc___aarch64_ldeor2_acq(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldeor2_acq(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldeor2_acq(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldeor2_acq(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_acquire);
 }
 
-uint16_t ghc___aarch64_ldeor2_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldeor2_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldeor2_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldeor2_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_release);
 }
 
-uint16_t ghc___aarch64_ldeor2_acq_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldeor2_acq_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldeor2_acq_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldeor2_acq_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_acq_rel);
 }
 
-uint16_t ghc___aarch64_ldeor2_sync(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldeor2_sync(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldeor2_sync(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldeor2_sync(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_seq_cst);
 }
 
-uint32_t ghc___aarch64_ldeor4_relax(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldeor4_relax(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldeor4_relax(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldeor4_relax(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_relaxed);
 }
 
-uint32_t ghc___aarch64_ldeor4_acq(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldeor4_acq(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldeor4_acq(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldeor4_acq(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_acquire);
 }
 
-uint32_t ghc___aarch64_ldeor4_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldeor4_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldeor4_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldeor4_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_release);
 }
 
-uint32_t ghc___aarch64_ldeor4_acq_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldeor4_acq_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldeor4_acq_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldeor4_acq_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_acq_rel);
 }
 
-uint32_t ghc___aarch64_ldeor4_sync(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldeor4_sync(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldeor4_sync(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldeor4_sync(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_seq_cst);
 }
 
-uint64_t ghc___aarch64_ldeor8_relax(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldeor8_relax(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldeor8_relax(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldeor8_relax(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_relaxed);
 }
 
-uint64_t ghc___aarch64_ldeor8_acq(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldeor8_acq(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldeor8_acq(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldeor8_acq(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_acquire);
 }
 
-uint64_t ghc___aarch64_ldeor8_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldeor8_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldeor8_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldeor8_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_release);
 }
 
-uint64_t ghc___aarch64_ldeor8_acq_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldeor8_acq_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldeor8_acq_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldeor8_acq_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_acq_rel);
 }
 
-uint64_t ghc___aarch64_ldeor8_sync(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldeor8_sync(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldeor8_sync(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldeor8_sync(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_xor_explicit(p, v, memory_order_seq_cst);
 }
 
-uint8_t ghc___aarch64_ldset1_relax(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldset1_relax(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldset1_relax(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldset1_relax(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_relaxed);
 }
 
-uint8_t ghc___aarch64_ldset1_acq(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldset1_acq(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldset1_acq(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldset1_acq(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_acquire);
 }
 
-uint8_t ghc___aarch64_ldset1_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldset1_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldset1_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldset1_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_release);
 }
 
-uint8_t ghc___aarch64_ldset1_acq_rel(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldset1_acq_rel(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldset1_acq_rel(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldset1_acq_rel(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_acq_rel);
 }
 
-uint8_t ghc___aarch64_ldset1_sync(uint8_t v, uint8_t* p);
-uint8_t ghc___aarch64_ldset1_sync(uint8_t v, uint8_t* p) {
+uint8_t ghc___aarch64_ldset1_sync(uint8_t v, _Atomic uint8_t* p);
+uint8_t ghc___aarch64_ldset1_sync(uint8_t v, _Atomic uint8_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_seq_cst);
 }
 
-uint16_t ghc___aarch64_ldset2_relax(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldset2_relax(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldset2_relax(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldset2_relax(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_relaxed);
 }
 
-uint16_t ghc___aarch64_ldset2_acq(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldset2_acq(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldset2_acq(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldset2_acq(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_acquire);
 }
 
-uint16_t ghc___aarch64_ldset2_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldset2_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldset2_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldset2_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_release);
 }
 
-uint16_t ghc___aarch64_ldset2_acq_rel(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldset2_acq_rel(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldset2_acq_rel(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldset2_acq_rel(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_acq_rel);
 }
 
-uint16_t ghc___aarch64_ldset2_sync(uint16_t v, uint16_t* p);
-uint16_t ghc___aarch64_ldset2_sync(uint16_t v, uint16_t* p) {
+uint16_t ghc___aarch64_ldset2_sync(uint16_t v, _Atomic uint16_t* p);
+uint16_t ghc___aarch64_ldset2_sync(uint16_t v, _Atomic uint16_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_seq_cst);
 }
 
-uint32_t ghc___aarch64_ldset4_relax(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldset4_relax(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldset4_relax(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldset4_relax(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_relaxed);
 }
 
-uint32_t ghc___aarch64_ldset4_acq(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldset4_acq(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldset4_acq(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldset4_acq(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_acquire);
 }
 
-uint32_t ghc___aarch64_ldset4_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldset4_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldset4_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldset4_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_release);
 }
 
-uint32_t ghc___aarch64_ldset4_acq_rel(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldset4_acq_rel(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldset4_acq_rel(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldset4_acq_rel(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_acq_rel);
 }
 
-uint32_t ghc___aarch64_ldset4_sync(uint32_t v, uint32_t* p);
-uint32_t ghc___aarch64_ldset4_sync(uint32_t v, uint32_t* p) {
+uint32_t ghc___aarch64_ldset4_sync(uint32_t v, _Atomic uint32_t* p);
+uint32_t ghc___aarch64_ldset4_sync(uint32_t v, _Atomic uint32_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_seq_cst);
 }
 
-uint64_t ghc___aarch64_ldset8_relax(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldset8_relax(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldset8_relax(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldset8_relax(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_relaxed);
 }
 
-uint64_t ghc___aarch64_ldset8_acq(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldset8_acq(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldset8_acq(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldset8_acq(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_acquire);
 }
 
-uint64_t ghc___aarch64_ldset8_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldset8_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldset8_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldset8_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_release);
 }
 
-uint64_t ghc___aarch64_ldset8_acq_rel(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldset8_acq_rel(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldset8_acq_rel(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldset8_acq_rel(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_acq_rel);
 }
 
-uint64_t ghc___aarch64_ldset8_sync(uint64_t v, uint64_t* p);
-uint64_t ghc___aarch64_ldset8_sync(uint64_t v, uint64_t* p) {
+uint64_t ghc___aarch64_ldset8_sync(uint64_t v, _Atomic uint64_t* p);
+uint64_t ghc___aarch64_ldset8_sync(uint64_t v, _Atomic uint64_t* p) {
   return atomic_fetch_or_explicit(p, v, memory_order_seq_cst);
 }
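
A note on the pattern repeated above: the C11 generic atomic operations in
<stdatomic.h>, such as atomic_fetch_xor_explicit and atomic_fetch_or_explicit,
are only defined for pointers to _Atomic-qualified objects, and clang enforces
that constraint. That is why every prototype in this hunk gains the _Atomic
qualifier while the function bodies stay unchanged. A minimal stand-alone
sketch (not part of the patch; names are illustrative) showing the accepted
and rejected forms:

    #include <stdatomic.h>
    #include <stdint.h>

    /* Accepted: the pointee is _Atomic-qualified, matching the fixed
       prototypes above. */
    static uint32_t fetch_xor_relaxed(_Atomic uint32_t* p, uint32_t v) {
      return atomic_fetch_xor_explicit(p, v, memory_order_relaxed);
    }

    /* Rejected by clang as a constraint violation, because the pointee
       lacks the _Atomic qualifier:

       static uint32_t fetch_xor_bad(uint32_t* p, uint32_t v) {
         return atomic_fetch_xor_explicit(p, v, memory_order_relaxed);
       }
    */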
 


=====================================
testsuite/tests/codeGen/should_run/T24507.hs deleted
=====================================
@@ -1,15 +0,0 @@
-{-# LANGUAGE MagicHash #-}
-{-# LANGUAGE UnboxedTuples #-}
-{-# LANGUAGE GHCForeignImportPrim #-}
-{-# LANGUAGE UnliftedFFITypes #-}
-
-module Main where
-
-import GHC.Exts
-
-foreign import prim "foo" foo :: Int# -> Int#
-
-main = do
-
-    let f x = case x of I# x' -> case foo x' of x -> print (I# x)
-    mapM_ f [1..7]
\ No newline at end of file


=====================================
testsuite/tests/codeGen/should_run/T24507.stdout deleted
=====================================
@@ -1,7 +0,0 @@
-1
-2
-2
-2
-2
-2
-2


=====================================
testsuite/tests/codeGen/should_run/T24507_cmm.cmm deleted
=====================================
@@ -1,35 +0,0 @@
-#include "Cmm.h"
-
-bar() {
-    return (2);
-}
-
-foo(W_ x) {
-
-    switch(x) {
-        case 1: goto a;
-        case 2: goto b;
-        case 3: goto c;
-        case 4: goto d;
-        case 5: goto e;
-        case 6: goto f;
-        case 7: goto g;
-    }
-    return (1);
-
-    a:
-    return (1);
-    b:
-    jump bar();
-    c:
-    jump bar();
-    d:
-    jump bar();
-    e:
-    jump bar();
-    f:
-    jump bar();
-    g:
-    jump bar();
-
-}


=====================================
testsuite/tests/codeGen/should_run/all.T
=====================================
@@ -243,6 +243,3 @@ test('MulMayOflo_full',
 test('T24264run', normal, compile_and_run, [''])
 test('T24295a', normal, compile_and_run, ['-O -floopification'])
 test('T24295b', normal, compile_and_run, ['-O -floopification -fpedantic-bottoms'])
-
-test('T24507', [req_cmm], multi_compile_and_run,
-                 ['T24507', [('T24507_cmm.cmm', '')], '-O2'])


=====================================
testsuite/tests/perf/should_run/T21741.hs
=====================================
@@ -0,0 +1,25 @@
+{-# LANGUAGE BangPatterns #-}
+module Main where
+
+import System.Environment
+
+f :: [Int] -> Int
+f xs = g (length xs) (even $ mySum xs)
+{-# NOINLINE f #-}
+
+g :: Int -> Bool -> Int
+g 0 _ = 0
+g n !b = length xs + mySum xs + if b then 0 else 1
+  where
+    xs = [0..n]
+{-# NOINLINE g #-}
+
+mySum :: [Int] -> Int
+mySum = go 0
+  where
+    go acc (x:xs) = go (x+acc) xs
+    go acc _      = acc
+
+main = do
+  (n:_) <- map read <$> getArgs
+  print $ f [0..n]
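
The shape of this program appears deliberate (my reading of the test, not
stated in the patch): xs = [0..n] in g is demanded by two separate consumers,
length and mySum, so if the space leak tracked in #21741 returns, the whole
list stays live between the two traversals and the GC must repeatedly copy
it — which the all.T entry below detects via the copied-bytes metric.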


=====================================
testsuite/tests/perf/should_run/T21741.stdout
=====================================
@@ -0,0 +1 @@
+50000025000003


=====================================
testsuite/tests/perf/should_run/all.T
=====================================
@@ -413,3 +413,9 @@ test('T21839r',
 # perf doesn't regress further, so it is not marked as such.
 test('T18964', [collect_stats('bytes allocated', 1), only_ways(['normal'])], compile_and_run, ['-O'])
 test('T23021', [collect_stats('bytes allocated', 1), only_ways(['normal'])], compile_and_run, ['-O2'])
+
+# GC bytes copied goes up a lot if the space leak returns
+test('T21741', [collect_stats('copied_bytes', 2),
+                extra_run_opts('10000000'),
+                only_ways(['normal'])],
+      compile_and_run, ['-O'])
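
For readers unfamiliar with the testsuite driver: collect_stats('copied_bytes', 2)
compares the runtime's "bytes copied" GC statistic against a recorded baseline
and (as I understand the driver) fails the test if it deviates by more than 2%,
while extra_run_opts('10000000') supplies the list length that T21741.hs reads
from getArgs.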


=====================================
testsuite/tests/simplCore/should_compile/T24251a.hs
=====================================
@@ -0,0 +1,9 @@
+module T24251a where
+
+f xs = xs `seq`
+       (let t = reverse (reverse (reverse (reverse xs))) in
+        case xs of
+          [] ->  (t,True)
+          (_:_) -> (t,False))
+
+-- We start with an eval of xs, but that should disappear.


=====================================
testsuite/tests/simplCore/should_compile/T24251a.stderr
=====================================
@@ -0,0 +1,18 @@
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+  = {terms: 33, types: 60, coercions: 0, joins: 0/1}
+
+$wf
+  = \ @a xs ->
+      let {
+        t = reverse1 (reverse1 (reverse1 (reverse1 xs []) []) []) [] } in
+      case xs of {
+        [] -> (# t, True #);
+        : ds ds1 -> (# t, False #)
+      }
+
+f = \ @a xs -> case $wf xs of { (# ww, ww1 #) -> (ww, ww1) }
+
+
+
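
What the expected Core above checks: in the worker $wf, xs is scrutinised by
exactly one case. The source file's comment ("We start with an eval of xs,
but that should disappear") states the property under test: the seq-induced
eval of xs must not survive as a second, redundant case around the let-binding
for t, and the -ddump-simpl output recorded here confirms that it does not.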


=====================================
testsuite/tests/simplCore/should_compile/all.T
=====================================
@@ -515,3 +515,4 @@ test('T24229a', [ grep_errmsg(r'wfoo') ], compile, ['-O2 -ddump-simpl -dno-typea
 test('T24229b', [ grep_errmsg(r'wfoo') ], compile, ['-O2 -ddump-simpl -dno-typeable-binds -dsuppress-all -dsuppress-uniques -dppr-cols=99999'])
 test('T24370', normal, compile, ['-O'])
 test('T24551', normal, compile, ['-O -dcore-lint'])
+test('T24251a', normal, compile, ['-O -ddump-simpl -dsuppress-uniques -dno-typeable-binds -dsuppress-all'])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/b2a487e4f6c46069441eed762deae4732203e335...c5424f6671c3780ade753fd2b09a232f2eca1be3


