[Git][ghc/ghc][wip/andreask/9.10-backports] 20 commits: Revert "AArch64: Implement switch/jump tables (#19912)"

Andreas Klebinger (@AndreasK) gitlab at gitlab.haskell.org
Fri Feb 14 12:38:25 UTC 2025



Andreas Klebinger pushed to branch wip/andreask/9.10-backports at Glasgow Haskell Compiler / GHC


Commits:
420e9244 by Andreas Klebinger at 2025-02-14T13:10:44+01:00
Revert "AArch64: Implement switch/jump tables (#19912)"

This reverts commit 9b4326cf4180eb70f7474aa143b46a49cb69306f.

- - - - -
8871bb89 by Ben Gamari at 2025-02-14T13:10:44+01:00
rts/linker: Fix out-of-bounds mapping logic

Previously the structure of `mmapInRegion` concealed a subtle bug
concerning handling of `mmap` returning mappings below the beginning of
the desired region. Specifically, we would reset `p = result + bytes`
and then again reset `p = region->start` before looping around for
another iteration. This resulted in an infinite loop on FreeBSD.

Fixes #25492.

(cherry picked from commit 292ed74ea908b64490e91346b890cbebdcde37d0)

- - - - -
663c1711 by Ben Gamari at 2025-02-14T13:10:44+01:00
rts/linker: Clarify debug output

(cherry picked from commit 20912f5bac6fe4146172accc1849d9b762eb45e3)

- - - - -
9b2e13db by Ben Gamari at 2025-02-14T13:10:44+01:00
hadrian: Mitigate mktexfmt race

At least some versions of TeX Live's `mktexfmt` utility cannot be invoked
concurrently on their initial run, since they fail to handle `mkdir`
failures caused by racing directory creation. Specifically, we see

```
| Run Xelatex: users_guide.tex => /tmp/extra-dir-9616886274866
| Run Xelatex: Haddock.tex => /tmp/extra-dir-9616886274869
This is XeTeX, Version 3.14159265-2.6-0.999992 (TeX Live 2020) (preloaded format=xelatex)
 restricted \write18 enabled.
kpathsea: Running mktexfmt xelatex.fmt
mktexfmt: mktexfmt is using the following fmtutil.cnf files (in precedence order):
mktexfmt:   /usr/share/texlive/texmf-dist/web2c/fmtutil.cnf
mktexfmt: mktexfmt is using the following fmtutil.cnf file for writing changes:
mktexfmt:   /builds/ghc/ghc/tmp-home/.texlive2020/texmf-config/web2c/fmtutil.cnf
/usr/bin/mktexfmt: mkdir(/builds/ghc/ghc/tmp-home/.texlive2020/texmf-var/web2c/) failed for tree /builds/ghc/ghc/tmp-home/.texlive2020/texmf-var/web2c: File exists at /usr/share/texlive/tlpkg/TeXLive/TLUtils.pm line 937.
I can't find the format file `xelatex.fmt'!
```

That is, two `mktexfmt` invocations (for the user's guide and Haddock
builds) attempted to create `$HOME/.texlive2020/texmf-var/web2c` and
raced. One of the two `mkdir` calls consequently failed, bringing down the
entire build.

We avoid this by ensuring that the first `xelatex` invocation is always
performed serially.
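
As a minimal sketch of the mechanism, here is the standard Shake `Resource`
pattern for serialising one build step (illustrative only; Hadrian's actual
change goes through `buildWithResources`, see the
`hadrian/src/Rules/Documentation.hs` diff below):

```
import Development.Shake
import Development.Shake.FilePath ((-<.>))

main :: IO ()
main = shakeArgs shakeOptions $ do
  -- A single-slot resource acts as a mutex: at most one holder at a time.
  xelatexMutex <- newResource "xelatex-mutex" 1
  "//*.pdf" %> \out -> do
    let src = out -<.> "tex"   -- hypothetical naming convention
    need [src]
    -- Serialise only the first xelatex run: it creates the format files
    -- (and the texmf-var/web2c directory), after which later runs are safe.
    withResource xelatexMutex 1 $ cmd_ "xelatex" [src]
    cmd_ "xelatex" [src]       -- subsequent runs may overlap with other work
```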

Fixes #25564.

(cherry picked from commit 41dae5b86955094aa4c5647f63f1f52f1a8a5519)

- - - - -
4b00e0dc by Ben Gamari at 2025-02-14T13:10:44+01:00
rts/CheckUnload: Reset old_objects if unload is skipped

Previously `checkUnload` failed to reset `old_objects` when it decided
not to unload (e.g. due to heap profiling being enabled).

Fixes #24935.

(cherry picked from commit 9efbc51f99118e8f9c3abf2bcb6dc3295893ded6)

- - - - -
653b86ae by Ben Gamari at 2025-02-14T13:10:44+01:00
rts(setNumCapabilities): Assert that n_caps < MAX_N_CAPS

It was noticed in #25560 that this would previously be allowed,
resulting in a segfault.

I will add a proper exception in `base` in a future commit.

(cherry picked from commit f08a72eb484193934c56e34366b277f4d7247a24)

- - - - -
1e7ef0a1 by Ben Gamari at 2025-02-14T13:10:44+01:00
ghc-internal: Fix inconsistent FFI import types

The foreign imports of `enabled_capabilities` and
`getNumberOfProcessors` were declared as `CInt` whereas they are defined
as `uint32_t`.
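
A minimal sketch of the corrected pattern (mirroring the `ghc-internal`
change in the diff below; the wrapper module name and export list are made
up for illustration):

```
{-# LANGUAGE ForeignFunctionInterface #-}
-- A C symbol defined as uint32_t should be imported at a Haskell type of
-- the same width and signedness, i.e. Word32, not CInt.
module CapabilityCounts (getNumProcessors, getEnabledCapabilities) where

import Data.Word (Word32)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

foreign import ccall unsafe "getNumberOfProcessors"
  c_getNumberOfProcessors :: IO Word32

foreign import ccall "&enabled_capabilities"
  enabled_capabilities :: Ptr Word32

getNumProcessors :: IO Int
getNumProcessors = fromIntegral <$> c_getNumberOfProcessors

getEnabledCapabilities :: IO Int
getEnabledCapabilities = fromIntegral <$> peek enabled_capabilities
```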

(cherry picked from commit e10d31ad849b5f7c1f052e7c93f7c7aaf85918c9)

- - - - -
e3b3358f by Ben Gamari at 2025-02-14T13:10:44+01:00
rts: Mention maximum capability count in users guide

Addresses #25560.

(cherry picked from commit 06265655bfe6b48cde6923a933d81c9889a443a3)

- - - - -
9229aef9 by Ben Gamari at 2025-02-14T13:10:44+01:00
rts/Capability: Move induction variable declaration into `for`s

Just a stylistic change.

(cherry picked from commit d488470ba302760cfd2f3515d9338d1d75f84dd5)

- - - - -
a1791350 by Ben Gamari at 2025-02-14T13:10:44+01:00
rts: Determine max_n_capabilities at RTS startup

Previously the maximum number of capabilities supported by the RTS was
statically capped at 256. However, this bound is uncomfortably low given
the size of today's machines.

While supporting unbounded, fully-dynamic adjustment would be nice, it
is complex and so instead we do something simpler: Probe the logical
core count at RTS startup and use this as the static bound for the rest
of our execution.

This should keep users from running into the capability limit on large
machines, while avoiding wasted memory on a large capabilities array for
most users and keeping complexity at bay.
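
With these changes, a program that asks for far more capabilities than the
machine supports should warn and clamp rather than crash; this is
essentially what the new `T25560` test (added further down) checks:

```
-- Mirrors testsuite/tests/rts/T25560.hs below: request an absurd capability
-- count; the RTS is expected to warn on stderr and clamp, not segfault.
import GHC.Conc (setNumCapabilities)

main :: IO ()
main = setNumCapabilities 100000
```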

Addresses #25560.

(cherry picked from commit 71f050b74eaa2fdc2ca5da53f85497ac94ab6a2a)

- - - - -
b8efe316 by Ben Gamari at 2025-02-14T13:10:44+01:00
testsuite: Introduce req_c_rts

As suggested by @hsyl20, this is intended to mark tests that rely on the
behavior of the C RTS.

(cherry picked from commit 1e84b41108d96cb721dd11281105fdf621105a12)

- - - - -
4a48042c by Ben Gamari at 2025-02-14T13:10:44+01:00
testsuite: Add test for #25560

(cherry picked from commit 683115a40fd989a287fa51efe140af9448526098)

- - - - -
ae36c9e7 by Ben Gamari at 2025-02-14T13:10:44+01:00
rts/CheckUnload: Don't prepare to unload if we can't unload

Previously `prepareUnloadCheck` would move the `objects` list to
`old_objects` even when profiling (where we cannot unload). This caused
us to vacate the `objects` list during major GCs, losing track of loaded
objects. Fix this by ensuring that `prepareUnloadCheck` and
`checkUnload` both use the same short-cutting logic.

(cherry picked from commit 34d3e8e69b62b92cc438514f7fb8e37ce639efea)

- - - - -
8c363c54 by Matthew Pickering at 2025-02-14T13:10:44+01:00
typechecker: Perform type family consistency checks in topological order

Consider a module M importing modules A, B and C.

We can waste a lot of work depending on the order in which the modules are
checked for family consistency.

Consider that C imports A and B. When compiling C we must have already
checked A and B for consistency; therefore, if C is processed first, then
A and B will not need to be checked for consistency again.

If A and B are compared first, then the consistency checks between them
will be performed again (wasted work, as we already performed them when
compiling C).

At the moment, the order in which modules are checked is non-deterministic.

Clearly we should engineer that C is checked before B and A, but by what
scheme?

A simple one is to observe that if a module M is in the transitive
closure of X then the size of the consistent family set of M is less
than or equal to the size of the consistent family set of X.

Therefore, by sorting the imports by the size of their consistent family
sets and processing the largest first, we make sure to process modules
in topological order.
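
As a minimal sketch of that ordering trick, with hypothetical stand-in
types (the real code sorts the directly imported modules by
`length . modConsistent`, as the `GHC.Tc.Instance.Family` diff below shows):

```
import Data.List (sortOn)
import Data.Ord (Down(..))

-- Hypothetical stand-in for the real interface data.
data ModuleInfo = ModuleInfo
  { modName           :: String
  , modConsistentFams :: [String]  -- families already known to be consistent
  }

-- Largest consistent set first approximates a topological order: anything
-- in the transitive closure of X has a set no larger than X's.
orderForConsistencyCheck :: [ModuleInfo] -> [ModuleInfo]
orderForConsistencyCheck = sortOn (Down . length . modConsistentFams)
```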

In practice we have observed that this strategy has reduced the number
of consistency checks performed.

One solution to #25554

(cherry picked from commit 13fe48d40004d9cdf3c73300a18f144bdc5191d9)

- - - - -
a31ca08b by Ben Gamari at 2025-02-14T13:10:44+01:00
rts: Fix incorrect format specifiers in era profiling

Fixes #25581.

(cherry picked from commit 430d965a176d6c9e629d169fa0606923275c8332)

- - - - -
045ff13b by Cheng Shao at 2025-02-14T13:10:44+01:00
ci: minor nix-in-docker improvements

This patch makes some minor improvements to the nix-in-docker logic in the
CI configuration:

- Update `nixos/nix` to the latest version
- Apply $CPUS to `cores`/`max-jobs` to avoid oversubscribing while
  allowing a reasonable degree of parallelism
- Remove the redundant `--extra-experimental-features nix-command` in
  later `nix shell` invocations; it's already configured in
  `/etc/nix/nix.conf`

(cherry picked from commit 291388e18b5b61a02a43cc153a642fd67d6127c4)

- - - - -
852c738e by Zubin Duggal at 2025-02-14T13:10:44+01:00
ghcup metadata: output metadata fragment in CI

(cherry picked from commit 52b58a660e735b20961d792d8fa9267f01247a50)
(cherry picked from commit 04433916cdedab80114cbed1dc399ae816bada91)

- - - - -
1f594b60 by Zubin Duggal at 2025-02-14T13:10:44+01:00
ghcup metadata: use fedora33 for redhat

Redhat 9 doesn't have libtinfo.so.5 anymore

(cherry picked from commit dc86785eb43afd1bd292287c064fb5ad94fe8c7f)
(cherry picked from commit 7c78804e3b25d2c0246cd1f3ce0d4015431e5831)

- - - - -
98d0ec4e by Zubin Duggal at 2025-02-14T13:10:44+01:00
ghcup metadata: still use centos for redhat <9

(cherry picked from commit 1d72cfb2c1054bc8a399855d5c68443c969d2f66)

- - - - -
29e25ab0 by Simon Peyton Jones at 2025-02-14T13:10:44+01:00
Preserve orientation when unifying kinds

This MR fixes yet another manifestation of the trickiness caused
by Note [Fundeps with instances, and equality orientation].

I wish there were a more robust way to do this, but this fix is
a definite improvement.

Fixes #25597

(cherry picked from commit 63d63f9d3e074c3bcf5b526ee14e50cb2b6b9d70)

- - - - -


30 changed files:

- .gitlab-ci.yml
- .gitlab/rel_eng/mk-ghcup-metadata/mk_ghcup_metadata.py
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/Tc/Instance/Family.hs
- compiler/GHC/Tc/Module.hs
- compiler/GHC/Tc/Solver/Equality.hs
- compiler/GHC/Tc/Utils/Unify.hs
- docs/users_guide/using-concurrent.rst
- hadrian/src/Rules/Documentation.hs
- libraries/ghc-internal/src/GHC/Internal/Conc/Sync.hs
- rts/Capability.c
- rts/Capability.h
- rts/CheckUnload.c
- rts/Linker.c
- rts/ProfHeap.c
- rts/RtsSymbols.c
- rts/Schedule.c
- rts/include/rts/Config.h
- rts/include/rts/Threads.h
- rts/linker/MMap.c
- testsuite/driver/testlib.py
- testsuite/tests/rep-poly/RepPolyMcGuard.stderr
- testsuite/tests/rep-poly/RepPolyRecordUpdate.stderr
- testsuite/tests/rep-poly/T12709.stderr
- + testsuite/tests/rts/T25560.hs
- testsuite/tests/rts/all.T
- + testsuite/tests/typecheck/should_compile/T25597.hs
- testsuite/tests/typecheck/should_compile/all.T


Changes:

=====================================
.gitlab-ci.yml
=====================================
@@ -289,7 +289,7 @@ lint-author:
     - *drafts-can-fail-lint
 
 lint-ci-config:
-  image: nixos/nix:2.14.1
+  image: nixos/nix:2.25.2
   extends: .lint
   # We don't need history/submodules in this job
   variables:
@@ -297,6 +297,18 @@ lint-ci-config:
     GIT_SUBMODULE_STRATEGY: none
   before_script:
     - echo "experimental-features = nix-command flakes" >> /etc/nix/nix.conf
+    # Note [Nix-in-Docker]
+    # ~~~~~~~~~~~~~~~~~~~~
+    # The nixos/nix default config is max-jobs=1 and cores=$(logical
+    # cores num) which doesn't play nice with our $CPUS convention. We
+    # fix it before invoking any nix build to avoid oversubscribing
+    # while allowing a reasonable degree of parallelism.
+    # FIXME: Disabling build-users-group=nixbld is a workaround for a Nix-in-Docker issue. See
+    # https://gitlab.haskell.org/ghc/head.hackage/-/issues/38#note_560487 for
+    # discussion.
+    - echo "cores = $CPUS" >> /etc/nix/nix.conf
+    - echo "max-jobs = $CPUS" >> /etc/nix/nix.conf
+    - nix run nixpkgs#gnused -- -i -e 's/ nixbld//' /etc/nix/nix.conf
   script:
     - nix run .gitlab/generate-ci#generate-jobs
     # 1 if .gitlab/generate_jobs changed the output of the generated config
@@ -1096,7 +1108,7 @@ project-version:
 
 .ghcup-metadata:
   stage: deploy
-  image: nixos/nix:2.14.1
+  image: nixos/nix:2.25.2
   dependencies: null
   tags:
     - x86_64-linux
@@ -1105,6 +1117,10 @@ project-version:
     GIT_SUBMODULE_STRATEGY: "none"
   before_script:
     - echo "experimental-features = nix-command flakes" >> /etc/nix/nix.conf
+    # FIXME: See Note [Nix-in-Docker]
+    - echo "cores = $CPUS" >> /etc/nix/nix.conf
+    - echo "max-jobs = $CPUS" >> /etc/nix/nix.conf
+    - nix run nixpkgs#gnused -- -i -e 's/ nixbld//' /etc/nix/nix.conf
     - nix-channel --update
     - cat version.sh
     # Calculate the project version
@@ -1157,7 +1173,7 @@ ghcup-metadata-nightly:
       artifacts: false
     - job: project-version
   script:
-    - nix shell --extra-experimental-features nix-command -f .gitlab/rel_eng -c ghcup-metadata --metadata ghcup-0.0.7.yaml --date="$(date -d $CI_PIPELINE_CREATED_AT +%Y-%m-%d)" --pipeline-id="$CI_PIPELINE_ID" --version="$ProjectVersion" > "metadata_test.yaml"
+    - nix shell -f .gitlab/rel_eng -c ghcup-metadata --metadata ghcup-0.0.7.yaml --date="$(date -d $CI_PIPELINE_CREATED_AT +%Y-%m-%d)" --pipeline-id="$CI_PIPELINE_ID" --version="$ProjectVersion" > "metadata_test.yaml"
   rules:
     - if: $NIGHTLY
 
@@ -1195,7 +1211,8 @@ ghcup-metadata-release:
   # No explicit needs for release pipeline as we assume we need everything and everything will pass.
   extends: .ghcup-metadata
   script:
-    - nix shell --extra-experimental-features nix-command -f .gitlab/rel_eng -c ghcup-metadata --release-mode --metadata ghcup-0.0.7.yaml --date="$(date -d $CI_PIPELINE_CREATED_AT +%Y-%m-%d)" --pipeline-id="$CI_PIPELINE_ID" --version="$ProjectVersion" > "metadata_test.yaml"
+    - nix shell -f .gitlab/rel_eng -c ghcup-metadata --release-mode --metadata ghcup-0.0.7.yaml --date="$(date -d $CI_PIPELINE_CREATED_AT +%Y-%m-%d)" --pipeline-id="$CI_PIPELINE_ID" --version="$ProjectVersion" --fragment
+    - nix shell -f .gitlab/rel_eng -c ghcup-metadata --release-mode --metadata ghcup-0.0.7.yaml --date="$(date -d $CI_PIPELINE_CREATED_AT +%Y-%m-%d)" --pipeline-id="$CI_PIPELINE_ID" --version="$ProjectVersion" > "metadata_test.yaml"
   rules:
     - if: '$RELEASE_JOB == "yes"'
 


=====================================
.gitlab/rel_eng/mk-ghcup-metadata/mk_ghcup_metadata.py
=====================================
@@ -232,7 +232,9 @@ def mk_new_yaml(release_mode, version, date, pipeline_type, job_map):
                               , "unknown_versioning" : centos7  }
           , "Linux_Fedora"  : { ">= 33": fedora33
                               , "unknown_versioning": centos7 }
-          , "Linux_RedHat"  : { "unknown_versioning": centos7 }
+          , "Linux_RedHat"  : { "< 9": centos7
+                              , ">= 9": fedora33
+                              , "unknown_versioning": fedora33 }
           , "Linux_UnknownLinux" : { "unknown_versioning": rocky8 }
           , "Darwin" : { "unknown_versioning" : darwin_x86 }
           , "Windows" : { "unknown_versioning" :  windows }


=====================================
compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
=====================================
@@ -23,7 +23,7 @@ import GHC.Cmm.DebugBlock
 import GHC.CmmToAsm.Monad
    ( NatM, getNewRegNat
    , getPicBaseMaybeNat, getPlatform, getConfig
-   , getDebugBlock, getFileId, getNewLabelNat
+   , getDebugBlock, getFileId
    )
 -- import GHC.CmmToAsm.Instr
 import GHC.CmmToAsm.PIC
@@ -50,7 +50,7 @@ import GHC.Types.Unique.Supply
 import GHC.Data.OrdList
 import GHC.Utils.Outputable
 
-import Control.Monad    ( mapAndUnzipM )
+import Control.Monad    ( mapAndUnzipM, foldM )
 import Data.Maybe
 import GHC.Float
 
@@ -210,79 +210,43 @@ annExpr e instr {- debugIsOn -} = ANN (text . show $ e) instr
 -- -----------------------------------------------------------------------------
 -- Generating a table-branch
 
--- | Generate jump to jump table target
+-- TODO jump tables would be a lot faster, but we'll use bare bones for now.
+-- this is usually done by sticking the jump table ids into an instruction
+-- and then have the @generateJumpTableForInstr@ callback produce the jump
+-- table as a static.
 --
--- The index into the jump table is calulated by evaluating @expr at . The
--- corresponding table entry contains the relative address to jump to (relative
--- to the jump table's first entry / the table's own label).
-genSwitch :: NCGConfig -> CmmExpr -> SwitchTargets -> NatM InstrBlock
-genSwitch config expr targets = do
-  (reg, fmt1, e_code) <- getSomeReg indexExpr
-  let fmt = II64
-  targetReg <- getNewRegNat fmt
-  lbl <- getNewLabelNat
-  dynRef <- cmmMakeDynamicReference config DataReference lbl
-  (tableReg, fmt2, t_code) <- getSomeReg dynRef
-  let code =
-        toOL
-          [ COMMENT (text "indexExpr" <+> (text . show) indexExpr),
-            COMMENT (text "dynRef" <+> (text . show) dynRef)
-          ]
-          `appOL` e_code
-          `appOL` t_code
-          `appOL` toOL
-            [ COMMENT (ftext "Jump table for switch"),
-              -- index to offset into the table (relative to tableReg)
-              annExpr expr (LSL (OpReg (formatToWidth fmt1) reg) (OpReg (formatToWidth fmt1) reg) (OpImm (ImmInt 3))),
-              -- calculate table entry address
-              ADD (OpReg W64 targetReg) (OpReg (formatToWidth fmt1) reg) (OpReg (formatToWidth fmt2) tableReg),
-              -- load table entry (relative offset from tableReg (first entry) to target label)
-              LDR II64 (OpReg W64 targetReg) (OpAddr (AddrRegImm targetReg (ImmInt 0))),
-              -- calculate absolute address of the target label
-              ADD (OpReg W64 targetReg) (OpReg W64 targetReg) (OpReg W64 tableReg),
-              -- prepare jump to target label
-              J_TBL ids (Just lbl) targetReg
-            ]
-  return code
-  where
-    -- See Note [Sub-word subtlety during jump-table indexing] in
-    -- GHC.CmmToAsm.X86.CodeGen for why we must first offset, then widen.
-    indexExpr0 = cmmOffset platform expr offset
-    -- We widen to a native-width register to sanitize the high bits
-    indexExpr =
-      CmmMachOp
-        (MO_UU_Conv expr_w (platformWordWidth platform))
-        [indexExpr0]
-    expr_w = cmmExprWidth platform expr
-    (offset, ids) = switchTargetsToTable targets
-    platform = ncgPlatform config
-
--- | Generate jump table data (if required)
+-- See Ticket 19912
+--
+-- data SwitchTargets =
+--    SwitchTargets
+--        Bool                       -- Signed values
+--        (Integer, Integer)         -- Range
+--        (Maybe Label)              -- Default value
+--        (M.Map Integer Label)      -- The branches
 --
--- The idea is to emit one table entry per case. The entry is the relative
--- address of the block to jump to (relative to the table's first entry /
--- table's own label.) The calculation itself is done by the linker.
-generateJumpTableForInstr ::
-  NCGConfig ->
-  Instr ->
-  Maybe (NatCmmDecl RawCmmStatics Instr)
-generateJumpTableForInstr config (J_TBL ids (Just lbl) _) =
-  let jumpTable =
-        map jumpTableEntryRel ids
-        where
-          jumpTableEntryRel Nothing =
-            CmmStaticLit (CmmInt 0 (ncgWordWidth config))
-          jumpTableEntryRel (Just blockid) =
-            CmmStaticLit
-              ( CmmLabelDiffOff
-                  blockLabel
-                  lbl
-                  0
-                  (ncgWordWidth config)
-              )
-            where
-              blockLabel = blockLbl blockid
-   in Just (CmmData (Section ReadOnlyData lbl) (CmmStaticsRaw lbl jumpTable))
+-- Non Jumptable plan:
+-- xE <- expr
+--
+genSwitch :: CmmExpr -> SwitchTargets -> NatM InstrBlock
+genSwitch expr targets = do -- pprPanic "genSwitch" (ppr expr)
+  (reg, format, code) <- getSomeReg expr
+  let w = formatToWidth format
+  let mkbranch acc (key, bid) = do
+        (keyReg, _format, code) <- getSomeReg (CmmLit (CmmInt key w))
+        return $ code `appOL`
+                 toOL [ CMP (OpReg w reg) (OpReg w keyReg)
+                      , BCOND EQ (TBlock bid)
+                      ] `appOL` acc
+      def_code = case switchTargetsDefault targets of
+        Just bid -> unitOL (B (TBlock bid))
+        Nothing  -> nilOL
+
+  switch_code <- foldM mkbranch nilOL (switchTargetsCases targets)
+  return $ code `appOL` switch_code `appOL` def_code
+
+-- We don't do jump tables for now, see Ticket 19912
+generateJumpTableForInstr :: NCGConfig -> Instr
+  -> Maybe (NatCmmDecl RawCmmStatics Instr)
 generateJumpTableForInstr _ _ = Nothing
 
 -- -----------------------------------------------------------------------------
@@ -314,7 +278,6 @@ stmtToInstrs :: BlockId -- ^ Basic block this statement will start to be placed
 stmtToInstrs bid stmt = do
   -- traceM $ "-- -------------------------- stmtToInstrs -------------------------- --\n"
   --     ++ showSDocUnsafe (ppr stmt)
-  config <- getConfig
   platform <- getPlatform
   case stmt of
     CmmUnsafeForeignCall target result_regs args
@@ -343,7 +306,7 @@ stmtToInstrs bid stmt = do
       CmmCondBranch arg true false _prediction ->
           genCondBranch bid true false arg
 
-      CmmSwitch arg ids -> genSwitch config arg ids
+      CmmSwitch arg ids -> genSwitch arg ids
 
       CmmCall { cml_target = arg } -> genJump arg
 
@@ -388,6 +351,12 @@ getRegisterReg platform (CmmGlobal reg@(GlobalRegUse mid _))
         -- ones which map to a real machine register on this
         -- platform.  Hence if it's not mapped to a registers something
         -- went wrong earlier in the pipeline.
+-- | Convert a BlockId to some CmmStatic data
+-- TODO: Add JumpTable Logic, see Ticket 19912
+-- jumpTableEntry :: NCGConfig -> Maybe BlockId -> CmmStatic
+-- jumpTableEntry config Nothing   = CmmStaticLit (CmmInt 0 (ncgWordWidth config))
+-- jumpTableEntry _ (Just blockid) = CmmStaticLit (CmmLabel blockLabel)
+--     where blockLabel = blockLbl blockid
 
 -- -----------------------------------------------------------------------------
 -- General things for putting together code sequences


=====================================
compiler/GHC/CmmToAsm/AArch64/Instr.hs
=====================================
@@ -27,7 +27,7 @@ import GHC.Types.Unique.Supply
 
 import GHC.Utils.Panic
 
-import Data.Maybe (fromMaybe, catMaybes)
+import Data.Maybe (fromMaybe)
 
 import GHC.Stack
 
@@ -118,7 +118,6 @@ regUsageOfInstr platform instr = case instr of
   ORR dst src1 src2        -> usage (regOp src1 ++ regOp src2, regOp dst)
   -- 4. Branch Instructions ----------------------------------------------------
   J t                      -> usage (regTarget t, [])
-  J_TBL _ _ t              -> usage ([t], [])
   B t                      -> usage (regTarget t, [])
   BCOND _ t                -> usage (regTarget t, [])
   BL t ps                  -> usage (regTarget t ++ ps, callerSavedRegisters)
@@ -265,11 +264,10 @@ patchRegsOfInstr instr env = case instr of
     ORR o1 o2 o3   -> ORR  (patchOp o1) (patchOp o2) (patchOp o3)
 
     -- 4. Branch Instructions --------------------------------------------------
-    J t               -> J (patchTarget t)
-    J_TBL ids mbLbl t -> J_TBL ids mbLbl (env t)
-    B t               -> B (patchTarget t)
-    BL t rs           -> BL (patchTarget t) rs
-    BCOND c t         -> BCOND c (patchTarget t)
+    J t            -> J (patchTarget t)
+    B t            -> B (patchTarget t)
+    BL t rs        -> BL (patchTarget t) rs
+    BCOND c t      -> BCOND c (patchTarget t)
 
     -- 5. Atomic Instructions --------------------------------------------------
     -- 6. Conditional Instructions ---------------------------------------------
@@ -320,7 +318,6 @@ isJumpishInstr instr = case instr of
     CBZ{} -> True
     CBNZ{} -> True
     J{} -> True
-    J_TBL{} -> True
     B{} -> True
     BL{} -> True
     BCOND{} -> True
@@ -334,7 +331,6 @@ jumpDestsOfInstr (ANN _ i) = jumpDestsOfInstr i
 jumpDestsOfInstr (CBZ _ t) = [ id | TBlock id <- [t]]
 jumpDestsOfInstr (CBNZ _ t) = [ id | TBlock id <- [t]]
 jumpDestsOfInstr (J t) = [id | TBlock id <- [t]]
-jumpDestsOfInstr (J_TBL ids _mbLbl _r) = catMaybes ids
 jumpDestsOfInstr (B t) = [id | TBlock id <- [t]]
 jumpDestsOfInstr (BL t _) = [ id | TBlock id <- [t]]
 jumpDestsOfInstr (BCOND _ t) = [ id | TBlock id <- [t]]
@@ -343,11 +339,6 @@ jumpDestsOfInstr _ = []
 canFallthroughTo :: Instr -> BlockId -> Bool
 canFallthroughTo (ANN _ i) bid = canFallthroughTo i bid
 canFallthroughTo (J (TBlock target)) bid = bid == target
-canFallthroughTo (J_TBL targets _ _) bid = all isTargetBid targets
-  where
-    isTargetBid target = case target of
-      Nothing -> True
-      Just target -> target == bid
 canFallthroughTo (B (TBlock target)) bid = bid == target
 canFallthroughTo _ _ = False
 
@@ -361,7 +352,6 @@ patchJumpInstr instr patchF
         CBZ r (TBlock bid) -> CBZ r (TBlock (patchF bid))
         CBNZ r (TBlock bid) -> CBNZ r (TBlock (patchF bid))
         J (TBlock bid) -> J (TBlock (patchF bid))
-        J_TBL ids mbLbl r -> J_TBL (map (fmap patchF) ids) mbLbl r
         B (TBlock bid) -> B (TBlock (patchF bid))
         BL (TBlock bid) ps -> BL (TBlock (patchF bid)) ps
         BCOND c (TBlock bid) -> BCOND c (TBlock (patchF bid))
@@ -525,7 +515,6 @@ allocMoreStack platform slots proc@(CmmProc info lbl live (ListGraph code)) = do
 
       insert_dealloc insn r = case insn of
         J _ -> dealloc ++ (insn : r)
-        J_TBL {} -> dealloc ++ (insn : r)
         ANN _ (J _) -> dealloc ++ (insn : r)
         _other | jumpDestsOfInstr insn /= []
             -> patchJumpInstr insn retarget : r
@@ -654,7 +643,6 @@ data Instr
     | CBNZ Operand Target -- if op /= 0, then branch.
     -- Branching.
     | J Target            -- like B, but only generated from genJump. Used to distinguish genJumps from others.
-    | J_TBL [Maybe BlockId] (Maybe CLabel) Reg -- A jump instruction with data for switch/jump tables
     | B Target            -- unconditional branching b/br. (To a blockid, label or register)
     | BL Target [Reg] -- branch and link (e.g. set x30 to next pc, and branch)
     | BCOND Cond Target   -- branch with condition. b.<cond>
@@ -736,7 +724,6 @@ instrCon i =
       CBZ{} -> "CBZ"
       CBNZ{} -> "CBNZ"
       J{} -> "J"
-      J_TBL {} -> "J_TBL"
       B{} -> "B"
       BL{} -> "BL"
       BCOND{} -> "BCOND"


=====================================
compiler/GHC/CmmToAsm/AArch64/Ppr.hs
=====================================
@@ -426,7 +426,6 @@ pprInstr platform instr = case instr of
 
   -- 4. Branch Instructions ----------------------------------------------------
   J t            -> pprInstr platform (B t)
-  J_TBL _ _ r    -> pprInstr platform (B (TReg r))
   B (TBlock bid) -> line $ text "\tb" <+> pprAsmLabel platform (mkLocalBlockLabel (getUnique bid))
   B (TLabel lbl) -> line $ text "\tb" <+> pprAsmLabel platform lbl
   B (TReg r)     -> line $ text "\tbr" <+> pprReg W64 r


=====================================
compiler/GHC/Tc/Instance/Family.hs
=====================================
@@ -58,6 +58,7 @@ import Data.Function ( on )
 
 import qualified GHC.LanguageExtensions  as LangExt
 import GHC.Unit.Env (unitEnv_hpts)
+import Data.List (sortOn)
 
 {- Note [The type family instance consistency story]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -239,6 +240,49 @@ That situation should be pretty common in practice, there's usually
 a set of utility modules that every module imports directly or indirectly.
 
 This is basically the idea from #13092, comment:14.
+
+Note [Order of type family consistency checks]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Consider a module M which imports modules A, B and C, all defining (open) type
+family instances.
+
+We can waste a lot of work in type family consistency checking depending on the
+order in which the modules are processed.
+
+Suppose for example that C imports A and B. When we compiled C, we will have
+checked A and B for consistency against each other. This means that, when
+processing the imports of M to check type family instance consistency:
+
+* if C is processed first, then A and B will not need to be checked for
+  consistency against each other again,
+* if we process A and B before C, then the
+  consistency checks between A and B will be performed again. This is wasted
+  work, as we already performed them for C.
+
+This can make a significant difference. Keeping the nomenclature of the above
+example for illustration, we have observed situations in practice in which the
+compilation time of M goes from 1 second (the "processing A and B first" case)
+down to 80 milliseconds (the "processing C first" case).
+
+Clearly we should engineer that C is checked before B and A, but by what scheme?
+
+A simple one is to observe that if a module M is in the transitive closure of X
+then the size of the consistent family set of M is less than or equal to the
+size of the consistent family set of X.
+
+Therefore, by sorting the imports by the size of the consistent family set and
+processing the largest first, we make sure to process modules in topological
+order.
+
+For a particular project, without this change we did 40 million checks and with
+this change we did 22.9 million checks. This is significant: before this change,
+type family consistency checks accounted for 26% of total type checker
+allocations, a figure which this change reduced to 15%.
+
+See tickets #25554 for discussion about this exact issue and #25555 for
+why we still do redundant checks.
+
 -}
 
 -- We don't need to check the current module, this is done in
@@ -267,6 +311,12 @@ checkFamInstConsistency directlyImpMods
                  where
                  deps = dep_finsts . mi_deps . modIface $ mod
 
+             ; debug_consistent_set = map (\x -> (x, length (modConsistent x))) directlyImpMods
+
+             -- Sorting the list by size has the effect of performing a topological sort.
+             -- See Note [Order of type family consistency checks]
+             ; init_consistent_set = reverse (sortOn (length . modConsistent) directlyImpMods)
+
              ; hmiModule     = mi_module . hm_iface
              ; hmiFamInstEnv = extendFamInstEnvList emptyFamInstEnv
                                . md_fam_insts . hm_details
@@ -276,7 +326,8 @@ checkFamInstConsistency directlyImpMods
 
              }
 
-       ; checkMany hpt_fam_insts modConsistent directlyImpMods
+       ; traceTc "init_consistent_set" (ppr debug_consistent_set)
+       ; checkMany hpt_fam_insts modConsistent init_consistent_set
        }
   where
     -- See Note [Checking family instance optimization]
@@ -294,6 +345,11 @@ checkFamInstConsistency directlyImpMods
          -> TcM ()
       go _ _ [] = return ()
       go consistent consistent_set (mod:mods) = do
+        traceTc "checkManySize" (vcat [text "mod:" <+> ppr mod
+                                      , text "m1:" <+> ppr (length to_check_from_mod)
+                                      , text "m2:" <+> ppr (length (to_check_from_consistent))
+                                      , text "product:" <+> ppr (length to_check_from_mod * length to_check_from_consistent)
+                                      ])
         sequence_
           [ check hpt_fam_insts m1 m2
           | m1 <- to_check_from_mod


=====================================
compiler/GHC/Tc/Module.hs
=====================================
@@ -434,7 +434,9 @@ tcRnImports hsc_env import_decls
         ; let { dir_imp_mods = moduleEnvKeys
                              . imp_mods
                              $ imports }
-        ; checkFamInstConsistency dir_imp_mods
+        ; logger <- getLogger
+        ; withTiming logger (text "ConsistencyCheck"<+>brackets (ppr this_mod)) (const ())
+            $ checkFamInstConsistency dir_imp_mods
         ; traceRn "rn1: } checking family instance consistency" empty
 
         ; getGblEnv } }


=====================================
compiler/GHC/Tc/Solver/Equality.hs
=====================================
@@ -2228,7 +2228,7 @@ to our new cbv. This is actually done by `break_given` in
 `GHC.Tc.Solver.Monad.checkTypeEq`.
 
 Note its orientation: The type family ends up on the left; see
-Note [Orienting TyFamLHS/TyFamLHS]d. No special treatment for
+Note [Orienting TyFamLHS/TyFamLHS]. No special treatment for
 CycleBreakerTvs is necessary. This scenario is now easily soluble, by using
 the first Given to rewrite the Wanted, which can now be solved.
 
@@ -2900,8 +2900,7 @@ arising from injectivity improvement (#12522).  Suppose we have
   type instance F (a, Int) = (Int, G a)
 where G is injective; and wanted constraints
 
-  [W] TF (alpha, beta) ~ fuv
-  [W] fuv ~ (Int, <some type>)
+  [W] F (alpha, beta) ~ (Int, <some type>)
 
 The injectivity will give rise to constraints
 
@@ -2917,8 +2916,8 @@ so that the fresh unification variable will be eliminated in
 favour of alpha.  If we instead had
    [W] alpha ~ gamma1
 then we would unify alpha := gamma1; and kick out the wanted
-constraint.  But when we grough it back in, it'd look like
-   [W] TF (gamma1, beta) ~ fuv
+constraint.  But when we substitute it back in, it'd look like
+   [W] F (gamma1, beta) ~ fuv
 and exactly the same thing would happen again!  Infinite loop.
 
 This all seems fragile, and it might seem more robust to avoid
@@ -2999,8 +2998,9 @@ improveTopFunEqs fam_tc args (EqCt { eq_ev = ev, eq_rhs = rhs })
   | otherwise
   = do { fam_envs <- getFamInstEnvs
        ; eqns <- improve_top_fun_eqs fam_envs fam_tc args rhs
-       ; traceTcS "improveTopFunEqs" (vcat [ ppr fam_tc <+> ppr args <+> ppr rhs
-                                           , ppr eqns ])
+       ; traceTcS "improveTopFunEqs" (vcat [ text "lhs:" <+> ppr fam_tc <+> ppr args
+                                           , text "rhs:" <+> ppr rhs
+                                           , text "eqns:" <+> ppr eqns ])
        ; unifyFunDeps ev Nominal $ \uenv ->
          uPairsTcM (bump_depth uenv) (reverse eqns) }
          -- Missing that `reverse` causes T13135 and T13135_simple to loop.
@@ -3072,17 +3072,17 @@ improve_top_fun_eqs fam_envs fam_tc args rhs_ty
                      -> ([Type], Subst, [TyCoVar], Maybe CoAxBranch)
                      -> TcS [TypeEqn]
       injImproveEqns inj_args (ax_args, subst, unsubstTvs, cabr)
-        = do { subst <- instFlexiX subst unsubstTvs
+        = do { subst1 <- instFlexiX subst unsubstTvs
                   -- If the current substitution bind [k -> *], and
                   -- one of the un-substituted tyvars is (a::k), we'd better
                   -- be sure to apply the current substitution to a's kind.
                   -- Hence instFlexiX.   #13135 was an example.
 
-             ; return [ Pair (substTy subst ax_arg) arg
+             ; return [ Pair (substTy subst1 ax_arg) arg
                         -- NB: the ax_arg part is on the left
                         -- see Note [Improvement orientation]
                       | case cabr of
-                          Just cabr' -> apartnessCheck (substTys subst ax_args) cabr'
+                          Just cabr' -> apartnessCheck (substTys subst1 ax_args) cabr'
                           _          -> True
                       , (ax_arg, arg, True) <- zip3 ax_args args inj_args ] }
 


=====================================
compiler/GHC/Tc/Utils/Unify.hs
=====================================
@@ -2360,7 +2360,14 @@ uUnfilledVar2 env@(UE { u_defer = def_eq_ref }) swapped tv1 ty2
     do { def_eqs <- readTcRef def_eq_ref  -- Capture current state of def_eqs
 
        -- Attempt to unify kinds
-       ; co_k <- uType (mkKindEnv env ty1 ty2) (typeKind ty2) (tyVarKind tv1)
+       -- When doing so, be careful to preserve orientation;
+       --    see Note [Kind Equality Orientation] in GHC.Tc.Solver.Equality
+       --    and wrinkle (W2) in Note [Fundeps with instances, and equality orientation]
+       --        in GHC.Tc.Solver.Dict
+       -- Failing to preserve orientation led to #25597.
+       ; let kind_env = unSwap swapped (mkKindEnv env) ty1 ty2
+       ; co_k <- unSwap swapped (uType kind_env) (tyVarKind tv1) (typeKind ty2)
+
        ; traceTc "uUnfilledVar2 ok" $
          vcat [ ppr tv1 <+> dcolon <+> ppr (tyVarKind tv1)
               , ppr ty2 <+> dcolon <+> ppr (typeKind  ty2)


=====================================
docs/users_guide/using-concurrent.rst
=====================================
@@ -153,6 +153,14 @@ use the RTS :rts-flag:`-N ⟨x⟩` options.
     changed while the program is running by calling
     ``Control.Concurrent.setNumCapabilities``.
 
+
+.. note::
+
+    The maximum number of capabilities supported by the GHC runtime system is
+    determined when at RTS startup to be either 256, the value given by
+    :rts-flag:`-N ⟨x⟩`, or the number of logical CPU cores, whichever is
+    greater.
+
 The following options affect the way the runtime schedules threads on
 CPUs:
 


=====================================
hadrian/src/Rules/Documentation.hs
=====================================
@@ -334,11 +334,27 @@ getPkgDocTarget root path =
 
 -- | Build all PDF documentation
 buildPdfDocumentation :: Rules ()
-buildPdfDocumentation = mapM_ buildSphinxPdf docPaths
+buildPdfDocumentation = do
+    -- Note [Avoiding mktexfmt race]
+    -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    -- We must ensure that the *first* xelatex invocation in the
+    -- build is performed serially (that is, not concurrently with
+    -- any other xelatex invocations) as mktexfmt does not handle
+    -- racing `mkdir` calls gracefully. However, we assume that
+    -- subsequent invocations are safe to run concurrently since the
+    -- initial call will have created the requisite directories (namely
+    -- $HOME/.texlive2020/texmf-var/web2c).
+    --
+    -- Fixes #25564.
+    let maxConcurrentReaders = 1000
+    xelatexMutex <- newResource "xelatex-mutex" maxConcurrentReaders
+    let rs = [(xelatexMutex, 1)]
+
+    mapM_ (buildSphinxPdf rs) docPaths
 
 -- | Compile a Sphinx ReStructured Text package to LaTeX
-buildSphinxPdf :: FilePath -> Rules ()
-buildSphinxPdf path = do
+buildSphinxPdf :: [(Resource, Int)] -> FilePath -> Rules ()
+buildSphinxPdf rs path = do
     root <- buildRootRules
     root -/- pdfRoot -/- path <.> "pdf" %> \file -> do
 
@@ -352,7 +368,8 @@ buildSphinxPdf path = do
             checkSphinxWarnings dir
 
             -- LaTeX "fixed point"
-            build $ target docContext Xelatex [path <.> "tex"] [dir]
+            -- See Note [Avoiding mktexfmt race] above.
+            buildWithResources rs $ target docContext Xelatex [path <.> "tex"] [dir]
             build $ target docContext Xelatex [path <.> "tex"] [dir]
             build $ target docContext Xelatex [path <.> "tex"] [dir]
             build $ target docContext Makeindex [path <.> "idx"] [dir]


=====================================
libraries/ghc-internal/src/GHC/Internal/Conc/Sync.hs
=====================================
@@ -1,4 +1,3 @@
-{-# LANGUAGE CPP #-}
 {-# LANGUAGE MagicHash #-}
 {-# LANGUAGE NoImplicitPrelude #-}
 {-# LANGUAGE RankNTypes #-}
@@ -394,13 +393,14 @@ getNumProcessors :: IO Int
 getNumProcessors = fmap fromIntegral c_getNumberOfProcessors
 
 foreign import ccall unsafe "getNumberOfProcessors"
-  c_getNumberOfProcessors :: IO CUInt
+  c_getNumberOfProcessors :: IO Word32
 
 -- | Returns the number of sparks currently in the local spark pool
 numSparks :: IO Int
 numSparks = IO $ \s -> case numSparks# s of (# s', n #) -> (# s', I# n #)
 
-foreign import ccall "&enabled_capabilities" enabled_capabilities :: Ptr CInt
+foreign import ccall "&enabled_capabilities"
+  enabled_capabilities :: Ptr Word32
 
 childHandler :: SomeException -> IO ()
 childHandler err = catch (real_handler err) childHandler


=====================================
rts/Capability.c
=====================================
@@ -16,6 +16,7 @@
  *
  * --------------------------------------------------------------------------*/
 
+#include "rts/Config.h"
 #include "rts/PosixSource.h"
 #include "Rts.h"
 
@@ -40,12 +41,16 @@ Capability MainCapability;
 uint32_t n_capabilities = 0;
 uint32_t enabled_capabilities = 0;
 
+// The size of the `capabilities` array initialized at RTS startup. Referenced
+// by GHC.Internal.Conc.Sync
+uint32_t max_n_capabilities = MAX_N_CAPABILITIES;
+
 // The array of Capabilities.  It's important that when we need
 // to allocate more Capabilities we don't have to move the existing
 // Capabilities, because there may be pointers to them in use
 // (e.g. threads in waitForCapability(), see #8209), so this is
 // an array of Capability* rather than an array of Capability.
-Capability *capabilities[MAX_N_CAPABILITIES];
+Capability **capabilities;
 
 // Holds the Capability which last became free.  This is used so that
 // an in-call has a chance of quickly finding a free Capability.
@@ -344,8 +349,6 @@ initCapability (Capability *cap, uint32_t i)
  * ------------------------------------------------------------------------- */
 void initCapabilities (void)
 {
-    uint32_t i;
-
     /* Declare a couple capability sets representing the process and
        clock domain. Each capability will get added to these capsets. */
     traceCapsetCreate(CAPSET_OSPROCESS_DEFAULT, CapsetTypeOsProcess);
@@ -354,7 +357,7 @@ void initCapabilities (void)
     // Initialise NUMA
     if (!RtsFlags.GcFlags.numa) {
         n_numa_nodes = 1;
-        for (i = 0; i < MAX_NUMA_NODES; i++) {
+        for (uint32_t i = 0; i < MAX_NUMA_NODES; i++) {
             numa_map[i] = 0;
         }
     } else if (RtsFlags.DebugFlags.numa) {
@@ -388,12 +391,30 @@ void initCapabilities (void)
     }
 #endif
 
-    if (RtsFlags.ParFlags.nCapabilities > MAX_N_CAPABILITIES) {
-        errorBelch("warning: this GHC runtime system only supports up to %d capabilities",
-                   MAX_N_CAPABILITIES);
-        RtsFlags.ParFlags.nCapabilities = MAX_N_CAPABILITIES;
+    /*
+     * Note [Capabilities array sizing]
+     * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+     * Determine the size of the capabilities array as the maximum of:
+     *   * the static lower bound, `MAX_N_CAPABILITIES`
+     *   * the logical core count
+     *   * the user's choice of `+RTS -N`
+     * This will serve as the upper bound on the capability count for the rest
+     * of execution. Calls to `setNumCapabilities` exceeding this bound will
+     * issue a warning and otherwise have no effect.
+     *
+     * See #25560.
+     */
+    uint32_t core_count = getNumberOfProcessors();
+    if (core_count > max_n_capabilities) {
+        max_n_capabilities = core_count;
     }
 
+    if (RtsFlags.ParFlags.nCapabilities > max_n_capabilities) {
+        max_n_capabilities = RtsFlags.ParFlags.nCapabilities;
+    }
+
+    capabilities = stgMallocBytes(sizeof(Capability) * max_n_capabilities, "initCapabilities");
+
     n_capabilities = 0;
     moreCapabilities(0, RtsFlags.ParFlags.nCapabilities);
     n_capabilities = RtsFlags.ParFlags.nCapabilities;
@@ -401,6 +422,7 @@ void initCapabilities (void)
 #else /* !THREADED_RTS */
 
     n_capabilities = 1;
+    capabilities = stgMallocBytes(sizeof(Capability), "initCapabilities");
     capabilities[0] = &MainCapability;
 
     initCapability(&MainCapability, 0);
@@ -412,7 +434,7 @@ void initCapabilities (void)
     // There are no free capabilities to begin with.  We will start
     // a worker Task to each Capability, which will quickly put the
     // Capability on the free list when it finds nothing to do.
-    for (i = 0; i < n_numa_nodes; i++) {
+    for (uint32_t i = 0; i < n_numa_nodes; i++) {
         last_free_capability[i] = getCapability(0);
     }
 }


=====================================
rts/Capability.h
=====================================
@@ -266,11 +266,13 @@ INLINE_HEADER void releaseCapability_ (Capability* cap STG_UNUSED,
 // extern Capability MainCapability;
 
 // declared in rts/include/rts/Threads.h:
+// extern uint32_t max_n_capabilities;
 // extern uint32_t n_capabilities;
 // extern uint32_t enabled_capabilities;
 
-// Array of all the capabilities
-extern Capability *capabilities[MAX_N_CAPABILITIES];
+// Array of all the capabilities, of size max_n_capabilities
+// See Note [Capabilities array sizing] in rts/Capability.c.
+extern Capability **capabilities;
 
 INLINE_HEADER Capability *getCapability(uint32_t i)
 {


=====================================
rts/CheckUnload.c
=====================================
@@ -166,7 +166,7 @@ ObjectCode *loaded_objects;
 static OCSectionIndices *global_s_indices = NULL;
 
 // Is it safe for us to unload code?
-static bool safeToUnload(void)
+static bool tryToUnload(void)
 {
     if (RtsFlags.ProfFlags.doHeapProfile != NO_HEAP_PROFILING) {
         // We mustn't unload anything as the heap census may contain
@@ -174,7 +174,8 @@ static bool safeToUnload(void)
         // See #24512.
         return false;
     }
-    return true;
+
+    return global_s_indices != NULL;
 }
 
 static OCSectionIndices *createOCSectionIndices(void)
@@ -432,7 +433,7 @@ static bool markObjectLive(void *data STG_UNUSED, StgWord key, const void *value
 
 void markObjectCode(const void *addr)
 {
-    if (global_s_indices == NULL) {
+    if (!tryToUnload()) {
         return;
     }
 
@@ -450,7 +451,7 @@ void markObjectCode(const void *addr)
 // unloading.
 bool prepareUnloadCheck(void)
 {
-    if (global_s_indices == NULL) {
+    if (!tryToUnload()) {
         return false;
     }
 
@@ -467,48 +468,44 @@ bool prepareUnloadCheck(void)
 
 void checkUnload(void)
 {
-    if (global_s_indices == NULL) {
-        return;
-    } else if (!safeToUnload()) {
-        return;
-    }
-
     // At this point we've marked all dynamically loaded static objects
     // (including their dependencies) during GC, but not the root set of object
     // code (loaded_objects). Mark the roots first, then unload any unmarked
     // objects.
 
-    OCSectionIndices *s_indices = global_s_indices;
-    ASSERT(s_indices->sorted);
+    if (tryToUnload()) {
+        OCSectionIndices *s_indices = global_s_indices;
+        ASSERT(s_indices->sorted);
 
-    // Mark roots
-    for (ObjectCode *oc = loaded_objects; oc != NULL; oc = oc->next_loaded_object) {
-        markObjectLive(NULL, (W_)oc, NULL);
-    }
+        // Mark roots
+        for (ObjectCode *oc = loaded_objects; oc != NULL; oc = oc->next_loaded_object) {
+            markObjectLive(NULL, (W_)oc, NULL);
+        }
 
-    // Free unmarked objects
-    ObjectCode *next = NULL;
-    for (ObjectCode *oc = old_objects; oc != NULL; oc = next) {
-        next = oc->next;
-        ASSERT(oc->status == OBJECT_UNLOADED);
-
-        // Symbols should be removed by unloadObj_.
-        // NB (osa): If this assertion doesn't hold then freeObjectCode below
-        // will corrupt symhash as keys of that table live in ObjectCodes. If
-        // you see a segfault in a hash table operation in linker (in non-debug
-        // RTS) then it's probably because this assertion did not hold.
-        ASSERT(oc->symbols == NULL);
-
-        if (oc->unloadable) {
-            removeOCSectionIndices(s_indices, oc);
-            freeObjectCode(oc);
-            n_unloaded_objects -= 1;
-        } else {
-            // If we don't have enough information to
-            // accurately determine the reachability of
-            // the object then hold onto it.
-            oc->next = objects;
-            objects = oc;
+        // Free unmarked objects
+        ObjectCode *next = NULL;
+        for (ObjectCode *oc = old_objects; oc != NULL; oc = next) {
+            next = oc->next;
+            ASSERT(oc->status == OBJECT_UNLOADED);
+
+            // Symbols should be removed by unloadObj_.
+            // NB (osa): If this assertion doesn't hold then freeObjectCode below
+            // will corrupt symhash as keys of that table live in ObjectCodes. If
+            // you see a segfault in a hash table operation in linker (in non-debug
+            // RTS) then it's probably because this assertion did not hold.
+            ASSERT(oc->symbols == NULL);
+
+            if (oc->unloadable) {
+                removeOCSectionIndices(s_indices, oc);
+                freeObjectCode(oc);
+                n_unloaded_objects -= 1;
+            } else {
+                // If we don't have enough information to
+                // accurately determine the reachability of
+                // the object then hold onto it.
+                oc->next = objects;
+                objects = oc;
+            }
         }
     }
 


=====================================
rts/Linker.c
=====================================
@@ -1076,7 +1076,7 @@ freePreloadObjectFile (ObjectCode *oc)
  */
 void freeObjectCode (ObjectCode *oc)
 {
-    IF_DEBUG(linker, ocDebugBelch(oc, "start\n"));
+    IF_DEBUG(linker, ocDebugBelch(oc, "freeObjectCode: start\n"));
 
     // Run finalizers
     if (oc->type == STATIC_OBJECT &&


=====================================
rts/ProfHeap.c
=====================================
@@ -958,9 +958,9 @@ dumpCensus( Census *census )
                                           count * sizeof(W_));
             break;
         case HEAP_BY_ERA:
-            fprintf(hp_file, "%lu", (StgWord)ctr->identity);
+            fprintf(hp_file, "%" FMT_Word, (StgWord)ctr->identity);
             char str_era[100];
-            sprintf(str_era, "%lu", (StgWord)ctr->identity);
+            sprintf(str_era, "%" FMT_Word, (StgWord)ctr->identity);
             traceHeapProfSampleString(0, str_era, count * sizeof(W_));
             break;
         case HEAP_BY_MOD:


=====================================
rts/RtsSymbols.c
=====================================
@@ -911,6 +911,7 @@ extern char **environ;
       SymI_NeedsDataProto(rts_stop_on_exception)                        \
       SymI_HasProto(stopTimer)                                          \
       SymI_HasProto(n_capabilities)                                     \
+      SymI_HasProto(max_n_capabilities)                                 \
       SymI_HasProto(enabled_capabilities)                               \
       SymI_HasDataProto(stg_traceEventzh)                                   \
       SymI_HasDataProto(stg_traceMarkerzh)                                  \


=====================================
rts/Schedule.c
=====================================
@@ -2276,9 +2276,12 @@ setNumCapabilities (uint32_t new_n_capabilities USED_IF_THREADS)
     } else if (new_n_capabilities <= 0) {
         errorBelch("setNumCapabilities: Capability count must be positive");
         return;
+    } else if (new_n_capabilities > max_n_capabilities) {
+        // See Note [Capabilities array sizing] in rts/Capability.c.
+        errorBelch("setNumCapabilities: Attempt to increase capability count beyond maximum capability count %" PRIu32 "; clamping...\n", max_n_capabilities);
+        new_n_capabilities = max_n_capabilities;
     }
 
-
     debugTrace(DEBUG_sched, "changing the number of Capabilities from %d to %d",
                enabled_capabilities, new_n_capabilities);
 


=====================================
rts/include/rts/Config.h
=====================================
@@ -78,6 +78,10 @@ code.
 #endif
 
 #if defined(THREADED_RTS)
+/*
+ * See Note [Capabilities array sizing] in rts/Capability.c.
+ * Update the note in docs/users_guide/using-concurrent.rst when updating this.
+ */
 #define MAX_N_CAPABILITIES 256
 #else
 #define MAX_N_CAPABILITIES 1


=====================================
rts/include/rts/Threads.h
=====================================
@@ -69,7 +69,7 @@ HsBool rtsSupportsBoundThreads (void);
 // The number of Capabilities.
 // TODO: Ideally we would only provide getNumCapabilities
 // but this is used in compiler/cbits/genSym.c
-extern unsigned int n_capabilities;
+extern uint32_t n_capabilities;
 
 INLINE_HEADER unsigned int getNumCapabilities(void)
 { return RELAXED_LOAD(&n_capabilities); }
@@ -77,6 +77,10 @@ INLINE_HEADER unsigned int getNumCapabilities(void)
 // The number of Capabilities that are not disabled
 extern uint32_t enabled_capabilities;
 
+// The maximum number of Capabilities supported by the RTS.
+// See Note [Capabilities array sizing] in rts/Capability.c.
+extern uint32_t max_n_capabilities;
+
 #if !IN_STG_CODE
 extern Capability MainCapability;
 #endif


=====================================
rts/linker/MMap.c
=====================================
@@ -344,12 +344,7 @@ mmapInRegion (
         if (result == NULL) {
             // The mapping failed
             return NULL;
-        } else if (result < region->start) {
-            // Uh oh, we assume that mmap() will only give us a
-            // an address at or after the requested address.
-            // Try again.
-            p = (uint8_t *) result + bytes;
-        } else if (result < region->end) {
+        } else if (result >= region->start && result < region->end) {
             // Success!
             region->last = (uint8_t *) result + bytes;
             return result;
@@ -357,17 +352,23 @@ mmapInRegion (
             // We failed to find a suitable mapping
             munmap(result, bytes);
             reportMemoryMap();
-            errorBelch("mmapForLinker: failed to mmap() memory below 2Gb; "
+            errorBelch("mmapForLinker: failed to mmap() memory between %p and %p; "
                        "asked for %zu bytes at %p. "
                        "Try specifying an address with +RTS -xm<addr> -RTS",
-                       bytes, p);
+                       region->start, region->end, bytes, p);
             return NULL;
-        }
+        } else if (result < region->start) {
+            // Uh oh, we assume that mmap() will only give us an
+            // address at or after the requested address.
+            // Try bumping forward by a bit and trying again.
+            p = (uint8_t *) p + bytes;
+        } else if (result >= region->end) {
+            // mmap() gave us too high an address; wrap around and try again
+            wrapped = true;
+            p = region->start;
+       }
 
-        // mmap() gave us too high an address; wrap around and try again
         munmap(result, bytes);
-        wrapped = true;
-        p = region->start;
     }
 }
 


=====================================
testsuite/driver/testlib.py
=====================================
@@ -275,6 +275,10 @@ def req_bco( name, opts ):
     # JS backend doesn't support ByteCode
     js_skip(name, opts)
 
+def req_c_rts( name, opts ):
+    """ Require the C runtime system (rather than, e.g. the Javascript RTS). """
+    js_skip(name, opts)
+
 def req_rts_linker( name, opts ):
     if not config.have_RTS_linker:
         opts.expect = 'fail'


=====================================
testsuite/tests/rep-poly/RepPolyMcGuard.stderr
=====================================
@@ -1,12 +1,12 @@
-
 RepPolyMcGuard.hs:30:16: error: [GHC-55287]
-    • The first argument of the rebindable syntax operator ‘guard’
+    • The first argument of the rebindable syntax operator ‘(>>)’
         arising from a statement in a monad comprehension
       does not have a fixed runtime representation.
       Its type is:
-        a0 :: TYPE rep0
+        ma0 :: TYPE rep0
       Cannot unify ‘rep’ with the type variable ‘rep0’
       because the former is not a concrete ‘RuntimeRep’.
     • In a stmt of a monad comprehension: undefined
       In the expression: [() | undefined]
       In an equation for ‘foo’: foo _ = [() | undefined]
+


=====================================
testsuite/tests/rep-poly/RepPolyRecordUpdate.stderr
=====================================
@@ -1,4 +1,3 @@
-
 RepPolyRecordUpdate.hs:7:35: error: [GHC-55287]
     • The newtype constructor pattern
       does not have a fixed runtime representation.
@@ -14,7 +13,7 @@ RepPolyRecordUpdate.hs:7:35: error: [GHC-55287]
       X a :: TYPE rep
 
 RepPolyRecordUpdate.hs:13:9: error: [GHC-55287]
-    • The record update at field ‘fld’
+    • The argument ‘fld’ of ‘MkX’
       does not have a fixed runtime representation.
       Its type is:
         a :: TYPE rep0
@@ -25,3 +24,4 @@ RepPolyRecordUpdate.hs:13:9: error: [GHC-55287]
       and data constructor ‘MkX’.
       In the expression: x {fld = meth ()}
       In an equation for ‘upd’: upd x = x {fld = meth ()}
+


=====================================
testsuite/tests/rep-poly/T12709.stderr
=====================================
@@ -1,6 +1,5 @@
-
 T12709.hs:28:13: error: [GHC-55287]
-    • The argument ‘1’ of ‘(+)’
+    • The argument ‘1 + 2 + 3’ of ‘(+)’
       does not have a fixed runtime representation.
       Its type is:
         a0 :: TYPE rep0
@@ -13,3 +12,4 @@ T12709.hs:28:13: error: [GHC-55287]
           u :: Num (a :: TYPE rep) => a
           u = 1 + 2 + 3 + 4
         in BUB u u
+


=====================================
testsuite/tests/rts/T25560.hs
=====================================
@@ -0,0 +1,4 @@
+import GHC.Conc
+
+main :: IO ()
+main = setNumCapabilities 100000


=====================================
testsuite/tests/rts/all.T
=====================================
@@ -616,3 +616,7 @@ test('T23400', [], compile_and_run, ['-with-rtsopts -A8k'])
 test('IOManager', [js_skip, when(arch('wasm32'), skip), when(opsys('mingw32'), skip),
                    pre_cmd('$MAKE -s --no-print-directory IOManager.hs')],
                   compile_and_run, [''])
+
+# N.B. This will likely issue a warning on stderr but we merely care that the
+# program doesn't crash.
+test('T25560', [req_c_rts, ignore_stderr], compile_and_run, [''])
\ No newline at end of file


=====================================
testsuite/tests/typecheck/should_compile/T25597.hs
=====================================
@@ -0,0 +1,74 @@
+{-# LANGUAGE DataKinds              #-}
+{-# LANGUAGE FlexibleContexts       #-}
+{-# LANGUAGE FlexibleInstances      #-}
+{-# LANGUAGE FunctionalDependencies #-}
+{-# LANGUAGE GADTs                  #-}
+{-# LANGUAGE PolyKinds              #-}
+{-# LANGUAGE RankNTypes             #-}
+{-# LANGUAGE TypeFamilyDependencies #-}
+{-# LANGUAGE TypeOperators          #-}
+{-# LANGUAGE UndecidableInstances   #-}
+
+module T25597 where
+import           Data.Kind (Type)
+
+data Env (f :: k -> Type) (as :: [k]) where
+  ENil  :: Env f '[]
+  ECons :: f a -> Env f as -> Env f (a ': as)
+
+data Sig2 k = [k] :~> k
+
+data DimSimple (s :: Sig2 k) where
+  DimSimple :: OfLength as -> DimSimple (as ':~> a)
+
+data OfLength as where
+  LZ :: OfLength '[]
+  LS :: OfLength as -> OfLength (a ': as)
+
+class LiftOfLength f as t | t -> as where
+  liftOfLength :: OfLength as -> f t
+
+instance t ~ (as ':~> a) => LiftOfLength DimSimple as t where
+  liftOfLength = undefined
+
+data EnvI (sem :: [k] -> k -> Type) (a :: k)
+
+type family Func sem as r where
+  Func sem '[] r       = r
+  Func sem (a ': as) r = sem a -> Func sem as r
+
+
+type family FuncU (sem :: [k] -> k -> Type) (ss :: [Sig2 k])
+                  (r :: k) = res | res -> sem r where
+  FuncU sem '[] r = EnvI sem r
+  FuncU sem ((as ':~> a) ': ss) r = Func (EnvI sem) as (EnvI sem a)
+                                    -> FuncU sem ss r
+
+lifts :: Env DimSimple ss -> FuncU sem ss r
+lifts _ = undefined
+
+-- The following version specialized to singletons does not cause an issue
+type family FuncS (sem :: [k] -> k -> Type) (s :: Sig2 k)
+                  (r :: k) = res | res -> sem r where
+  FuncS sem (as ':~> a) r = Func (EnvI sem) as (EnvI sem a) -> EnvI sem r
+
+
+lift :: DimSimple s -> FuncS sem s r
+lift _ = undefined
+
+-- The following code causes non termination of type checking in GHC 9.2, 9.8, 9.10, and 9.12
+f :: (EnvI Sem a -> EnvI Sem b) -> EnvI Sem (a -> b)
+f = lifts (ECons (liftOfLength (LS LZ)) ENil)
+
+data Sem (env :: [Type]) a
+
+-- Following versions have no issues in GHC 9.8
+-- (I haven't tested other compilers but expect the similar results)
+-- f = undefined $ lifts (ECons (liftOfLength (LS LZ)) ENil)
+-- f = let h = lifts (ECons (liftOfLength (LS LZ)) ENil) in h
+-- f = h where h = lifts (ECons (liftOfLength (LS LZ)) ENil)
+-- f = lifts (ECons (DimSimple (LS LZ)) ENil)
+-- f = lifts d where {d :: Env DimSimple '[ '[a] :~> b ]; d = (ECons (liftOfLength (LS LZ)) ENil) }
+-- f = lift (liftOfLength (LS LZ))
+-- f = (lifts :: Env DimSimple ss -> FuncU Sem ss r) (ECons (liftOfLength (LS LZ)) ENil)
+-- f without its signature


=====================================
testsuite/tests/typecheck/should_compile/all.T
=====================================
@@ -925,3 +925,4 @@ test('T24470b', normal, compile, [''])
 test('T24566', [], makefile_test, [])
 test('T23739a', normal, compile, [''])
 test('T24810', normal, compile, [''])
+test('T25597', normal, compile, [''])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/459075c552d7d1f3d06658c5ab0d4fba096aa819...29e25ab02e38de065239827d56de1ed4a63512b1

-- 
You're receiving this email because of your account on gitlab.haskell.org.

