[Git][ghc/ghc][wip/T24471] 5 commits: base: Reflect new era profiling RTS flags in GHC.RTS.Flags

Cheng Shao (@TerrorJack) gitlab at gitlab.haskell.org
Wed Mar 6 06:14:23 UTC 2024



Cheng Shao pushed to branch wip/T24471 at Glasgow Haskell Compiler / GHC


Commits:
4074a3f2 by Matthew Pickering at 2024-03-05T21:44:35-05:00
base: Reflect new era profiling RTS flags in GHC.RTS.Flags

* -he profiling mode
* -he profiling selector
* --automatic-era-increment

CLC proposal #254 - https://github.com/haskell/core-libraries-committee/issues/254

- - - - -
a8c0e31b by Sylvain Henry at 2024-03-05T21:45:14-05:00
JS: faster implementation for some numeric primitives (#23597)

Use faster implementations for the following primitives in the JS
backend by not using JavaScript's BigInt:
- plusInt64
- minusInt64
- minusWord64
- timesWord64
- timesInt64

Co-authored-by: Josh Meredith <joshmeredith2008 at gmail.com>

- - - - -
21e3f325 by Cheng Shao at 2024-03-05T21:45:52-05:00
rts: add -xr option to control two step allocator reserved space size

This patch adds a -xr RTS option to control the size of virtual memory
address space reserved by the two step allocator on a 64-bit platform,
see added documentation for explanation. Closes #24498.

- - - - -
74cb79b1 by Ben Gamari at 2024-03-06T06:12:17+00:00
ghc-experimental: Add dummy dependencies to work around #23942

This is a temporary measure to improve CI reliability until a proper
solution is developed.

Works around #23942.

- - - - -
d8161acc by Simon Peyton Jones at 2024-03-06T06:13:48+00:00
Three compile perf improvements with deep nesting

These were changes are all triggered by #24471.

1. Make GHC.Core.Opt.SetLevels.lvlMFE behave better when there are
   many free variables.  See Note [Large free-variable sets].

2. Make GHC.Core.Opt.Arity.floatIn a bit lazier in its Cost argument.
   This benefits the common case where the ArityType turns out to
   be nullary. See Note [Care with nested expressions]

3. Make GHC.CoreToStg.Prep.cpeArg behave for deeply-nested
   expressions.  See Note [Eta expansion of arguments in CorePrep]
   wrinkle (EA2).

Compile times go down by up to 4.5%, and much more in artificial
cases. (Geo mean of compiler/perf changes is -0.4%.)

Metric Decrease:
    CoOpt_Read
    T10421
    T12425

- - - - -


21 changed files:

- compiler/GHC/Core/Opt/Arity.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/CoreToStg/Prep.hs
- docs/users_guide/9.10.1-notes.rst
- docs/users_guide/runtime_control.rst
- libraries/base/changelog.md
- libraries/ghc-experimental/src/Data/Sum/Experimental.hs
- libraries/ghc-experimental/src/Data/Tuple/Experimental.hs
- libraries/ghc-internal/src/GHC/Internal/RTS/Flags.hsc
- rts/RtsFlags.c
- rts/include/rts/Flags.h
- rts/js/arith.js
- rts/sm/MBlock.c
- testsuite/tests/interface-stability/base-exports.stdout
- testsuite/tests/interface-stability/base-exports.stdout-javascript-unknown-ghcjs
- testsuite/tests/interface-stability/base-exports.stdout-mingw32
- testsuite/tests/interface-stability/base-exports.stdout-ws-32
- + testsuite/tests/perf/compiler/T24471.hs
- + testsuite/tests/perf/compiler/T24471a.hs
- testsuite/tests/perf/compiler/all.T
- testsuite/tests/rts/all.T


Changes:

=====================================
compiler/GHC/Core/Opt/Arity.hs
=====================================
@@ -1270,8 +1270,14 @@ arityLam id (AT oss div)
 floatIn :: Cost -> ArityType -> ArityType
 -- We have something like (let x = E in b),
 -- where b has the given arity type.
-floatIn IsCheap     at = at
-floatIn IsExpensive at = addWork at
+-- NB: be as lazy as possible in the Cost-of-E argument;
+--     we can often get away without ever looking at it
+--     See Note [Care with nested expressions]
+floatIn ch at@(AT lams div)
+  = case lams of
+      []                 -> at
+      (IsExpensive,_):_  -> at
+      (_,os):lams        -> AT ((ch,os):lams) div
 
 addWork :: ArityType -> ArityType
 -- Add work to the outermost level of the arity type
@@ -1354,6 +1360,25 @@ That gives \1.T (see Note [Combining case branches: andWithTail],
 first bullet).  So 'go2' gets an arityType of \(C?)(C1).T, which is
 what we want.
 
+Note [Care with nested expressions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+    arityType (Just <big-expressions>)
+We will take
+    arityType Just = AT [(IsCheap,os)] topDiv
+and then do
+    arityApp (AT [(IsCheap os)] topDiv) (exprCost <big-expression>)
+The result will be AT [] topDiv.  It doesn't matter what <big-expresison>
+is!  The same is true of
+    arityType (let x = <rhs> in <body>)
+where the cost of <rhs> doesn't matter unless <body> has a useful
+arityType.
+
+TL;DR in `floatIn`, do not to look at the Cost argument until you have to.
+
+I found this when looking at #24471, although I don't think it was really
+the main culprit.
+
 Note [Combining case branches: andWithTail]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 When combining the ArityTypes for two case branches (with andArityType)
@@ -1576,7 +1601,7 @@ arityType env (Case scrut bndr _ alts)
   = alts_type
 
   | otherwise            -- In the remaining cases we may not push
-  = addWork alts_type -- evaluation of the scrutinee in
+  = addWork alts_type    -- evaluation of the scrutinee in
   where
     env' = delInScope env bndr
     arity_type_alt (Alt _con bndrs rhs) = arityType (delInScopeList env' bndrs) rhs


=====================================
compiler/GHC/Core/Opt/SetLevels.hs
=====================================
@@ -643,7 +643,8 @@ lvlMFE env strict_ctxt e@(_, AnnCase {})
   = lvlExpr env e     -- See Note [Case MFEs]
 
 lvlMFE env strict_ctxt ann_expr
-  |  floatTopLvlOnly env && not (isTopLvl dest_lvl)
+  | not float_me
+  || floatTopLvlOnly env && not (isTopLvl dest_lvl)
          -- Only floating to the top level is allowed.
   || hasFreeJoin env fvs   -- If there is a free join, don't float
                            -- See Note [Free join points]
@@ -652,8 +653,9 @@ lvlMFE env strict_ctxt ann_expr
          -- how it will be represented at runtime.
          -- See Note [Representation polymorphism invariants] in GHC.Core
   || notWorthFloating expr abs_vars
-  || not float_me
-  =     -- Don't float it out
+         -- Test notWorhtFloating last;
+         -- See Note [Large free-variable sets]
+  = -- Don't float it out
     lvlExpr env ann_expr
 
   |  float_is_new_lam || exprIsTopLevelBindable expr expr_ty
@@ -822,6 +824,28 @@ early loses opportunities for RULES which (needless to say) are
 important in some nofib programs (gcd is an example).  [SPJ note:
 I think this is obsolete; the flag seems always on.]
 
+Note [Large free-variable sets]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In #24471 we had something like
+     x1 = I# 1
+     ...
+     x1000 = I# 1000
+     foo = f x1 (f x2 (f x3 ....))
+So every sub-expression in `foo` has lots and lots of free variables.  But
+none of these sub-expressions float anywhere; the entire float-out pass is a
+no-op.
+
+In lvlMFE, we want to find out quickly if the MFE is not-floatable; that is
+the common case.  In #24471 it turned out that we were testing `abs_vars` (a
+relatively complicated calculation that takes at least O(n-free-vars) time to
+compute) for every sub-expression.
+
+Better instead to test `float_me` early. That still involves looking at
+dest_lvl, which means looking at every free variable, but the constant factor
+is a lot better.
+
+ToDo: find a way to fix the bad asymptotic complexity.
+
 Note [Floating join point bindings]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Mostly we only float a join point if it can /stay/ a join point.  But


=====================================
compiler/GHC/CoreToStg/Prep.hs
=====================================
@@ -1469,8 +1469,7 @@ cpeArg env dmd arg
   = do { (floats1, arg1) <- cpeRhsE env arg     -- arg1 can be a lambda
        ; let arg_ty      = exprType arg1
              is_unlifted = isUnliftedType arg_ty
-             dec         = wantFloatLocal NonRecursive dmd is_unlifted
-                                                  floats1 arg1
+             dec         = wantFloatLocal NonRecursive dmd is_unlifted floats1 arg1
        ; (floats2, arg2) <- executeFloatDecision dec floats1 arg1
                 -- Else case: arg1 might have lambdas, and we can't
                 --            put them inside a wrapBinds
@@ -1482,23 +1481,29 @@ cpeArg env dmd arg
          then return (floats2, arg2)
          else do { v <- newVar arg_ty
                  -- See Note [Eta expansion of arguments in CorePrep]
-                 ; let arg3 = cpeEtaExpandArg env arg2
+                 ; let arity = cpeArgArity env dec arg2
+                       arg3  = cpeEtaExpand arity arg2
                        arg_float = mkNonRecFloat env dmd is_unlifted v arg3
                  ; return (snocFloat floats2 arg_float, varToCoreExpr v) }
        }
 
-cpeEtaExpandArg :: CorePrepEnv -> CoreArg -> CoreArg
+cpeArgArity :: CorePrepEnv -> FloatDecision -> CoreArg -> Arity
 -- ^ See Note [Eta expansion of arguments in CorePrep]
-cpeEtaExpandArg env arg = cpeEtaExpand arity arg
-  where
-    arity | Just ao <- cp_arityOpts (cpe_config env) -- Just <=> -O1 or -O2
-          , not (has_join_in_tail_context arg)
+-- Returning 0 means "no eta-expansion"; see cpeEtaExpand
+cpeArgArity env float_decision arg
+  | FloatNone <- float_decision
+  = 0    -- Crucial short-cut
+         -- See wrinkle (EA2) in Note [Eta expansion of arguments in CorePrep]
+
+  | Just ao <- cp_arityOpts (cpe_config env) -- Just <=> -O1 or -O2
+  , not (has_join_in_tail_context arg)
             -- See Wrinkle (EA1) of Note [Eta expansion of arguments in CorePrep]
-          = case exprEtaExpandArity ao arg of
-              Nothing -> 0
-              Just at -> arityTypeArity at
-          | otherwise
-          = exprArity arg -- this is cheap enough for -O0
+  = case exprEtaExpandArity ao arg of
+      Nothing -> 0
+      Just at -> arityTypeArity at
+
+  | otherwise
+  = exprArity arg -- this is cheap enough for -O0
 
 has_join_in_tail_context :: CoreExpr -> Bool
 -- ^ Identify the cases where we'd generate invalid `CpeApp`s as described in
@@ -1510,34 +1515,10 @@ has_join_in_tail_context (Tick _ e)            = has_join_in_tail_context e
 has_join_in_tail_context (Case _ _ _ alts)     = any has_join_in_tail_context (rhssOfAlts alts)
 has_join_in_tail_context _                     = False
 
-{-
-Note [Eta expansion of arguments with join heads]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-See Note [Eta expansion for join points] in GHC.Core.Opt.Arity
-Eta expanding the join point would introduce crap that we can't
-generate code for
-
-------------------------------------------------------------------------------
--- Building the saturated syntax
--- ---------------------------------------------------------------------------
-
-Note [Eta expansion of hasNoBinding things in CorePrep]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-maybeSaturate deals with eta expanding to saturate things that can't deal with
-unsaturated applications (identified by 'hasNoBinding', currently
-foreign calls, unboxed tuple/sum constructors, and representation-polymorphic
-primitives such as 'coerce' and 'unsafeCoerce#').
-
-Historical Note: Note that eta expansion in CorePrep used to be very fragile
-due to the "prediction" of CAFfyness that we used to make during tidying.
-We previously saturated primop
-applications here as well but due to this fragility (see #16846) we now deal
-with this another way, as described in Note [Primop wrappers] in GHC.Builtin.PrimOps.
--}
-
 maybeSaturate :: Id -> CpeApp -> Int -> [CoreTickish] -> UniqSM CpeRhs
 maybeSaturate fn expr n_args unsat_ticks
   | hasNoBinding fn        -- There's no binding
+    -- See Note [Eta expansion of hasNoBinding things in CorePrep]
   = return $ wrapLamBody (\body -> foldr mkTick body unsat_ticks) sat_expr
 
   | mark_arity > 0 -- A call-by-value function. See Note [CBV Function Ids]
@@ -1567,24 +1548,14 @@ maybeSaturate fn expr n_args unsat_ticks
     fn_arity      = idArity fn
     excess_arity  = (max fn_arity mark_arity) - n_args
     sat_expr      = cpeEtaExpand excess_arity expr
-    applied_marks = n_args >= (length . dropWhile (not . isMarkedCbv) . reverse . expectJust "maybeSaturate" $ (idCbvMarks_maybe fn))
+    applied_marks = n_args >= (length . dropWhile (not . isMarkedCbv) .
+                               reverse . expectJust "maybeSaturate" $ (idCbvMarks_maybe fn))
     -- For join points we never eta-expand (See Note [Do not eta-expand join points])
-    -- so we assert all arguments that need to be passed cbv are visible so that the backend can evalaute them if required..
-{-
-************************************************************************
-*                                                                      *
-                Simple GHC.Core operations
-*                                                                      *
-************************************************************************
--}
+    -- so we assert all arguments that need to be passed cbv are visible so that the
+    -- backend can evalaute them if required..
 
-{-
--- -----------------------------------------------------------------------------
---      Eta reduction
--- -----------------------------------------------------------------------------
-
-Note [Eta expansion]
-~~~~~~~~~~~~~~~~~~~~~
+{- Note [Eta expansion]
+~~~~~~~~~~~~~~~~~~~~~~~
 Eta expand to match the arity claimed by the binder Remember,
 CorePrep must not change arity
 
@@ -1603,6 +1574,19 @@ NB2: we have to be careful that the result of etaExpand doesn't
    an SCC note - we're now careful in etaExpand to make sure the
    SCC is pushed inside any new lambdas that are generated.
 
+Note [Eta expansion of hasNoBinding things in CorePrep]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+maybeSaturate deals with eta expanding to saturate things that can't deal
+with unsaturated applications (identified by 'hasNoBinding', currently
+foreign calls, unboxed tuple/sum constructors, and representation-polymorphic
+primitives such as 'coerce' and 'unsafeCoerce#').
+
+Historical Note: Note that eta expansion in CorePrep used to be very fragile
+due to the "prediction" of CAFfyness that we used to make during tidying.  We
+previously saturated primop applications here as well but due to this
+fragility (see #16846) we now deal with this another way, as described in
+Note [Primop wrappers] in GHC.Builtin.PrimOps.
+
 Note [Eta expansion and the CorePrep invariants]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 It turns out to be much much easier to do eta expansion
@@ -1685,6 +1669,22 @@ There is a nasty Wrinkle:
       This scenario occurs rarely; hence it's OK to generate sub-optimal code.
       The alternative would be to fix Note [Eta expansion for join points], but
       that's quite challenging due to unfoldings of (recursive) join points.
+
+(EA2) In cpeArgArity, if float_decision = FloatNone) the `arg` will look like
+           let <binds> in rhs
+      where <binds> is non-empty and can't be floated out of a lazy context (see
+      `wantFloatLocal`). So we can't eta-expand it anyway, so we can return 0
+      forthwith.  Without this short-cut we will call exprEtaExpandArity on the
+      `arg`, and <binds> might be enormous. exprEtaExpandArity be very expensive
+      on this: it uses arityType, and may look at <binds>.
+
+      On the other hand, if float_decision = FloatAll, there will be no
+      let-bindings around 'arg'; they will have floated out.  So
+      exprEtaExpandArity is cheap.
+
+      This can make a huge difference on deeply nested expressions like
+         f (f (f (f (f  ...))))
+      #24471 is a good example, where Prep took 25% of compile time!
 -}
 
 cpeEtaExpand :: Arity -> CpeRhs -> CpeRhs
@@ -1899,7 +1899,7 @@ instance Outputable FloatInfo where
 -- See Note [Floating in CorePrep]
 -- and Note [BindInfo and FloatInfo]
 data FloatingBind
-  = Float !CoreBind !BindInfo !FloatInfo
+  = Float !CoreBind !BindInfo !FloatInfo    -- Never a join-point binding
   | UnsafeEqualityCase !CoreExpr !CoreBndr !AltCon ![CoreBndr]
   | FloatTick CoreTickish
 
@@ -2126,19 +2126,16 @@ data FloatDecision
   | FloatAll
 
 executeFloatDecision :: FloatDecision -> Floats -> CpeRhs -> UniqSM (Floats, CpeRhs)
-executeFloatDecision dec floats rhs = do
-  let (float,stay) = case dec of
-        _ | isEmptyFloats floats -> (emptyFloats,emptyFloats)
-        FloatNone                -> (emptyFloats, floats)
-        FloatAll                 -> (floats, emptyFloats)
-  -- Wrap `stay` around `rhs`.
-  -- NB: `rhs` might have lambdas, and we can't
-  --     put them inside a wrapBinds, which expects a `CpeBody`.
-  if isEmptyFloats stay -- Fast path where we don't need to call `rhsToBody`
-    then return (float, rhs)
-    else do
-      (floats', body) <- rhsToBody rhs
-      return (float, wrapBinds stay $ wrapBinds floats' body)
+executeFloatDecision dec floats rhs
+  = case dec of
+      FloatAll                 -> return (floats, rhs)
+      FloatNone
+        | isEmptyFloats floats -> return (emptyFloats, rhs)
+        | otherwise            -> do { (floats', body) <- rhsToBody rhs
+                                     ; return (emptyFloats, wrapBinds floats $
+                                                            wrapBinds floats' body) }
+            -- FloatNone case: `rhs` might have lambdas, and we can't
+            -- put them inside a wrapBinds, which expects a `CpeBody`.
 
 wantFloatTop :: Floats -> FloatDecision
 wantFloatTop fs


=====================================
docs/users_guide/9.10.1-notes.rst
=====================================
@@ -232,6 +232,11 @@ Runtime system
 - Add a :rts-flag:`--no-automatic-time-samples` flag which stops time profiling samples being automatically started on
   startup. Time profiling can be controlled manually using functions in ``GHC.Profiling``.
 
+- Add a :rts-flag:`-xr ⟨size⟩` which controls the size of virtual
+  memory address space reserved by the two step allocator on a 64-bit
+  platform. The default size is now 1T on aarch64 as well. See
+  :ghc-ticket:`24498`.
+
 ``base`` library
 ~~~~~~~~~~~~~~~~
 


=====================================
docs/users_guide/runtime_control.rst
=====================================
@@ -368,6 +368,18 @@ Miscellaneous RTS options
     thread can execute its exception handlers. The ``-xq`` controls the
     size of this additional quota.
 
+.. rts-flag:: -xr ⟨size⟩
+
+    :default: 1T
+
+    This option controls the size of virtual memory address space
+    reserved by the two step allocator on a 64-bit platform. It can be
+    useful in scenarios where even reserving a large address range
+    without committing can be expensive (e.g. WSL1), or when you
+    actually have enough physical memory and want to support a Haskell
+    heap larger than 1T. ``-xr`` is a no-op if GHC is configured with
+    ``--disable-large-address-space`` or if the platform is 32-bit.
+
 .. _rts-options-gc:
 
 RTS options to control the garbage collector


=====================================
libraries/base/changelog.md
=====================================
@@ -50,6 +50,10 @@
 
   * Treat all FDs as "nonblocking" on wasm32 ([CLC proposal #234](https://github.com/haskell/core-libraries-committee/issues/234))
 
+  * Add `HeapByEra`, `eraSelector` and `automaticEraIncrement` to `GHC.RTS.Flags` to
+    reflect the new RTS flags: `-he` profiling mode, `-he` selector and `--automatic-era-increment`.
+    ([CLC proposal #254](https://github.com/haskell/core-libraries-committee/issues/254))
+
 ## 4.19.0.0 *October 2023*
   * Add `{-# WARNING in "x-partial" #-}` to `Data.List.{head,tail}`.
     Use `{-# OPTIONS_GHC -Wno-x-partial #-}` to disable it.


=====================================
libraries/ghc-experimental/src/Data/Sum/Experimental.hs
=====================================
@@ -80,4 +80,5 @@ module Data.Sum.Experimental (
   Sum63#,
 ) where
 
+import GHC.Num.BigNat () -- for build ordering
 import GHC.Types


=====================================
libraries/ghc-experimental/src/Data/Tuple/Experimental.hs
=====================================
@@ -161,3 +161,4 @@ module Data.Tuple.Experimental (
 import GHC.Tuple
 import GHC.Types
 import GHC.Classes
+import GHC.Num.BigNat () -- for build ordering


=====================================
libraries/ghc-internal/src/GHC/Internal/RTS/Flags.hsc
=====================================
@@ -274,6 +274,7 @@ data DoHeapProfile
     | HeapByLDV
     | HeapByClosureType
     | HeapByInfoTable
+    | HeapByEra -- ^ @since base-4.20.0.0
     deriving ( Show -- ^ @since base-4.8.0.0
              , Generic -- ^ @since base-4.15.0.0
              )
@@ -289,6 +290,7 @@ instance Enum DoHeapProfile where
     fromEnum HeapByLDV         = #{const HEAP_BY_LDV}
     fromEnum HeapByClosureType = #{const HEAP_BY_CLOSURE_TYPE}
     fromEnum HeapByInfoTable   = #{const HEAP_BY_INFO_TABLE}
+    fromEnum HeapByEra         = #{const HEAP_BY_ERA}
 
     toEnum #{const NO_HEAP_PROFILING}    = NoHeapProfiling
     toEnum #{const HEAP_BY_CCS}          = HeapByCCS
@@ -299,6 +301,7 @@ instance Enum DoHeapProfile where
     toEnum #{const HEAP_BY_LDV}          = HeapByLDV
     toEnum #{const HEAP_BY_CLOSURE_TYPE} = HeapByClosureType
     toEnum #{const HEAP_BY_INFO_TABLE}   = HeapByInfoTable
+    toEnum #{const HEAP_BY_ERA}          = HeapByEra
     toEnum e = errorWithoutStackTrace ("invalid enum for DoHeapProfile: " ++ show e)
 
 -- | Parameters of the cost-center profiler
@@ -311,6 +314,7 @@ data ProfFlags = ProfFlags
     , startHeapProfileAtStartup :: Bool
     , startTimeProfileAtStartup :: Bool   -- ^ @since base-4.20.0.0
     , showCCSOnException       :: Bool
+    , automaticEraIncrement    :: Bool   -- ^ @since 4.20.0.0
     , maxRetainerSetSize       :: Word
     , ccsLength                :: Word
     , modSelector              :: Maybe String
@@ -320,6 +324,7 @@ data ProfFlags = ProfFlags
     , ccsSelector              :: Maybe String
     , retainerSelector         :: Maybe String
     , bioSelector              :: Maybe String
+    , eraSelector              :: Word -- ^ @since base-4.20.0.0
     } deriving ( Show -- ^ @since base-4.8.0.0
                , Generic -- ^ @since base-4.15.0.0
                )
@@ -633,6 +638,8 @@ getProfFlags = do
                   (#{peek PROFILING_FLAGS, startTimeProfileAtStartup} ptr :: IO CBool))
             <*> (toBool <$>
                   (#{peek PROFILING_FLAGS, showCCSOnException} ptr :: IO CBool))
+            <*> (toBool <$>
+                  (#{peek PROFILING_FLAGS, incrementUserEra} ptr :: IO CBool))
             <*> #{peek PROFILING_FLAGS, maxRetainerSetSize} ptr
             <*> #{peek PROFILING_FLAGS, ccsLength} ptr
             <*> (peekCStringOpt =<< #{peek PROFILING_FLAGS, modSelector} ptr)
@@ -642,6 +649,7 @@ getProfFlags = do
             <*> (peekCStringOpt =<< #{peek PROFILING_FLAGS, ccsSelector} ptr)
             <*> (peekCStringOpt =<< #{peek PROFILING_FLAGS, retainerSelector} ptr)
             <*> (peekCStringOpt =<< #{peek PROFILING_FLAGS, bioSelector} ptr)
+            <*> #{peek PROFILING_FLAGS, eraSelector} ptr
 
 getTraceFlags :: IO TraceFlags
 getTraceFlags = do


=====================================
rts/RtsFlags.c
=====================================
@@ -186,6 +186,9 @@ void initRtsFlagsDefaults(void)
     RtsFlags.GcFlags.ringBell           = false;
     RtsFlags.GcFlags.longGCSync         = 0; /* detection turned off */
 
+    // 1 TBytes
+    RtsFlags.GcFlags.addressSpaceSize   = (StgWord64)1 << 40;
+
     RtsFlags.DebugFlags.scheduler       = false;
     RtsFlags.DebugFlags.interpreter     = false;
     RtsFlags.DebugFlags.weak            = false;
@@ -552,6 +555,11 @@ usage_text[] = {
 "  -xq        The allocation limit given to a thread after it receives",
 "             an AllocationLimitExceeded exception. (default: 100k)",
 "",
+#if defined(USE_LARGE_ADDRESS_SPACE)
+"  -xr        The size of virtual memory address space reserved by the",
+"             two step allocator (default: 1T)",
+"",
+#endif
 "  -Mgrace=<n>",
 "             The amount of allocation after the program receives a",
 "             HeapOverflow exception before the exception is thrown again, if",
@@ -1820,6 +1828,12 @@ error = true;
                           / BLOCK_SIZE;
                   break;
 
+                case 'r':
+                    OPTION_UNSAFE;
+                    RtsFlags.GcFlags.addressSpaceSize
+                      = decodeSize(rts_argv[arg], 3, MBLOCK_SIZE, HS_WORD64_MAX);
+                    break;
+
                   default:
                     OPTION_SAFE;
                     errorBelch("unknown RTS option: %s",rts_argv[arg]);
@@ -2118,7 +2132,9 @@ decodeSize(const char *flag, uint32_t offset, StgWord64 min, StgWord64 max)
         m = atof(s);
         c = s[strlen(s)-1];
 
-        if (c == 'g' || c == 'G')
+        if (c == 't' || c == 'T')
+            m *= (StgWord64)1024*1024*1024*1024;
+        else if (c == 'g' || c == 'G')
             m *= 1024*1024*1024;
         else if (c == 'm' || c == 'M')
             m *= 1024*1024;
@@ -2737,4 +2753,3 @@ doingErasProfiling( void )
             || RtsFlags.ProfFlags.eraSelector != 0);
 }
 #endif /* PROFILING */
-


=====================================
rts/include/rts/Flags.h
=====================================
@@ -89,6 +89,8 @@ typedef struct _GC_FLAGS {
 
     bool numa;                   /* Use NUMA */
     StgWord numaMask;
+
+    StgWord64 addressSpaceSize;  /* large address space size in bytes */
 } GC_FLAGS;
 
 /* See Note [Synchronization of flags and base APIs] */


=====================================
rts/js/arith.js
=====================================
@@ -44,19 +44,23 @@ function h$hs_remWord64(h1,l1,h2,l2) {
 }
 
 function h$hs_timesWord64(h1,l1,h2,l2) {
-  var a = W64(h1,l1);
-  var b = W64(h2,l2);
-  var r = BigInt.asUintN(64, a * b);
-  TRACE_ARITH("Word64: " + a + " * " + b + " ==> " + r)
-  RETURN_W64(r);
+  var rh = h$mul2Word32(l1,l2);
+  var rl = h$ret1;
+
+  rh += Math.imul(l1,h2)>>>0;
+  rh += Math.imul(l2,h1)>>>0;
+  rh >>>= 0;
+
+  TRACE_ARITH("Word64: " + (h1,l1) + " * " + (h2,l2) + " ==> " + (rh,rl))
+  RETURN_UBX_TUP2(rh,rl);
 }
 
 function h$hs_minusWord64(h1,l1,h2,l2) {
-  var a = (BigInt(h1) << BigInt(32)) | BigInt(l1>>>0);
-  var b = (BigInt(h2) << BigInt(32)) | BigInt(l2>>>0);
-  var r = BigInt.asUintN(64, a - b);
-  TRACE_ARITH("Word64: " + a + " - " + b + " ==> " + r)
-  RETURN_W64(r);
+  var l  = l1-l2;
+  var rl = l>>>0;
+  var rh = (h1-h2-(l!=rl?1:0))>>>0;
+  TRACE_ARITH("Word64: " + (h1,l1) + " - " + (h2,l2) + " ==> " + (rh,rl))
+  RETURN_UBX_TUP2(rh,rl);
 }
 
 function h$hs_plusWord64(h1,l1,h2,l2) {
@@ -68,11 +72,15 @@ function h$hs_plusWord64(h1,l1,h2,l2) {
 }
 
 function h$hs_timesInt64(h1,l1,h2,l2) {
-  var a = I64(h1,l1);
-  var b = I64(h2,l2);
-  var r = BigInt.asIntN(64, a * b);
-  TRACE_ARITH("Int64: " + a + " * " + b + " ==> " + r)
-  RETURN_I64(r);
+  var rh = h$mul2Word32(l1,l2);
+  var rl = h$ret1;
+
+  rh += Math.imul(l1,h2)|0;
+  rh += Math.imul(l2,h1)|0;
+  rh |= 0;
+
+  TRACE_ARITH("Int64: " + (h1,l1) + " * " + (h2,l2) + " ==> " + (rh,rl))
+  RETURN_UBX_TUP2(rh,rl);
 }
 
 function h$hs_quotInt64(h1,l1,h2,l2) {
@@ -92,19 +100,19 @@ function h$hs_remInt64(h1,l1,h2,l2) {
 }
 
 function h$hs_plusInt64(h1,l1,h2,l2) {
-  var a = I64(h1,l1);
-  var b = I64(h2,l2);
-  var r = BigInt.asIntN(64, a + b);
-  TRACE_ARITH("Int64: " + a + " + " + b + " ==> " + r)
-  RETURN_I64(r);
+  var l  = l1+l2;
+  var rl = l>>>0;
+  var rh = (h1+h2+(l!=rl?1:0))|0;
+  TRACE_ARITH("Int64: " + (h1,l1) + " + " + (h2,l2) + " ==> " + (rh,rl))
+  RETURN_UBX_TUP2(rh,rl);
 }
 
 function h$hs_minusInt64(h1,l1,h2,l2) {
-  var a = I64(h1,l1);
-  var b = I64(h2,l2);
-  var r = BigInt.asIntN(64, a - b);
-  TRACE_ARITH("Int64: " + a + " - " + b + " ==> " + r)
-  RETURN_I64(r);
+  var l  = l1-l2;
+  var rl = l>>>0;
+  var rh = (h1-h2-(l!=rl?1:0))|0;
+  TRACE_ARITH("Int64: " + (h1,l1) + " - " + (h2,l2) + " ==> " + (rh,rl))
+  RETURN_UBX_TUP2(rh,rl);
 }
 
 function h$hs_uncheckedShiftLWord64(h,l,n) {


=====================================
rts/sm/MBlock.c
=====================================
@@ -659,20 +659,14 @@ initMBlocks(void)
 
 #if defined(USE_LARGE_ADDRESS_SPACE)
     {
-        W_ size;
-#if defined(aarch64_HOST_ARCH)
-        size = (W_)1 << 38; // 1/4 TByte
-#else
-        size = (W_)1 << 40; // 1 TByte
-#endif
         void *startAddress = NULL;
         if (RtsFlags.GcFlags.heapBase) {
             startAddress = (void*) RtsFlags.GcFlags.heapBase;
         }
-        void *addr = osReserveHeapMemory(startAddress, &size);
+        void *addr = osReserveHeapMemory(startAddress, &RtsFlags.GcFlags.addressSpaceSize);
 
         mblock_address_space.begin = (W_)addr;
-        mblock_address_space.end = (W_)addr + size;
+        mblock_address_space.end = (W_)addr + RtsFlags.GcFlags.addressSpaceSize;
         mblock_high_watermark = (W_)addr;
     }
 #elif SIZEOF_VOID_P == 8


=====================================
testsuite/tests/interface-stability/base-exports.stdout
=====================================
@@ -9029,7 +9029,7 @@ module GHC.RTS.Flags where
   type DoCostCentres :: *
   data DoCostCentres = CostCentresNone | CostCentresSummary | CostCentresVerbose | CostCentresAll | CostCentresJSON
   type DoHeapProfile :: *
-  data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable
+  data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable | HeapByEra
   type DoTrace :: *
   data DoTrace = TraceNone | TraceEventLog | TraceStderr
   type GCFlags :: *
@@ -9080,6 +9080,7 @@ module GHC.RTS.Flags where
                  startHeapProfileAtStartup :: GHC.Types.Bool,
                  startTimeProfileAtStartup :: GHC.Types.Bool,
                  showCCSOnException :: GHC.Types.Bool,
+                 automaticEraIncrement :: GHC.Types.Bool,
                  maxRetainerSetSize :: GHC.Types.Word,
                  ccsLength :: GHC.Types.Word,
                  modSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
@@ -9088,7 +9089,8 @@ module GHC.RTS.Flags where
                  ccSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
                  ccsSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
                  retainerSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
-                 bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String}
+                 bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
+                 eraSelector :: GHC.Types.Word}
   type RTSFlags :: *
   data RTSFlags = RTSFlags {gcFlags :: GCFlags, concurrentFlags :: ConcFlags, miscFlags :: MiscFlags, debugFlags :: DebugFlags, costCentreFlags :: CCFlags, profilingFlags :: ProfFlags, traceFlags :: TraceFlags, tickyFlags :: TickyFlags, parFlags :: ParFlags, hpcFlags :: HpcFlags}
   type RtsTime :: *


=====================================
testsuite/tests/interface-stability/base-exports.stdout-javascript-unknown-ghcjs
=====================================
@@ -12071,7 +12071,7 @@ module GHC.RTS.Flags where
   type DoCostCentres :: *
   data DoCostCentres = CostCentresNone | CostCentresSummary | CostCentresVerbose | CostCentresAll | CostCentresJSON
   type DoHeapProfile :: *
-  data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable
+  data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable | HeapByEra
   type DoTrace :: *
   data DoTrace = TraceNone | TraceEventLog | TraceStderr
   type GCFlags :: *
@@ -12122,6 +12122,7 @@ module GHC.RTS.Flags where
                  startHeapProfileAtStartup :: GHC.Types.Bool,
                  startTimeProfileAtStartup :: GHC.Types.Bool,
                  showCCSOnException :: GHC.Types.Bool,
+                 automaticEraIncrement :: GHC.Types.Bool,
                  maxRetainerSetSize :: GHC.Types.Word,
                  ccsLength :: GHC.Types.Word,
                  modSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
@@ -12130,7 +12131,8 @@ module GHC.RTS.Flags where
                  ccSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
                  ccsSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
                  retainerSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
-                 bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String}
+                 bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
+                 eraSelector :: GHC.Types.Word}
   type RTSFlags :: *
   data RTSFlags = RTSFlags {gcFlags :: GCFlags, concurrentFlags :: ConcFlags, miscFlags :: MiscFlags, debugFlags :: DebugFlags, costCentreFlags :: CCFlags, profilingFlags :: ProfFlags, traceFlags :: TraceFlags, tickyFlags :: TickyFlags, parFlags :: ParFlags, hpcFlags :: HpcFlags}
   type RtsTime :: *


=====================================
testsuite/tests/interface-stability/base-exports.stdout-mingw32
=====================================
@@ -9253,7 +9253,7 @@ module GHC.RTS.Flags where
   type DoCostCentres :: *
   data DoCostCentres = CostCentresNone | CostCentresSummary | CostCentresVerbose | CostCentresAll | CostCentresJSON
   type DoHeapProfile :: *
-  data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable
+  data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable | HeapByEra
   type DoTrace :: *
   data DoTrace = TraceNone | TraceEventLog | TraceStderr
   type GCFlags :: *
@@ -9304,6 +9304,7 @@ module GHC.RTS.Flags where
                  startHeapProfileAtStartup :: GHC.Types.Bool,
                  startTimeProfileAtStartup :: GHC.Types.Bool,
                  showCCSOnException :: GHC.Types.Bool,
+                 automaticEraIncrement :: GHC.Types.Bool,
                  maxRetainerSetSize :: GHC.Types.Word,
                  ccsLength :: GHC.Types.Word,
                  modSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
@@ -9312,7 +9313,8 @@ module GHC.RTS.Flags where
                  ccSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
                  ccsSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
                  retainerSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
-                 bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String}
+                 bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
+                 eraSelector :: GHC.Types.Word}
   type RTSFlags :: *
   data RTSFlags = RTSFlags {gcFlags :: GCFlags, concurrentFlags :: ConcFlags, miscFlags :: MiscFlags, debugFlags :: DebugFlags, costCentreFlags :: CCFlags, profilingFlags :: ProfFlags, traceFlags :: TraceFlags, tickyFlags :: TickyFlags, parFlags :: ParFlags, hpcFlags :: HpcFlags}
   type RtsTime :: *


=====================================
testsuite/tests/interface-stability/base-exports.stdout-ws-32
=====================================
@@ -9029,7 +9029,7 @@ module GHC.RTS.Flags where
   type DoCostCentres :: *
   data DoCostCentres = CostCentresNone | CostCentresSummary | CostCentresVerbose | CostCentresAll | CostCentresJSON
   type DoHeapProfile :: *
-  data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable
+  data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable | HeapByEra
   type DoTrace :: *
   data DoTrace = TraceNone | TraceEventLog | TraceStderr
   type GCFlags :: *
@@ -9080,6 +9080,7 @@ module GHC.RTS.Flags where
                  startHeapProfileAtStartup :: GHC.Types.Bool,
                  startTimeProfileAtStartup :: GHC.Types.Bool,
                  showCCSOnException :: GHC.Types.Bool,
+                 automaticEraIncrement :: GHC.Types.Bool,
                  maxRetainerSetSize :: GHC.Types.Word,
                  ccsLength :: GHC.Types.Word,
                  modSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
@@ -9088,7 +9089,8 @@ module GHC.RTS.Flags where
                  ccSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
                  ccsSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
                  retainerSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
-                 bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String}
+                 bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
+                 eraSelector :: GHC.Types.Word}
   type RTSFlags :: *
   data RTSFlags = RTSFlags {gcFlags :: GCFlags, concurrentFlags :: ConcFlags, miscFlags :: MiscFlags, debugFlags :: DebugFlags, costCentreFlags :: CCFlags, profilingFlags :: ProfFlags, traceFlags :: TraceFlags, tickyFlags :: TickyFlags, parFlags :: ParFlags, hpcFlags :: HpcFlags}
   type RtsTime :: *


=====================================
testsuite/tests/perf/compiler/T24471.hs
=====================================
@@ -0,0 +1,8 @@
+{-# LANGUAGE TemplateHaskell #-}
+module T24471 where
+
+import T24471a
+
+{-# OPAQUE foo #-}
+foo :: (List_ Int a -> a) -> a
+foo alg = $$(between [|| alg ||] 0 1000)


=====================================
testsuite/tests/perf/compiler/T24471a.hs
=====================================
@@ -0,0 +1,8 @@
+{-# LANGUAGE TemplateHaskell #-}
+module T24471a where
+
+data List_ a f = Nil_ | Cons_ a f deriving Functor
+
+between alg a b
+  | a == b = [|| $$alg Nil_ ||]
+  | otherwise = [|| $$alg (Cons_ a $$(between alg (a + 1) b)) ||]


=====================================
testsuite/tests/perf/compiler/all.T
=====================================
@@ -712,3 +712,7 @@ test ('LookupFusion',
       [collect_stats('bytes allocated',2), when(wordsize(32), skip)],
       compile_and_run,
       ['-O2 -package base'])
+
+test('T24471',
+     [ req_th, collect_compiler_stats('all', 5) ],
+     multimod_compile, ['T24471', '-v0 -O'])


=====================================
testsuite/tests/rts/all.T
=====================================
@@ -3,7 +3,7 @@ test('testblockalloc',
      compile_and_run, [''])
 
 test('testmblockalloc',
-     [c_src, only_ways(['normal','threaded1']), extra_run_opts('+RTS -I0'),
+     [c_src, only_ways(['normal','threaded1']), extra_run_opts('+RTS -I0 -xr0.125T'),
       when(arch('wasm32'), skip)], # MBlocks can't be freed on wasm32, see Note [Megablock allocator on wasm] in rts
      compile_and_run, [''])
 # -I0 is important: the idle GC will run the memory leak detector,



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8d1384b1156be380b3d64eefbba4de756cd3f366...d8161acc13d72330d54c9176bb22ff6336ee9ee1

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8d1384b1156be380b3d64eefbba4de756cd3f366...d8161acc13d72330d54c9176bb22ff6336ee9ee1
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240306/4704458e/attachment-0001.html>


More information about the ghc-commits mailing list