[Git][ghc/ghc][wip/T24471] 5 commits: base: Reflect new era profiling RTS flags in GHC.RTS.Flags
Cheng Shao (@TerrorJack)
gitlab at gitlab.haskell.org
Wed Mar 6 06:14:23 UTC 2024
Cheng Shao pushed to branch wip/T24471 at Glasgow Haskell Compiler / GHC
Commits:
4074a3f2 by Matthew Pickering at 2024-03-05T21:44:35-05:00
base: Reflect new era profiling RTS flags in GHC.RTS.Flags
* -he profiling mode
* -he profiling selector
* --automatic-era-increment
CLC proposal #254 - https://github.com/haskell/core-libraries-committee/issues/254
- - - - -
a8c0e31b by Sylvain Henry at 2024-03-05T21:45:14-05:00
JS: faster implementation for some numeric primitives (#23597)
Use faster implementations for the following primitives in the JS
backend by not using JavaScript's BigInt:
- plusInt64
- minusInt64
- minusWord64
- timesWord64
- timesInt64
Co-authored-by: Josh Meredith <joshmeredith2008 at gmail.com>
- - - - -
21e3f325 by Cheng Shao at 2024-03-05T21:45:52-05:00
rts: add -xr option to control two step allocator reserved space size
This patch adds a -xr RTS option to control the size of virtual memory
address space reserved by the two step allocator on a 64-bit platform,
see added documentation for explanation. Closes #24498.
- - - - -
74cb79b1 by Ben Gamari at 2024-03-06T06:12:17+00:00
ghc-experimental: Add dummy dependencies to work around #23942
This is a temporary measure to improve CI reliability until a proper
solution is developed.
Works around #23942.
- - - - -
d8161acc by Simon Peyton Jones at 2024-03-06T06:13:48+00:00
Three compile perf improvements with deep nesting
These were changes are all triggered by #24471.
1. Make GHC.Core.Opt.SetLevels.lvlMFE behave better when there are
many free variables. See Note [Large free-variable sets].
2. Make GHC.Core.Opt.Arity.floatIn a bit lazier in its Cost argument.
This benefits the common case where the ArityType turns out to
be nullary. See Note [Care with nested expressions]
3. Make GHC.CoreToStg.Prep.cpeArg behave for deeply-nested
expressions. See Note [Eta expansion of arguments in CorePrep]
wrinkle (EA2).
Compile times go down by up to 4.5%, and much more in artificial
cases. (Geo mean of compiler/perf changes is -0.4%.)
Metric Decrease:
CoOpt_Read
T10421
T12425
- - - - -
21 changed files:
- compiler/GHC/Core/Opt/Arity.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/CoreToStg/Prep.hs
- docs/users_guide/9.10.1-notes.rst
- docs/users_guide/runtime_control.rst
- libraries/base/changelog.md
- libraries/ghc-experimental/src/Data/Sum/Experimental.hs
- libraries/ghc-experimental/src/Data/Tuple/Experimental.hs
- libraries/ghc-internal/src/GHC/Internal/RTS/Flags.hsc
- rts/RtsFlags.c
- rts/include/rts/Flags.h
- rts/js/arith.js
- rts/sm/MBlock.c
- testsuite/tests/interface-stability/base-exports.stdout
- testsuite/tests/interface-stability/base-exports.stdout-javascript-unknown-ghcjs
- testsuite/tests/interface-stability/base-exports.stdout-mingw32
- testsuite/tests/interface-stability/base-exports.stdout-ws-32
- + testsuite/tests/perf/compiler/T24471.hs
- + testsuite/tests/perf/compiler/T24471a.hs
- testsuite/tests/perf/compiler/all.T
- testsuite/tests/rts/all.T
Changes:
=====================================
compiler/GHC/Core/Opt/Arity.hs
=====================================
@@ -1270,8 +1270,14 @@ arityLam id (AT oss div)
floatIn :: Cost -> ArityType -> ArityType
-- We have something like (let x = E in b),
-- where b has the given arity type.
-floatIn IsCheap at = at
-floatIn IsExpensive at = addWork at
+-- NB: be as lazy as possible in the Cost-of-E argument;
+-- we can often get away without ever looking at it
+-- See Note [Care with nested expressions]
+floatIn ch at@(AT lams div)
+ = case lams of
+ [] -> at
+ (IsExpensive,_):_ -> at
+ (_,os):lams -> AT ((ch,os):lams) div
addWork :: ArityType -> ArityType
-- Add work to the outermost level of the arity type
@@ -1354,6 +1360,25 @@ That gives \1.T (see Note [Combining case branches: andWithTail],
first bullet). So 'go2' gets an arityType of \(C?)(C1).T, which is
what we want.
+Note [Care with nested expressions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ arityType (Just <big-expressions>)
+We will take
+ arityType Just = AT [(IsCheap,os)] topDiv
+and then do
+ arityApp (AT [(IsCheap os)] topDiv) (exprCost <big-expression>)
+The result will be AT [] topDiv. It doesn't matter what <big-expresison>
+is! The same is true of
+ arityType (let x = <rhs> in <body>)
+where the cost of <rhs> doesn't matter unless <body> has a useful
+arityType.
+
+TL;DR in `floatIn`, do not to look at the Cost argument until you have to.
+
+I found this when looking at #24471, although I don't think it was really
+the main culprit.
+
Note [Combining case branches: andWithTail]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When combining the ArityTypes for two case branches (with andArityType)
@@ -1576,7 +1601,7 @@ arityType env (Case scrut bndr _ alts)
= alts_type
| otherwise -- In the remaining cases we may not push
- = addWork alts_type -- evaluation of the scrutinee in
+ = addWork alts_type -- evaluation of the scrutinee in
where
env' = delInScope env bndr
arity_type_alt (Alt _con bndrs rhs) = arityType (delInScopeList env' bndrs) rhs
=====================================
compiler/GHC/Core/Opt/SetLevels.hs
=====================================
@@ -643,7 +643,8 @@ lvlMFE env strict_ctxt e@(_, AnnCase {})
= lvlExpr env e -- See Note [Case MFEs]
lvlMFE env strict_ctxt ann_expr
- | floatTopLvlOnly env && not (isTopLvl dest_lvl)
+ | not float_me
+ || floatTopLvlOnly env && not (isTopLvl dest_lvl)
-- Only floating to the top level is allowed.
|| hasFreeJoin env fvs -- If there is a free join, don't float
-- See Note [Free join points]
@@ -652,8 +653,9 @@ lvlMFE env strict_ctxt ann_expr
-- how it will be represented at runtime.
-- See Note [Representation polymorphism invariants] in GHC.Core
|| notWorthFloating expr abs_vars
- || not float_me
- = -- Don't float it out
+ -- Test notWorhtFloating last;
+ -- See Note [Large free-variable sets]
+ = -- Don't float it out
lvlExpr env ann_expr
| float_is_new_lam || exprIsTopLevelBindable expr expr_ty
@@ -822,6 +824,28 @@ early loses opportunities for RULES which (needless to say) are
important in some nofib programs (gcd is an example). [SPJ note:
I think this is obsolete; the flag seems always on.]
+Note [Large free-variable sets]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In #24471 we had something like
+ x1 = I# 1
+ ...
+ x1000 = I# 1000
+ foo = f x1 (f x2 (f x3 ....))
+So every sub-expression in `foo` has lots and lots of free variables. But
+none of these sub-expressions float anywhere; the entire float-out pass is a
+no-op.
+
+In lvlMFE, we want to find out quickly if the MFE is not-floatable; that is
+the common case. In #24471 it turned out that we were testing `abs_vars` (a
+relatively complicated calculation that takes at least O(n-free-vars) time to
+compute) for every sub-expression.
+
+Better instead to test `float_me` early. That still involves looking at
+dest_lvl, which means looking at every free variable, but the constant factor
+is a lot better.
+
+ToDo: find a way to fix the bad asymptotic complexity.
+
Note [Floating join point bindings]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mostly we only float a join point if it can /stay/ a join point. But
=====================================
compiler/GHC/CoreToStg/Prep.hs
=====================================
@@ -1469,8 +1469,7 @@ cpeArg env dmd arg
= do { (floats1, arg1) <- cpeRhsE env arg -- arg1 can be a lambda
; let arg_ty = exprType arg1
is_unlifted = isUnliftedType arg_ty
- dec = wantFloatLocal NonRecursive dmd is_unlifted
- floats1 arg1
+ dec = wantFloatLocal NonRecursive dmd is_unlifted floats1 arg1
; (floats2, arg2) <- executeFloatDecision dec floats1 arg1
-- Else case: arg1 might have lambdas, and we can't
-- put them inside a wrapBinds
@@ -1482,23 +1481,29 @@ cpeArg env dmd arg
then return (floats2, arg2)
else do { v <- newVar arg_ty
-- See Note [Eta expansion of arguments in CorePrep]
- ; let arg3 = cpeEtaExpandArg env arg2
+ ; let arity = cpeArgArity env dec arg2
+ arg3 = cpeEtaExpand arity arg2
arg_float = mkNonRecFloat env dmd is_unlifted v arg3
; return (snocFloat floats2 arg_float, varToCoreExpr v) }
}
-cpeEtaExpandArg :: CorePrepEnv -> CoreArg -> CoreArg
+cpeArgArity :: CorePrepEnv -> FloatDecision -> CoreArg -> Arity
-- ^ See Note [Eta expansion of arguments in CorePrep]
-cpeEtaExpandArg env arg = cpeEtaExpand arity arg
- where
- arity | Just ao <- cp_arityOpts (cpe_config env) -- Just <=> -O1 or -O2
- , not (has_join_in_tail_context arg)
+-- Returning 0 means "no eta-expansion"; see cpeEtaExpand
+cpeArgArity env float_decision arg
+ | FloatNone <- float_decision
+ = 0 -- Crucial short-cut
+ -- See wrinkle (EA2) in Note [Eta expansion of arguments in CorePrep]
+
+ | Just ao <- cp_arityOpts (cpe_config env) -- Just <=> -O1 or -O2
+ , not (has_join_in_tail_context arg)
-- See Wrinkle (EA1) of Note [Eta expansion of arguments in CorePrep]
- = case exprEtaExpandArity ao arg of
- Nothing -> 0
- Just at -> arityTypeArity at
- | otherwise
- = exprArity arg -- this is cheap enough for -O0
+ = case exprEtaExpandArity ao arg of
+ Nothing -> 0
+ Just at -> arityTypeArity at
+
+ | otherwise
+ = exprArity arg -- this is cheap enough for -O0
has_join_in_tail_context :: CoreExpr -> Bool
-- ^ Identify the cases where we'd generate invalid `CpeApp`s as described in
@@ -1510,34 +1515,10 @@ has_join_in_tail_context (Tick _ e) = has_join_in_tail_context e
has_join_in_tail_context (Case _ _ _ alts) = any has_join_in_tail_context (rhssOfAlts alts)
has_join_in_tail_context _ = False
-{-
-Note [Eta expansion of arguments with join heads]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-See Note [Eta expansion for join points] in GHC.Core.Opt.Arity
-Eta expanding the join point would introduce crap that we can't
-generate code for
-
-------------------------------------------------------------------------------
--- Building the saturated syntax
--- ---------------------------------------------------------------------------
-
-Note [Eta expansion of hasNoBinding things in CorePrep]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-maybeSaturate deals with eta expanding to saturate things that can't deal with
-unsaturated applications (identified by 'hasNoBinding', currently
-foreign calls, unboxed tuple/sum constructors, and representation-polymorphic
-primitives such as 'coerce' and 'unsafeCoerce#').
-
-Historical Note: Note that eta expansion in CorePrep used to be very fragile
-due to the "prediction" of CAFfyness that we used to make during tidying.
-We previously saturated primop
-applications here as well but due to this fragility (see #16846) we now deal
-with this another way, as described in Note [Primop wrappers] in GHC.Builtin.PrimOps.
--}
-
maybeSaturate :: Id -> CpeApp -> Int -> [CoreTickish] -> UniqSM CpeRhs
maybeSaturate fn expr n_args unsat_ticks
| hasNoBinding fn -- There's no binding
+ -- See Note [Eta expansion of hasNoBinding things in CorePrep]
= return $ wrapLamBody (\body -> foldr mkTick body unsat_ticks) sat_expr
| mark_arity > 0 -- A call-by-value function. See Note [CBV Function Ids]
@@ -1567,24 +1548,14 @@ maybeSaturate fn expr n_args unsat_ticks
fn_arity = idArity fn
excess_arity = (max fn_arity mark_arity) - n_args
sat_expr = cpeEtaExpand excess_arity expr
- applied_marks = n_args >= (length . dropWhile (not . isMarkedCbv) . reverse . expectJust "maybeSaturate" $ (idCbvMarks_maybe fn))
+ applied_marks = n_args >= (length . dropWhile (not . isMarkedCbv) .
+ reverse . expectJust "maybeSaturate" $ (idCbvMarks_maybe fn))
-- For join points we never eta-expand (See Note [Do not eta-expand join points])
- -- so we assert all arguments that need to be passed cbv are visible so that the backend can evalaute them if required..
-{-
-************************************************************************
-* *
- Simple GHC.Core operations
-* *
-************************************************************************
--}
+ -- so we assert all arguments that need to be passed cbv are visible so that the
+ -- backend can evalaute them if required..
-{-
--- -----------------------------------------------------------------------------
--- Eta reduction
--- -----------------------------------------------------------------------------
-
-Note [Eta expansion]
-~~~~~~~~~~~~~~~~~~~~~
+{- Note [Eta expansion]
+~~~~~~~~~~~~~~~~~~~~~~~
Eta expand to match the arity claimed by the binder Remember,
CorePrep must not change arity
@@ -1603,6 +1574,19 @@ NB2: we have to be careful that the result of etaExpand doesn't
an SCC note - we're now careful in etaExpand to make sure the
SCC is pushed inside any new lambdas that are generated.
+Note [Eta expansion of hasNoBinding things in CorePrep]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+maybeSaturate deals with eta expanding to saturate things that can't deal
+with unsaturated applications (identified by 'hasNoBinding', currently
+foreign calls, unboxed tuple/sum constructors, and representation-polymorphic
+primitives such as 'coerce' and 'unsafeCoerce#').
+
+Historical Note: Note that eta expansion in CorePrep used to be very fragile
+due to the "prediction" of CAFfyness that we used to make during tidying. We
+previously saturated primop applications here as well but due to this
+fragility (see #16846) we now deal with this another way, as described in
+Note [Primop wrappers] in GHC.Builtin.PrimOps.
+
Note [Eta expansion and the CorePrep invariants]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It turns out to be much much easier to do eta expansion
@@ -1685,6 +1669,22 @@ There is a nasty Wrinkle:
This scenario occurs rarely; hence it's OK to generate sub-optimal code.
The alternative would be to fix Note [Eta expansion for join points], but
that's quite challenging due to unfoldings of (recursive) join points.
+
+(EA2) In cpeArgArity, if float_decision = FloatNone) the `arg` will look like
+ let <binds> in rhs
+ where <binds> is non-empty and can't be floated out of a lazy context (see
+ `wantFloatLocal`). So we can't eta-expand it anyway, so we can return 0
+ forthwith. Without this short-cut we will call exprEtaExpandArity on the
+ `arg`, and <binds> might be enormous. exprEtaExpandArity be very expensive
+ on this: it uses arityType, and may look at <binds>.
+
+ On the other hand, if float_decision = FloatAll, there will be no
+ let-bindings around 'arg'; they will have floated out. So
+ exprEtaExpandArity is cheap.
+
+ This can make a huge difference on deeply nested expressions like
+ f (f (f (f (f ...))))
+ #24471 is a good example, where Prep took 25% of compile time!
-}
cpeEtaExpand :: Arity -> CpeRhs -> CpeRhs
@@ -1899,7 +1899,7 @@ instance Outputable FloatInfo where
-- See Note [Floating in CorePrep]
-- and Note [BindInfo and FloatInfo]
data FloatingBind
- = Float !CoreBind !BindInfo !FloatInfo
+ = Float !CoreBind !BindInfo !FloatInfo -- Never a join-point binding
| UnsafeEqualityCase !CoreExpr !CoreBndr !AltCon ![CoreBndr]
| FloatTick CoreTickish
@@ -2126,19 +2126,16 @@ data FloatDecision
| FloatAll
executeFloatDecision :: FloatDecision -> Floats -> CpeRhs -> UniqSM (Floats, CpeRhs)
-executeFloatDecision dec floats rhs = do
- let (float,stay) = case dec of
- _ | isEmptyFloats floats -> (emptyFloats,emptyFloats)
- FloatNone -> (emptyFloats, floats)
- FloatAll -> (floats, emptyFloats)
- -- Wrap `stay` around `rhs`.
- -- NB: `rhs` might have lambdas, and we can't
- -- put them inside a wrapBinds, which expects a `CpeBody`.
- if isEmptyFloats stay -- Fast path where we don't need to call `rhsToBody`
- then return (float, rhs)
- else do
- (floats', body) <- rhsToBody rhs
- return (float, wrapBinds stay $ wrapBinds floats' body)
+executeFloatDecision dec floats rhs
+ = case dec of
+ FloatAll -> return (floats, rhs)
+ FloatNone
+ | isEmptyFloats floats -> return (emptyFloats, rhs)
+ | otherwise -> do { (floats', body) <- rhsToBody rhs
+ ; return (emptyFloats, wrapBinds floats $
+ wrapBinds floats' body) }
+ -- FloatNone case: `rhs` might have lambdas, and we can't
+ -- put them inside a wrapBinds, which expects a `CpeBody`.
wantFloatTop :: Floats -> FloatDecision
wantFloatTop fs
=====================================
docs/users_guide/9.10.1-notes.rst
=====================================
@@ -232,6 +232,11 @@ Runtime system
- Add a :rts-flag:`--no-automatic-time-samples` flag which stops time profiling samples being automatically started on
startup. Time profiling can be controlled manually using functions in ``GHC.Profiling``.
+- Add a :rts-flag:`-xr ⟨size⟩` which controls the size of virtual
+ memory address space reserved by the two step allocator on a 64-bit
+ platform. The default size is now 1T on aarch64 as well. See
+ :ghc-ticket:`24498`.
+
``base`` library
~~~~~~~~~~~~~~~~
=====================================
docs/users_guide/runtime_control.rst
=====================================
@@ -368,6 +368,18 @@ Miscellaneous RTS options
thread can execute its exception handlers. The ``-xq`` controls the
size of this additional quota.
+.. rts-flag:: -xr ⟨size⟩
+
+ :default: 1T
+
+ This option controls the size of virtual memory address space
+ reserved by the two step allocator on a 64-bit platform. It can be
+ useful in scenarios where even reserving a large address range
+ without committing can be expensive (e.g. WSL1), or when you
+ actually have enough physical memory and want to support a Haskell
+ heap larger than 1T. ``-xr`` is a no-op if GHC is configured with
+ ``--disable-large-address-space`` or if the platform is 32-bit.
+
.. _rts-options-gc:
RTS options to control the garbage collector
=====================================
libraries/base/changelog.md
=====================================
@@ -50,6 +50,10 @@
* Treat all FDs as "nonblocking" on wasm32 ([CLC proposal #234](https://github.com/haskell/core-libraries-committee/issues/234))
+ * Add `HeapByEra`, `eraSelector` and `automaticEraIncrement` to `GHC.RTS.Flags` to
+ reflect the new RTS flags: `-he` profiling mode, `-he` selector and `--automatic-era-increment`.
+ ([CLC proposal #254](https://github.com/haskell/core-libraries-committee/issues/254))
+
## 4.19.0.0 *October 2023*
* Add `{-# WARNING in "x-partial" #-}` to `Data.List.{head,tail}`.
Use `{-# OPTIONS_GHC -Wno-x-partial #-}` to disable it.
=====================================
libraries/ghc-experimental/src/Data/Sum/Experimental.hs
=====================================
@@ -80,4 +80,5 @@ module Data.Sum.Experimental (
Sum63#,
) where
+import GHC.Num.BigNat () -- for build ordering
import GHC.Types
=====================================
libraries/ghc-experimental/src/Data/Tuple/Experimental.hs
=====================================
@@ -161,3 +161,4 @@ module Data.Tuple.Experimental (
import GHC.Tuple
import GHC.Types
import GHC.Classes
+import GHC.Num.BigNat () -- for build ordering
=====================================
libraries/ghc-internal/src/GHC/Internal/RTS/Flags.hsc
=====================================
@@ -274,6 +274,7 @@ data DoHeapProfile
| HeapByLDV
| HeapByClosureType
| HeapByInfoTable
+ | HeapByEra -- ^ @since base-4.20.0.0
deriving ( Show -- ^ @since base-4.8.0.0
, Generic -- ^ @since base-4.15.0.0
)
@@ -289,6 +290,7 @@ instance Enum DoHeapProfile where
fromEnum HeapByLDV = #{const HEAP_BY_LDV}
fromEnum HeapByClosureType = #{const HEAP_BY_CLOSURE_TYPE}
fromEnum HeapByInfoTable = #{const HEAP_BY_INFO_TABLE}
+ fromEnum HeapByEra = #{const HEAP_BY_ERA}
toEnum #{const NO_HEAP_PROFILING} = NoHeapProfiling
toEnum #{const HEAP_BY_CCS} = HeapByCCS
@@ -299,6 +301,7 @@ instance Enum DoHeapProfile where
toEnum #{const HEAP_BY_LDV} = HeapByLDV
toEnum #{const HEAP_BY_CLOSURE_TYPE} = HeapByClosureType
toEnum #{const HEAP_BY_INFO_TABLE} = HeapByInfoTable
+ toEnum #{const HEAP_BY_ERA} = HeapByEra
toEnum e = errorWithoutStackTrace ("invalid enum for DoHeapProfile: " ++ show e)
-- | Parameters of the cost-center profiler
@@ -311,6 +314,7 @@ data ProfFlags = ProfFlags
, startHeapProfileAtStartup :: Bool
, startTimeProfileAtStartup :: Bool -- ^ @since base-4.20.0.0
, showCCSOnException :: Bool
+ , automaticEraIncrement :: Bool -- ^ @since 4.20.0.0
, maxRetainerSetSize :: Word
, ccsLength :: Word
, modSelector :: Maybe String
@@ -320,6 +324,7 @@ data ProfFlags = ProfFlags
, ccsSelector :: Maybe String
, retainerSelector :: Maybe String
, bioSelector :: Maybe String
+ , eraSelector :: Word -- ^ @since base-4.20.0.0
} deriving ( Show -- ^ @since base-4.8.0.0
, Generic -- ^ @since base-4.15.0.0
)
@@ -633,6 +638,8 @@ getProfFlags = do
(#{peek PROFILING_FLAGS, startTimeProfileAtStartup} ptr :: IO CBool))
<*> (toBool <$>
(#{peek PROFILING_FLAGS, showCCSOnException} ptr :: IO CBool))
+ <*> (toBool <$>
+ (#{peek PROFILING_FLAGS, incrementUserEra} ptr :: IO CBool))
<*> #{peek PROFILING_FLAGS, maxRetainerSetSize} ptr
<*> #{peek PROFILING_FLAGS, ccsLength} ptr
<*> (peekCStringOpt =<< #{peek PROFILING_FLAGS, modSelector} ptr)
@@ -642,6 +649,7 @@ getProfFlags = do
<*> (peekCStringOpt =<< #{peek PROFILING_FLAGS, ccsSelector} ptr)
<*> (peekCStringOpt =<< #{peek PROFILING_FLAGS, retainerSelector} ptr)
<*> (peekCStringOpt =<< #{peek PROFILING_FLAGS, bioSelector} ptr)
+ <*> #{peek PROFILING_FLAGS, eraSelector} ptr
getTraceFlags :: IO TraceFlags
getTraceFlags = do
=====================================
rts/RtsFlags.c
=====================================
@@ -186,6 +186,9 @@ void initRtsFlagsDefaults(void)
RtsFlags.GcFlags.ringBell = false;
RtsFlags.GcFlags.longGCSync = 0; /* detection turned off */
+ // 1 TBytes
+ RtsFlags.GcFlags.addressSpaceSize = (StgWord64)1 << 40;
+
RtsFlags.DebugFlags.scheduler = false;
RtsFlags.DebugFlags.interpreter = false;
RtsFlags.DebugFlags.weak = false;
@@ -552,6 +555,11 @@ usage_text[] = {
" -xq The allocation limit given to a thread after it receives",
" an AllocationLimitExceeded exception. (default: 100k)",
"",
+#if defined(USE_LARGE_ADDRESS_SPACE)
+" -xr The size of virtual memory address space reserved by the",
+" two step allocator (default: 1T)",
+"",
+#endif
" -Mgrace=<n>",
" The amount of allocation after the program receives a",
" HeapOverflow exception before the exception is thrown again, if",
@@ -1820,6 +1828,12 @@ error = true;
/ BLOCK_SIZE;
break;
+ case 'r':
+ OPTION_UNSAFE;
+ RtsFlags.GcFlags.addressSpaceSize
+ = decodeSize(rts_argv[arg], 3, MBLOCK_SIZE, HS_WORD64_MAX);
+ break;
+
default:
OPTION_SAFE;
errorBelch("unknown RTS option: %s",rts_argv[arg]);
@@ -2118,7 +2132,9 @@ decodeSize(const char *flag, uint32_t offset, StgWord64 min, StgWord64 max)
m = atof(s);
c = s[strlen(s)-1];
- if (c == 'g' || c == 'G')
+ if (c == 't' || c == 'T')
+ m *= (StgWord64)1024*1024*1024*1024;
+ else if (c == 'g' || c == 'G')
m *= 1024*1024*1024;
else if (c == 'm' || c == 'M')
m *= 1024*1024;
@@ -2737,4 +2753,3 @@ doingErasProfiling( void )
|| RtsFlags.ProfFlags.eraSelector != 0);
}
#endif /* PROFILING */
-
=====================================
rts/include/rts/Flags.h
=====================================
@@ -89,6 +89,8 @@ typedef struct _GC_FLAGS {
bool numa; /* Use NUMA */
StgWord numaMask;
+
+ StgWord64 addressSpaceSize; /* large address space size in bytes */
} GC_FLAGS;
/* See Note [Synchronization of flags and base APIs] */
=====================================
rts/js/arith.js
=====================================
@@ -44,19 +44,23 @@ function h$hs_remWord64(h1,l1,h2,l2) {
}
function h$hs_timesWord64(h1,l1,h2,l2) {
- var a = W64(h1,l1);
- var b = W64(h2,l2);
- var r = BigInt.asUintN(64, a * b);
- TRACE_ARITH("Word64: " + a + " * " + b + " ==> " + r)
- RETURN_W64(r);
+ var rh = h$mul2Word32(l1,l2);
+ var rl = h$ret1;
+
+ rh += Math.imul(l1,h2)>>>0;
+ rh += Math.imul(l2,h1)>>>0;
+ rh >>>= 0;
+
+ TRACE_ARITH("Word64: " + (h1,l1) + " * " + (h2,l2) + " ==> " + (rh,rl))
+ RETURN_UBX_TUP2(rh,rl);
}
function h$hs_minusWord64(h1,l1,h2,l2) {
- var a = (BigInt(h1) << BigInt(32)) | BigInt(l1>>>0);
- var b = (BigInt(h2) << BigInt(32)) | BigInt(l2>>>0);
- var r = BigInt.asUintN(64, a - b);
- TRACE_ARITH("Word64: " + a + " - " + b + " ==> " + r)
- RETURN_W64(r);
+ var l = l1-l2;
+ var rl = l>>>0;
+ var rh = (h1-h2-(l!=rl?1:0))>>>0;
+ TRACE_ARITH("Word64: " + (h1,l1) + " - " + (h2,l2) + " ==> " + (rh,rl))
+ RETURN_UBX_TUP2(rh,rl);
}
function h$hs_plusWord64(h1,l1,h2,l2) {
@@ -68,11 +72,15 @@ function h$hs_plusWord64(h1,l1,h2,l2) {
}
function h$hs_timesInt64(h1,l1,h2,l2) {
- var a = I64(h1,l1);
- var b = I64(h2,l2);
- var r = BigInt.asIntN(64, a * b);
- TRACE_ARITH("Int64: " + a + " * " + b + " ==> " + r)
- RETURN_I64(r);
+ var rh = h$mul2Word32(l1,l2);
+ var rl = h$ret1;
+
+ rh += Math.imul(l1,h2)|0;
+ rh += Math.imul(l2,h1)|0;
+ rh |= 0;
+
+ TRACE_ARITH("Int64: " + (h1,l1) + " * " + (h2,l2) + " ==> " + (rh,rl))
+ RETURN_UBX_TUP2(rh,rl);
}
function h$hs_quotInt64(h1,l1,h2,l2) {
@@ -92,19 +100,19 @@ function h$hs_remInt64(h1,l1,h2,l2) {
}
function h$hs_plusInt64(h1,l1,h2,l2) {
- var a = I64(h1,l1);
- var b = I64(h2,l2);
- var r = BigInt.asIntN(64, a + b);
- TRACE_ARITH("Int64: " + a + " + " + b + " ==> " + r)
- RETURN_I64(r);
+ var l = l1+l2;
+ var rl = l>>>0;
+ var rh = (h1+h2+(l!=rl?1:0))|0;
+ TRACE_ARITH("Int64: " + (h1,l1) + " + " + (h2,l2) + " ==> " + (rh,rl))
+ RETURN_UBX_TUP2(rh,rl);
}
function h$hs_minusInt64(h1,l1,h2,l2) {
- var a = I64(h1,l1);
- var b = I64(h2,l2);
- var r = BigInt.asIntN(64, a - b);
- TRACE_ARITH("Int64: " + a + " - " + b + " ==> " + r)
- RETURN_I64(r);
+ var l = l1-l2;
+ var rl = l>>>0;
+ var rh = (h1-h2-(l!=rl?1:0))|0;
+ TRACE_ARITH("Int64: " + (h1,l1) + " - " + (h2,l2) + " ==> " + (rh,rl))
+ RETURN_UBX_TUP2(rh,rl);
}
function h$hs_uncheckedShiftLWord64(h,l,n) {
=====================================
rts/sm/MBlock.c
=====================================
@@ -659,20 +659,14 @@ initMBlocks(void)
#if defined(USE_LARGE_ADDRESS_SPACE)
{
- W_ size;
-#if defined(aarch64_HOST_ARCH)
- size = (W_)1 << 38; // 1/4 TByte
-#else
- size = (W_)1 << 40; // 1 TByte
-#endif
void *startAddress = NULL;
if (RtsFlags.GcFlags.heapBase) {
startAddress = (void*) RtsFlags.GcFlags.heapBase;
}
- void *addr = osReserveHeapMemory(startAddress, &size);
+ void *addr = osReserveHeapMemory(startAddress, &RtsFlags.GcFlags.addressSpaceSize);
mblock_address_space.begin = (W_)addr;
- mblock_address_space.end = (W_)addr + size;
+ mblock_address_space.end = (W_)addr + RtsFlags.GcFlags.addressSpaceSize;
mblock_high_watermark = (W_)addr;
}
#elif SIZEOF_VOID_P == 8
=====================================
testsuite/tests/interface-stability/base-exports.stdout
=====================================
@@ -9029,7 +9029,7 @@ module GHC.RTS.Flags where
type DoCostCentres :: *
data DoCostCentres = CostCentresNone | CostCentresSummary | CostCentresVerbose | CostCentresAll | CostCentresJSON
type DoHeapProfile :: *
- data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable
+ data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable | HeapByEra
type DoTrace :: *
data DoTrace = TraceNone | TraceEventLog | TraceStderr
type GCFlags :: *
@@ -9080,6 +9080,7 @@ module GHC.RTS.Flags where
startHeapProfileAtStartup :: GHC.Types.Bool,
startTimeProfileAtStartup :: GHC.Types.Bool,
showCCSOnException :: GHC.Types.Bool,
+ automaticEraIncrement :: GHC.Types.Bool,
maxRetainerSetSize :: GHC.Types.Word,
ccsLength :: GHC.Types.Word,
modSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
@@ -9088,7 +9089,8 @@ module GHC.RTS.Flags where
ccSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
ccsSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
retainerSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
- bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String}
+ bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
+ eraSelector :: GHC.Types.Word}
type RTSFlags :: *
data RTSFlags = RTSFlags {gcFlags :: GCFlags, concurrentFlags :: ConcFlags, miscFlags :: MiscFlags, debugFlags :: DebugFlags, costCentreFlags :: CCFlags, profilingFlags :: ProfFlags, traceFlags :: TraceFlags, tickyFlags :: TickyFlags, parFlags :: ParFlags, hpcFlags :: HpcFlags}
type RtsTime :: *
=====================================
testsuite/tests/interface-stability/base-exports.stdout-javascript-unknown-ghcjs
=====================================
@@ -12071,7 +12071,7 @@ module GHC.RTS.Flags where
type DoCostCentres :: *
data DoCostCentres = CostCentresNone | CostCentresSummary | CostCentresVerbose | CostCentresAll | CostCentresJSON
type DoHeapProfile :: *
- data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable
+ data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable | HeapByEra
type DoTrace :: *
data DoTrace = TraceNone | TraceEventLog | TraceStderr
type GCFlags :: *
@@ -12122,6 +12122,7 @@ module GHC.RTS.Flags where
startHeapProfileAtStartup :: GHC.Types.Bool,
startTimeProfileAtStartup :: GHC.Types.Bool,
showCCSOnException :: GHC.Types.Bool,
+ automaticEraIncrement :: GHC.Types.Bool,
maxRetainerSetSize :: GHC.Types.Word,
ccsLength :: GHC.Types.Word,
modSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
@@ -12130,7 +12131,8 @@ module GHC.RTS.Flags where
ccSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
ccsSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
retainerSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
- bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String}
+ bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
+ eraSelector :: GHC.Types.Word}
type RTSFlags :: *
data RTSFlags = RTSFlags {gcFlags :: GCFlags, concurrentFlags :: ConcFlags, miscFlags :: MiscFlags, debugFlags :: DebugFlags, costCentreFlags :: CCFlags, profilingFlags :: ProfFlags, traceFlags :: TraceFlags, tickyFlags :: TickyFlags, parFlags :: ParFlags, hpcFlags :: HpcFlags}
type RtsTime :: *
=====================================
testsuite/tests/interface-stability/base-exports.stdout-mingw32
=====================================
@@ -9253,7 +9253,7 @@ module GHC.RTS.Flags where
type DoCostCentres :: *
data DoCostCentres = CostCentresNone | CostCentresSummary | CostCentresVerbose | CostCentresAll | CostCentresJSON
type DoHeapProfile :: *
- data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable
+ data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable | HeapByEra
type DoTrace :: *
data DoTrace = TraceNone | TraceEventLog | TraceStderr
type GCFlags :: *
@@ -9304,6 +9304,7 @@ module GHC.RTS.Flags where
startHeapProfileAtStartup :: GHC.Types.Bool,
startTimeProfileAtStartup :: GHC.Types.Bool,
showCCSOnException :: GHC.Types.Bool,
+ automaticEraIncrement :: GHC.Types.Bool,
maxRetainerSetSize :: GHC.Types.Word,
ccsLength :: GHC.Types.Word,
modSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
@@ -9312,7 +9313,8 @@ module GHC.RTS.Flags where
ccSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
ccsSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
retainerSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
- bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String}
+ bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
+ eraSelector :: GHC.Types.Word}
type RTSFlags :: *
data RTSFlags = RTSFlags {gcFlags :: GCFlags, concurrentFlags :: ConcFlags, miscFlags :: MiscFlags, debugFlags :: DebugFlags, costCentreFlags :: CCFlags, profilingFlags :: ProfFlags, traceFlags :: TraceFlags, tickyFlags :: TickyFlags, parFlags :: ParFlags, hpcFlags :: HpcFlags}
type RtsTime :: *
=====================================
testsuite/tests/interface-stability/base-exports.stdout-ws-32
=====================================
@@ -9029,7 +9029,7 @@ module GHC.RTS.Flags where
type DoCostCentres :: *
data DoCostCentres = CostCentresNone | CostCentresSummary | CostCentresVerbose | CostCentresAll | CostCentresJSON
type DoHeapProfile :: *
- data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable
+ data DoHeapProfile = NoHeapProfiling | HeapByCCS | HeapByMod | HeapByDescr | HeapByType | HeapByRetainer | HeapByLDV | HeapByClosureType | HeapByInfoTable | HeapByEra
type DoTrace :: *
data DoTrace = TraceNone | TraceEventLog | TraceStderr
type GCFlags :: *
@@ -9080,6 +9080,7 @@ module GHC.RTS.Flags where
startHeapProfileAtStartup :: GHC.Types.Bool,
startTimeProfileAtStartup :: GHC.Types.Bool,
showCCSOnException :: GHC.Types.Bool,
+ automaticEraIncrement :: GHC.Types.Bool,
maxRetainerSetSize :: GHC.Types.Word,
ccsLength :: GHC.Types.Word,
modSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
@@ -9088,7 +9089,8 @@ module GHC.RTS.Flags where
ccSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
ccsSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
retainerSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
- bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String}
+ bioSelector :: GHC.Internal.Maybe.Maybe GHC.Internal.Base.String,
+ eraSelector :: GHC.Types.Word}
type RTSFlags :: *
data RTSFlags = RTSFlags {gcFlags :: GCFlags, concurrentFlags :: ConcFlags, miscFlags :: MiscFlags, debugFlags :: DebugFlags, costCentreFlags :: CCFlags, profilingFlags :: ProfFlags, traceFlags :: TraceFlags, tickyFlags :: TickyFlags, parFlags :: ParFlags, hpcFlags :: HpcFlags}
type RtsTime :: *
=====================================
testsuite/tests/perf/compiler/T24471.hs
=====================================
@@ -0,0 +1,8 @@
+{-# LANGUAGE TemplateHaskell #-}
+module T24471 where
+
+import T24471a
+
+{-# OPAQUE foo #-}
+foo :: (List_ Int a -> a) -> a
+foo alg = $$(between [|| alg ||] 0 1000)
=====================================
testsuite/tests/perf/compiler/T24471a.hs
=====================================
@@ -0,0 +1,8 @@
+{-# LANGUAGE TemplateHaskell #-}
+module T24471a where
+
+data List_ a f = Nil_ | Cons_ a f deriving Functor
+
+between alg a b
+ | a == b = [|| $$alg Nil_ ||]
+ | otherwise = [|| $$alg (Cons_ a $$(between alg (a + 1) b)) ||]
=====================================
testsuite/tests/perf/compiler/all.T
=====================================
@@ -712,3 +712,7 @@ test ('LookupFusion',
[collect_stats('bytes allocated',2), when(wordsize(32), skip)],
compile_and_run,
['-O2 -package base'])
+
+test('T24471',
+ [ req_th, collect_compiler_stats('all', 5) ],
+ multimod_compile, ['T24471', '-v0 -O'])
=====================================
testsuite/tests/rts/all.T
=====================================
@@ -3,7 +3,7 @@ test('testblockalloc',
compile_and_run, [''])
test('testmblockalloc',
- [c_src, only_ways(['normal','threaded1']), extra_run_opts('+RTS -I0'),
+ [c_src, only_ways(['normal','threaded1']), extra_run_opts('+RTS -I0 -xr0.125T'),
when(arch('wasm32'), skip)], # MBlocks can't be freed on wasm32, see Note [Megablock allocator on wasm] in rts
compile_and_run, [''])
# -I0 is important: the idle GC will run the memory leak detector,
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8d1384b1156be380b3d64eefbba4de756cd3f366...d8161acc13d72330d54c9176bb22ff6336ee9ee1
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8d1384b1156be380b3d64eefbba4de756cd3f366...d8161acc13d72330d54c9176bb22ff6336ee9ee1
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240306/4704458e/attachment-0001.html>
More information about the ghc-commits
mailing list