[Git][ghc/ghc][wip/T25560] 7 commits: rts(setNumCapabilities): Assert that n_caps < MAX_N_CAPS

Ben Gamari (@bgamari) gitlab at gitlab.haskell.org
Wed Dec 11 16:15:07 UTC 2024



Ben Gamari pushed to branch wip/T25560 at Glasgow Haskell Compiler / GHC


Commits:
26764a68 by Ben Gamari at 2024-12-10T14:34:08-05:00
rts(setNumCapabilities): Assert that n_caps < MAX_N_CAPS

It was noticed in #25560 that requesting more capabilities than this
limit was previously allowed, resulting in a segfault.

I will add a proper exception in `base` in a future commit.
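The guard that this push adds to `setNumCapabilities` (see the rts/Schedule.c hunk) does two things. It rejects a count of zero. It reports any request above `max_n_capabilities` and clamps it to that bound, where previously such a request was allowed and segfaulted. The following is a minimal sketch of that validation. `validate_capability_request` is a hypothetical name and its messages are illustrative. It returns 0 to mean "ignore the request", whereas the real code calls `errorBelch` and returns early.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of the argument checks in setNumCapabilities:
 * zero is rejected (returns 0, meaning "ignore the request"), and requests
 * above the bound fixed at RTS startup are reported and clamped. */
static uint32_t validate_capability_request(uint32_t requested,
                                            uint32_t max_caps)
{
    if (requested == 0) {
        fprintf(stderr, "setNumCapabilities: Capability count must be positive\n");
        return 0;
    }
    if (requested > max_caps) {
        fprintf(stderr,
                "setNumCapabilities: clamping %" PRIu32 " to maximum %" PRIu32 "\n",
                requested, max_caps);
        return max_caps;
    }
    return requested;
}
```

With a bound of 256, the T25560 test's request for 100000 capabilities would yield 256 rather than an out-of-range capability count.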

- - - - -
be52a457 by Ben Gamari at 2024-12-10T14:34:08-05:00
ghc-internal: Fix inconsistent FFI import types

The foreign imports of `enabled_capabilities` and
`getNumberOfProcessors` were declared as `CInt` and `CUInt`
respectively, whereas both are defined as `uint32_t`.
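The declared type matters because a Haskell `peek` through the old `Ptr CInt` import of `enabled_capabilities` interprets the bytes of the C `uint32_t` as a signed `int`. The minimal C sketch below shows what a `uint32_t` looks like when read as a signed 32-bit value. It assumes a 32-bit two's-complement `int`, as on GHC's usual targets. `misread_as_signed` is a hypothetical helper, not part of the RTS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a uint32_t as a signed 32-bit integer. This models
 * reading a C `uint32_t` through a foreign import declared with a signed
 * 32-bit type (assumption: `int` is 32-bit two's complement). */
static int32_t misread_as_signed(uint32_t value)
{
    int32_t as_signed;
    memcpy(&as_signed, &value, sizeof as_signed);
    return as_signed;
}
```

For small values such as capability counts the two interpretations agree, so the mismatch was latent. It only becomes visible once the top bit is set, e.g. `3000000000u` reads back as `-1294967296`.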

- - - - -
207b1533 by Ben Gamari at 2024-12-10T14:34:08-05:00
rts: Mention maximum capability count in users guide

Addresses #25560.

- - - - -
b75f972d by Ben Gamari at 2024-12-10T14:34:08-05:00
rts/Capability: Move induction variable declaration into `for`s

Just a stylistic change.

- - - - -
3ff826f6 by Ben Gamari at 2024-12-10T14:34:09-05:00
rts: Determine max_n_capabilities at RTS startup

Previously the maximum number of capabilities supported by the RTS was
statically capped at 256. However, this bound is uncomfortably low given
the size of today's machines.

While supporting unbounded, fully-dynamic adjustment would be nice, it
is complex, so instead we do something simpler: probe the logical core
count at RTS startup and use the largest of this count, the `+RTS -N`
setting, and the previous static limit of 256 as the bound for the rest
of our execution.

This should prevent users from running into the capability limit on
large machines while avoiding wasting memory on a large capabilities
array for most users and keeping complexity at bay.

Addresses #25560.
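The bound selection described above amounts to taking the maximum of three values. This is a minimal sketch. `choose_max_n_capabilities` is a hypothetical name. The actual code in rts/Capability.c updates the global `max_n_capabilities` in place, using `getNumberOfProcessors()` and `RtsFlags.ParFlags.nCapabilities`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick the capability-array size as the largest of
 * the static lower bound (MAX_N_CAPABILITIES, 256 in the threaded RTS),
 * the logical core count, and the user's `+RTS -N` request. */
static uint32_t choose_max_n_capabilities(uint32_t static_bound,
                                          uint32_t core_count,
                                          uint32_t requested_n)
{
    uint32_t max_caps = static_bound;
    if (core_count > max_caps) {
        max_caps = core_count;
    }
    if (requested_n > max_caps) {
        max_caps = requested_n;
    }
    return max_caps;
}
```

For example:

- An 8-core machine keeps the historical bound of 256.
- A machine reporting 384 logical cores gets a bound of 384.
- `+RTS -N512` on that machine raises the bound to 512.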

- - - - -
2d97ed0b by Ben Gamari at 2024-12-11T11:14:55-05:00
testsuite: Introduce req_c_rts

As suggested by @hsyl20, this is intended to mark tests that rely on the
behavior of the C RTS.

- - - - -
001ae913 by Ben Gamari at 2024-12-11T11:14:57-05:00
testsuite: Add test for #25560

- - - - -


11 changed files:

- docs/users_guide/using-concurrent.rst
- libraries/ghc-internal/src/GHC/Internal/Conc/Sync.hs
- rts/Capability.c
- rts/Capability.h
- rts/RtsSymbols.c
- rts/Schedule.c
- rts/include/rts/Config.h
- rts/include/rts/Threads.h
- testsuite/driver/testlib.py
- + testsuite/tests/rts/T25560.hs
- testsuite/tests/rts/all.T


Changes:

=====================================
docs/users_guide/using-concurrent.rst
=====================================
@@ -153,6 +153,14 @@ use the RTS :rts-flag:`-N ⟨x⟩` options.
     changed while the program is running by calling
     ``Control.Concurrent.setNumCapabilities``.
 
+
+.. note::
+
+    The maximum number of capabilities supported by the GHC runtime system is
+    determined at RTS startup as the greatest of 256, the value given by
+    :rts-flag:`-N ⟨x⟩`, and the number of logical CPU cores.
+
 The following options affect the way the runtime schedules threads on
 CPUs:
 


=====================================
libraries/ghc-internal/src/GHC/Internal/Conc/Sync.hs
=====================================
@@ -1,4 +1,3 @@
-{-# LANGUAGE CPP #-}
 {-# LANGUAGE MagicHash #-}
 {-# LANGUAGE NoImplicitPrelude #-}
 {-# LANGUAGE RankNTypes #-}
@@ -394,13 +393,14 @@ getNumProcessors :: IO Int
 getNumProcessors = fmap fromIntegral c_getNumberOfProcessors
 
 foreign import ccall unsafe "getNumberOfProcessors"
-  c_getNumberOfProcessors :: IO CUInt
+  c_getNumberOfProcessors :: IO Word32
 
 -- | Returns the number of sparks currently in the local spark pool
 numSparks :: IO Int
 numSparks = IO $ \s -> case numSparks# s of (# s', n #) -> (# s', I# n #)
 
-foreign import ccall "&enabled_capabilities" enabled_capabilities :: Ptr CInt
+foreign import ccall "&enabled_capabilities"
+  enabled_capabilities :: Ptr Word32
 
 childHandler :: SomeException -> IO ()
 childHandler err = catch (real_handler err) childHandler


=====================================
rts/Capability.c
=====================================
@@ -16,6 +16,7 @@
  *
  * --------------------------------------------------------------------------*/
 
+#include "rts/Config.h"
 #include "rts/PosixSource.h"
 #include "Rts.h"
 
@@ -40,12 +41,16 @@ Capability MainCapability;
 uint32_t n_capabilities = 0;
 uint32_t enabled_capabilities = 0;
 
+// The size of the `capabilities` array initialized at RTS startup. Referenced
+// by GHC.Internal.Conc.Sync
+uint32_t max_n_capabilities = MAX_N_CAPABILITIES;
+
 // The array of Capabilities.  It's important that when we need
 // to allocate more Capabilities we don't have to move the existing
 // Capabilities, because there may be pointers to them in use
 // (e.g. threads in waitForCapability(), see #8209), so this is
 // an array of Capability* rather than an array of Capability.
-Capability *capabilities[MAX_N_CAPABILITIES];
+Capability **capabilities;
 
 // Holds the Capability which last became free.  This is used so that
 // an in-call has a chance of quickly finding a free Capability.
@@ -344,8 +349,6 @@ initCapability (Capability *cap, uint32_t i)
  * ------------------------------------------------------------------------- */
 void initCapabilities (void)
 {
-    uint32_t i;
-
     /* Declare a couple capability sets representing the process and
        clock domain. Each capability will get added to these capsets. */
     traceCapsetCreate(CAPSET_OSPROCESS_DEFAULT, CapsetTypeOsProcess);
@@ -354,7 +357,7 @@ void initCapabilities (void)
     // Initialise NUMA
     if (!RtsFlags.GcFlags.numa) {
         n_numa_nodes = 1;
-        for (i = 0; i < MAX_NUMA_NODES; i++) {
+        for (uint32_t i = 0; i < MAX_NUMA_NODES; i++) {
             numa_map[i] = 0;
         }
     } else if (RtsFlags.DebugFlags.numa) {
@@ -388,12 +391,30 @@ void initCapabilities (void)
     }
 #endif
 
-    if (RtsFlags.ParFlags.nCapabilities > MAX_N_CAPABILITIES) {
-        errorBelch("warning: this GHC runtime system only supports up to %d capabilities",
-                   MAX_N_CAPABILITIES);
-        RtsFlags.ParFlags.nCapabilities = MAX_N_CAPABILITIES;
+    /*
+     * Note [Capabilities array sizing]
+     * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+     * Determine the size of the capabilities array as the maximum of:
+     *   * the static lower bound, `MAX_N_CAPABILITIES`
+     *   * the logical core count
+     *   * the user's choice of `+RTS -N`
+     * This will serve as the upper bound on the capability count for the rest
+     * of execution. Calls to `setNumCapabilities` exceeding this bound will
+     * issue a warning and be clamped to it.
+     *
+     * See #25560.
+     */
+    uint32_t core_count = getNumberOfProcessors();
+    if (core_count > max_n_capabilities) {
+        max_n_capabilities = core_count;
     }
 
+    if (RtsFlags.ParFlags.nCapabilities > max_n_capabilities) {
+        max_n_capabilities = RtsFlags.ParFlags.nCapabilities;
+    }
+
+    capabilities = stgMallocBytes(sizeof(Capability) * max_n_capabilities, "initCapabilities");
+
     n_capabilities = 0;
     moreCapabilities(0, RtsFlags.ParFlags.nCapabilities);
     n_capabilities = RtsFlags.ParFlags.nCapabilities;
@@ -401,6 +422,7 @@ void initCapabilities (void)
 #else /* !THREADED_RTS */
 
     n_capabilities = 1;
+    capabilities = stgMallocBytes(sizeof(Capability), "initCapabilities");
     capabilities[0] = &MainCapability;
 
     initCapability(&MainCapability, 0);
@@ -412,7 +434,7 @@ void initCapabilities (void)
     // There are no free capabilities to begin with.  We will start
     // a worker Task to each Capability, which will quickly put the
     // Capability on the free list when it finds nothing to do.
-    for (i = 0; i < n_numa_nodes; i++) {
+    for (uint32_t i = 0; i < n_numa_nodes; i++) {
         last_free_capability[i] = getCapability(0);
     }
 }
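The existing comment in rts/Capability.c explains why `capabilities` is an array of `Capability*`: Capabilities must never move once created. Replacing the fixed-size array with one heap-allocated once at startup preserves that property. The following is a minimal sketch of the pattern using hypothetical `Cap`/`CapTable` types, which are not RTS types. Because each slot holds a pointer, the slots are sized with `sizeof *t->caps`, i.e. the size of a pointer rather than of the struct.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RTS `Capability` struct. */
typedef struct {
    uint32_t no;
} Cap;

/* A pointer array sized once, up front, to max_caps slots. Each Cap is
 * allocated separately, so adding caps later never moves existing ones. */
typedef struct {
    Cap    **caps;     /* max_caps slots; slots at index >= n_caps are NULL */
    uint32_t n_caps;
    uint32_t max_caps;
} CapTable;

static int cap_table_init(CapTable *t, uint32_t max_caps)
{
    /* Slots hold `Cap *`, so allocate pointer-sized elements. */
    t->caps = calloc(max_caps, sizeof *t->caps);
    t->n_caps = 0;
    t->max_caps = max_caps;
    return t->caps != NULL ? 0 : -1;
}

/* Grow to new_n caps, clamped to max_caps; returns the resulting count. */
static uint32_t cap_table_grow(CapTable *t, uint32_t new_n)
{
    if (new_n > t->max_caps) {
        new_n = t->max_caps;
    }
    while (t->n_caps < new_n) {
        Cap *c = malloc(sizeof *c);
        if (c == NULL) {
            break;
        }
        c->no = t->n_caps;
        t->caps[t->n_caps++] = c;
    }
    return t->n_caps;
}

static void cap_table_free(CapTable *t)
{
    for (uint32_t i = 0; i < t->n_caps; i++) {
        free(t->caps[i]);
    }
    free(t->caps);
    t->caps = NULL;
    t->n_caps = 0;
}
```

Growing the table allocates new `Cap`s but leaves the pointers to existing ones untouched. This is the stability that the comment's reference to `waitForCapability()` and #8209 is about.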


=====================================
rts/Capability.h
=====================================
@@ -270,11 +270,13 @@ INLINE_HEADER void releaseCapability_ (Capability* cap STG_UNUSED,
 // extern Capability MainCapability;
 
 // declared in rts/include/rts/Threads.h:
+// extern uint32_t max_n_capabilities;
 // extern uint32_t n_capabilities;
 // extern uint32_t enabled_capabilities;
 
-// Array of all the capabilities
-extern Capability *capabilities[MAX_N_CAPABILITIES];
+// Array of all the capabilities, of size max_n_capabilities
+// See Note [Capabilities array sizing] in rts/Capability.c.
+extern Capability **capabilities;
 
 INLINE_HEADER Capability *getCapability(uint32_t i)
 {


=====================================
rts/RtsSymbols.c
=====================================
@@ -909,6 +909,7 @@ extern char **environ;
       SymI_NeedsDataProto(rts_stop_on_exception)                        \
       SymI_HasProto(stopTimer)                                          \
       SymI_HasProto(n_capabilities)                                     \
+      SymI_HasProto(max_n_capabilities)                                 \
       SymI_HasProto(enabled_capabilities)                               \
       SymI_HasDataProto(stg_traceEventzh)                                   \
       SymI_HasDataProto(stg_traceMarkerzh)                                  \


=====================================
rts/Schedule.c
=====================================
@@ -2288,9 +2288,12 @@ setNumCapabilities (uint32_t new_n_capabilities USED_IF_THREADS)
     } else if (new_n_capabilities <= 0) {
         errorBelch("setNumCapabilities: Capability count must be positive");
         return;
+    } else if (new_n_capabilities > max_n_capabilities) {
+        // See Note [Capabilities array sizing] in rts/Capability.c.
+        errorBelch("setNumCapabilities: Attempt to increase capability count beyond maximum capability count %" PRIu32 "; clamping...\n", max_n_capabilities);
+        new_n_capabilities = max_n_capabilities;
     }
 
-
     debugTrace(DEBUG_sched, "changing the number of Capabilities from %d to %d",
                enabled_capabilities, new_n_capabilities);
 


=====================================
rts/include/rts/Config.h
=====================================
@@ -78,6 +78,10 @@ code.
 #endif
 
 #if defined(THREADED_RTS)
+/*
+ * See Note [Capabilities array sizing] in rts/Capability.c.
+ * Update the note in docs/users_guide/using-concurrent.rst when updating this.
+ */
 #define MAX_N_CAPABILITIES 256
 #else
 #define MAX_N_CAPABILITIES 1


=====================================
rts/include/rts/Threads.h
=====================================
@@ -69,7 +69,7 @@ HsBool rtsSupportsBoundThreads (void);
 // The number of Capabilities.
 // TODO: Ideally we would only provide getNumCapabilities
 // but this is used in compiler/cbits/genSym.c
-extern unsigned int n_capabilities;
+extern uint32_t n_capabilities;
 
 INLINE_HEADER unsigned int getNumCapabilities(void)
 { return RELAXED_LOAD(&n_capabilities); }
@@ -77,6 +77,10 @@ INLINE_HEADER unsigned int getNumCapabilities(void)
 // The number of Capabilities that are not disabled
 extern uint32_t enabled_capabilities;
 
+// The maximum number of Capabilities supported by the RTS.
+// See Note [Capabilities array sizing] in rts/Capability.c.
+extern uint32_t max_n_capabilities;
+
 #if !IN_STG_CODE
 extern Capability MainCapability;
 #endif


=====================================
testsuite/driver/testlib.py
=====================================
@@ -270,6 +270,10 @@ def req_bco( name, opts ):
     # JS backend doesn't support ByteCode
     js_skip(name, opts)
 
+def req_c_rts( name, opts ):
+    """ Require the C runtime system (rather than, e.g., the JavaScript RTS). """
+    js_skip(name, opts)
+
 def req_rts_linker( name, opts ):
     if not config.have_RTS_linker:
         opts.expect = 'fail'


=====================================
testsuite/tests/rts/T25560.hs
=====================================
@@ -0,0 +1,4 @@
+import GHC.Conc
+
+main :: IO ()
+main = setNumCapabilities 100000


=====================================
testsuite/tests/rts/all.T
=====================================
@@ -630,3 +630,7 @@ test('T24142', [req_target_smp], compile_and_run, ['-threaded -with-rtsopts "-N2
 
 test('T25232', [unless(have_profiling(), skip), only_ways(['normal','nonmoving','nonmoving_prof','nonmoving_thr_prof']), extra_ways(['nonmoving', 'nonmoving_prof'] + (['nonmoving_thr_prof'] if have_threaded() else []))], compile_and_run, [''])
 test('T25280', [unless(opsys('linux'),skip),req_process,js_skip], compile_and_run, [''])
+
+# N.B. This will likely issue a warning on stderr but we merely care that the
+# program doesn't crash.
+test('T25560', [req_c_rts, ignore_stderr], compile_and_run, [''])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/2de8eaedfa9c7212a26e64fe49e49e21c238b6b3...001ae913d4d9d27c0e394903523a94bd6a4ed43a


