[Git][ghc/ghc][wip/marge_bot_batch_merge_job] 7 commits: rts: do not prefetch mark_closure bdescr in non-moving gc when ASSERTS_ENABLED
Marge Bot (@marge-bot)
gitlab at gitlab.haskell.org
Fri May 17 09:40:44 UTC 2024
Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC
Commits:
886ab43a by Cheng Shao at 2024-05-17T01:34:50-04:00
rts: do not prefetch mark_closure bdescr in non-moving gc when ASSERTS_ENABLED
This commit fixes a small oversight in !12148: the prefetch logic
in the non-moving GC may trap in the debug RTS because it calls Bdescr() for
mark_closure, which may be a static closure. This is fine in the non-debug RTS
because even if invalid bdescr addresses are prefetched, they will not
cause segfaults, so this commit implements the most straightforward
fix: don't prefetch the mark_closure bdescr when assertions are enabled.
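As a minimal sketch of the shape of the fix (illustrative stand-in code, not the RTS's actual source; only `ASSERTS_ENABLED` and the idea of a prefetch macro are taken from the commit, the helper name is hypothetical):

```c
#include <assert.h>

/* Illustrative sketch: the bdescr prefetch is compiled out when
 * assertions are enabled, because computing Bdescr() may trap on static
 * closures in the debug RTS, while a stray prefetch of a bogus address
 * is harmless in the release RTS. */
#if defined(__GNUC__)
#define prefetch_for_read(p) __builtin_prefetch((p), 0)
#else
#define prefetch_for_read(p) ((void)(p))
#endif

/* Hypothetical helper: reports whether the bdescr prefetch is active. */
static int bdescr_prefetch_enabled(void)
{
#if defined(ASSERTS_ENABLED)
    return 0;  /* debug RTS: skip the bdescr prefetch */
#else
    return 1;  /* release RTS: stray prefetches cannot segfault */
#endif
}
```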
- - - - -
b38dcf39 by Teo Camarasu at 2024-05-17T01:34:50-04:00
rts: Allocate non-moving segments with megablocks
Non-moving segments are 8 blocks long and need to be aligned.
Previously we serviced allocations by grabbing 15 blocks, finding
an aligned 8-block group in them, and returning the rest.
This led to high levels of fragmentation, as de-allocating a segment
caused an 8-block gap to form that could not be reused for allocation.
This patch introduces a segment allocator based around using entire
megablocks to service segment allocations in bulk.
When there are no free segments, we grab an entire megablock and fill it
with aligned segments. As the megablock is free, we can easily guarantee
alignment. Any unused segments are placed on a free list.
It only makes sense to free segments in bulk when all of the segments in
a megablock are freeable. After sweeping, we grab the free list, sort it,
find all groups of segments that together cover an entire megablock, and
free them.
This introduces a period of time when free segments are not available to
the mutator, but the risk that this would lead to excessive allocation
is low. Right after sweep, we should have an abundance of partially full
segments, and this pruning step is relatively quick.
In implementing this we drop the logic that kept NONMOVING_MAX_FREE
segments on the free list.
We also introduce an eventlog event to log the number of pruned and
retained free segments.
See Note [Segment allocation strategy]
Resolves #24150
-------------------------
Metric Decrease:
T13253
T19695
-------------------------
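The arithmetic of the bulk strategy can be sketched as follows (a back-of-the-envelope illustration; the constants are stand-ins for `BLOCKS_PER_MBLOCK` and `NONMOVING_SEGMENT_BLOCKS`, whose real values live in the RTS headers):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the RTS constants: a megablock holds 252
 * usable blocks, and a non-moving segment is 8 blocks. */
enum {
    BLOCKS_PER_MBLOCK_SKETCH = 252,
    SEGMENT_BLOCKS_SKETCH    = 8
};

/* Aligned segments one megablock yields when filled in bulk. */
static size_t segments_per_megablock(void)
{
    return BLOCKS_PER_MBLOCK_SKETCH / SEGMENT_BLOCKS_SKETCH;
}

/* Blocks at the end of the megablock that cannot form a full segment
 * and are handed back to the block allocator. */
static size_t unusable_tail_blocks(void)
{
    return BLOCKS_PER_MBLOCK_SKETCH % SEGMENT_BLOCKS_SKETCH;
}
```

Under these assumed constants each megablock yields 31 aligned segments at once, rather than one 8-block group per 15-block request.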
- - - - -
710665bd by Cheng Shao at 2024-05-17T01:35:30-04:00
rts: fix I/O manager compilation errors for win32 target
This patch fixes I/O manager compilation errors for the win32 target,
discovered when cross-compiling to win32 using a recent clang:
```
rts/win32/ThrIOManager.c:117:7: error:
error: call to undeclared function 'is_io_mng_native_p'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
117 | if (is_io_mng_native_p ()) {
| ^
|
117 | if (is_io_mng_native_p ()) {
| ^
1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)
rts/fs.c:143:28: error:
error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
143 | int setErrNoFromWin32Error () {
| ^
| void
|
143 | int setErrNoFromWin32Error () {
| ^
1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)
rts/win32/ConsoleHandler.c:227:9: error:
error: call to undeclared function 'interruptIOManagerEvent'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
227 | interruptIOManagerEvent ();
| ^
|
227 | interruptIOManagerEvent ();
| ^
rts/win32/ConsoleHandler.c:227:9: error:
note: did you mean 'getIOManagerEvent'?
|
227 | interruptIOManagerEvent ();
| ^
rts/include/rts/IOInterface.h:27:10: error:
note: 'getIOManagerEvent' declared here
27 | void * getIOManagerEvent (void);
| ^
|
27 | void * getIOManagerEvent (void);
| ^
1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)
rts/win32/ConsoleHandler.c:196:9: error:
error: call to undeclared function 'setThreadLabel'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
196 | setThreadLabel(cap, t, "signal handler thread");
| ^
|
196 | setThreadLabel(cap, t, "signal handler thread");
| ^
rts/win32/ConsoleHandler.c:196:9: error:
note: did you mean 'postThreadLabel'?
|
196 | setThreadLabel(cap, t, "signal handler thread");
| ^
rts/eventlog/EventLog.h:118:6: error:
note: 'postThreadLabel' declared here
118 | void postThreadLabel(Capability *cap,
| ^
|
118 | void postThreadLabel(Capability *cap,
| ^
1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)
```
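The second error above is a pre-C99 idiom; a minimal illustration of the `-Wstrict-prototypes` fix (the function body here is a hypothetical stand-in, not the real RTS helper in `rts/fs.c`):

```c
#include <errno.h>

/* Before C99-strict checking, `int f()` declared a function with an
 * unspecified argument list; modern clang warns on it, and -Werror
 * turns that into the error above. Spelling out (void) gives a proper
 * prototype. The body is a hypothetical stand-in for the RTS helper. */
int setErrNoFromWin32Error (void) {  /* was: setErrNoFromWin32Error () */
    errno = EINVAL;
    return -1;
}
```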
- - - - -
28b9cee0 by Rodrigo Mesquita at 2024-05-17T01:36:05-04:00
configure: Check C99-compat for Cmm preprocessor
Fixes #24815
- - - - -
8927e0c3 by Andreas Klebinger at 2024-05-17T01:36:41-04:00
Ensure `tcHasFixedRuntimeRep (# #)` returns True.
- - - - -
3dacc707 by doyougnu at 2024-05-17T05:40:03-04:00
testsuite: make find_so regex less general
Closes #24759
Background. In MR !12372 we began tracking shared object files and
directories sizes for dependencies. However, this broke release builds
because release builds alter the filenames swapping "in-place" for a
hash. This was not considered in the MR and thus broke release
pipelines. Furthermore, the rts_so test was found to be wildly varying
and was therefore disabled in !12561.
This commit fixes both of these issues:
- fix the rts_so test by making the regex less general, now the rts_so
test and all other foo.so tests must match
"libHS<some-lib>-<version>-<hash|'inplace'>-<ghc>". This prevents the
rts_so test from accidentally matching different rts variants such as
rts_threaded, which was the cause of the wild swings after !12372.
- add logic to match either a hash or the string in-place. This should
make the find_so function build agnostic.
- - - - -
fdf826a5 by Andreas Klebinger at 2024-05-17T05:40:04-04:00
TagAnalysis: Treat all bottom ids as tagged during analysis.
Ticket #24806 showed that we also need to treat dead end thunks as
tagged during the analysis.
- - - - -
30 changed files:
- compiler/GHC/Core/TyCon.hs
- compiler/GHC/Stg/InferTags.hs
- compiler/GHC/Stg/InferTags/Rewrite.hs
- docs/users_guide/9.12.1-notes.rst
- docs/users_guide/eventlog-formats.rst
- m4/fp_cmm_cpp_cmd_with_args.m4
- rts/RtsStartup.c
- rts/Trace.c
- rts/Trace.h
- rts/eventlog/EventLog.c
- rts/eventlog/EventLog.h
- rts/gen_event_types.py
- rts/include/rts/storage/Block.h
- rts/include/stg/SMP.h
- rts/sm/BlockAlloc.c
- rts/sm/GC.c
- rts/sm/NonMoving.c
- rts/sm/NonMoving.h
- rts/sm/NonMovingAllocate.c
- rts/sm/NonMovingMark.c
- rts/sm/Sanity.c
- rts/win32/ConsoleHandler.c
- rts/win32/ThrIOManager.c
- testsuite/driver/testlib.py
- testsuite/tests/perf/size/all.T
- testsuite/tests/rts/atomicinc.c
- + testsuite/tests/simplStg/should_compile/T24806.hs
- + testsuite/tests/simplStg/should_compile/T24806.stderr
- testsuite/tests/simplStg/should_compile/all.T
- utils/fs/fs.c
Changes:
=====================================
compiler/GHC/Core/TyCon.hs
=====================================
@@ -2408,7 +2408,9 @@ tcHasFixedRuntimeRep tc@(TyCon { tyConDetails = details })
-- the representation be fully-known, including levity variables.
-- This might be relaxed in the future (#15532).
- TupleTyCon { tup_sort = tuple_sort } -> isBoxed (tupleSortBoxity tuple_sort)
+ TupleTyCon { tup_sort = tuple_sort } -> isBoxed (tupleSortBoxity tuple_sort) ||
+ -- (# #) also has fixed rep.
+ tyConArity tc == 0
SumTyCon {} -> False -- only unboxed sums here
=====================================
compiler/GHC/Stg/InferTags.hs
=====================================
@@ -16,6 +16,7 @@ import GHC.Types.Id.Info (tagSigInfo)
import GHC.Types.Name
import GHC.Stg.Syntax
import GHC.Types.Basic ( CbvMark (..) )
+import GHC.Types.Demand (isDeadEndAppSig)
import GHC.Types.Unique.Supply (mkSplitUniqSupply)
import GHC.Types.RepType (dataConRuntimeRepStrictness)
import GHC.Core (AltCon(..))
@@ -301,12 +302,14 @@ inferTagExpr env (StgApp fun args)
(info, StgApp fun args)
where
!fun_arity = idArity fun
- info | fun_arity == 0 -- Unknown arity => Thunk or unknown call
- = TagDunno
+ info
+ -- It's important that we check for bottoms before all else.
+ -- See Note [Bottom functions are TagTagged] and #24806 for why.
+ | isDeadEndAppSig (idDmdSig fun) (length args)
+ = TagTagged
- | isDeadEndId fun
- , fun_arity == length args -- Implies we will simply call the function.
- = TagTagged -- See Note [Bottom functions are TagTagged]
+ | fun_arity == 0 -- Unknown arity => Thunk or unknown call
+ = TagDunno
| Just (TagSig res_info) <- tagSigInfo (idInfo fun)
, fun_arity == length args -- Saturated
@@ -500,6 +503,11 @@ it safely any tag sig we like.
So we give it TagTagged, as it allows the combined tag sig of the case expression
to be the combination of all non-bottoming branches.
+NB: After the analysis is done we go back to treating bottoming functions as
+untagged to ensure they are evaluated as expected in code like:
+
+ case bottom_id of { ...}
+
-}
-----------------------------
=====================================
compiler/GHC/Stg/InferTags/Rewrite.hs
=====================================
@@ -241,7 +241,10 @@ indicates a bug in the tag inference implementation.
For this reason we assert that we are running in interactive mode if a lookup fails.
-}
isTagged :: Id -> RM Bool
-isTagged v = do
+isTagged v
+ -- See Note [Bottom functions are TagTagged]
+ | isDeadEndId v = pure False
+ | otherwise = do
this_mod <- getMod
-- See Note [Tag inference for interactive contexts]
let lookupDefault v = assertPpr (isInteractiveModule this_mod)
=====================================
docs/users_guide/9.12.1-notes.rst
=====================================
@@ -60,6 +60,8 @@ GHCi
Runtime system
~~~~~~~~~~~~~~
+- Reduce fragmentation incurred by the nonmoving GC's segment allocator. In one application this reduced resident set size by 26%. See :ghc-ticket:`24150`.
+
``base`` library
~~~~~~~~~~~~~~~~
=====================================
docs/users_guide/eventlog-formats.rst
=====================================
@@ -855,6 +855,9 @@ A typical non-moving collection cycle will look something like the following:
12. A :event-type:`NONMOVING_HEAP_CENSUS` event will be emitted describing the
fragmentation state of the non-moving heap.
+13. A :event-type:`NONMOVING_PRUNED_SEGMENTS` event will be emitted showing
+ information about freeing of segments.
+
.. event-type:: CONC_MARK_BEGIN
@@ -929,6 +932,17 @@ heap.
Describes the occupancy of the *blk_sz* sub-heap.
+.. event-type:: NONMOVING_PRUNED_SEGMENTS
+
+ :tag: 208
+ :length: fixed
+ :field Word32: number of pruned segments.
+ :field Word32: number of segments remaining on the free list.
+
+   Report the number of segments pruned and the number remaining on the nonmoving
+   heap's segment free list. Segments will be retained on the free list until
+ the entire megablock containing them can be freed.
+
.. _ticky-event-format:
Ticky counters
=====================================
m4/fp_cmm_cpp_cmd_with_args.m4
=====================================
@@ -2,8 +2,10 @@
# --------------------------
# sets CMM_CPP command and its arguments
#
-# $1 = the variable to set to Cmm CPP command
-# $2 = the variable to set to Cmm CPP command arguments
+# $1 = the path to the C compiler
+# $2 = the variable to set to Cmm CPP command
+# $3 = the variable to set to Cmm CPP command arguments
+# $4 = whether Cmm CPP command supports -g0
AC_DEFUN([FP_CMM_CPP_CMD_WITH_ARGS],[
@@ -46,14 +48,33 @@ AC_ARG_WITH(cmm-cpp-flags,
AC_MSG_CHECKING([whether the C-- preprocessor "$CMM_CPP_CMD" $CMM_CPP_ARGS supports -g0])
: > conftest.c
-if "$CMM_CPP_CMD" $CMM_CPP_ARGS conftest.c -g0 >/dev/null 2>&1; then
+if "$CMM_CPP_CMD" $CMM_CPP_ARGS conftest.c -o conftest -g0 >/dev/null 2>&1; then
$4=True
AC_MSG_RESULT([yes])
else
$4=False
AC_MSG_RESULT([no])
fi
-rm -f conftest.c
+
+AC_MSG_CHECKING([the C-- preprocessor for C99 support])
+cat > conftest.c <<EOF
+#include <stdio.h>
+#if !defined __STDC_VERSION__ || __STDC_VERSION__ < 199901L
+# error "Compiler does not advertise C99 conformance"
+#endif
+EOF
+if "$CMM_CPP_CMD" $CMM_CPP_ARGS conftest.c -o conftest -g0 >/dev/null 2>&1; then
+ AC_MSG_RESULT([yes])
+else
+ # Try -std=gnu99
+ if "$CMM_CPP_CMD" -std=gnu99 $CMM_CPP_ARGS conftest.c -o conftest -g0 >/dev/null 2>&1; then
+ $3="-std=gnu99 $$3"
+ AC_MSG_RESULT([needs -std=gnu99])
+ else
+ AC_MSG_ERROR([C99-compatible compiler needed])
+ fi
+fi
+rm -f conftest.c conftest.o conftest
$2="$CMM_CPP_CMD"
=====================================
rts/RtsStartup.c
=====================================
@@ -456,7 +456,7 @@ hs_exit_(bool wait_foreign)
uint32_t g, i;
// N.B. atomic_dec returns the new value.
- StgInt init_count = (StgInt)atomic_dec(&hs_init_count);
+ StgInt init_count = (StgInt)atomic_dec(&hs_init_count, 1);
if (init_count > 0) {
// ignore until it's the last one
return;
=====================================
rts/Trace.c
=====================================
@@ -929,6 +929,12 @@ void traceNonmovingHeapCensus(uint16_t blk_size,
postNonmovingHeapCensus(blk_size, census);
}
+void traceNonmovingPrunedSegments(uint32_t pruned_segments, uint32_t free_segments)
+{
+ if (eventlog_enabled && TRACE_nonmoving_gc)
+ postNonmovingPrunedSegments(pruned_segments, free_segments);
+}
+
void traceThreadStatus_ (StgTSO *tso USED_IF_DEBUG)
{
#if defined(DEBUG)
=====================================
rts/Trace.h
=====================================
@@ -332,6 +332,7 @@ void traceConcSweepEnd(void);
void traceConcUpdRemSetFlush(Capability *cap);
void traceNonmovingHeapCensus(uint16_t blk_size,
const struct NonmovingAllocCensus *census);
+void traceNonmovingPrunedSegments(uint32_t pruned_segments, uint32_t free_segments);
void traceIPE(const InfoProvEnt *ipe);
void flushTrace(void);
=====================================
rts/eventlog/EventLog.c
=====================================
@@ -1154,6 +1154,15 @@ void postNonmovingHeapCensus(uint16_t blk_size,
RELEASE_LOCK(&eventBufMutex);
}
+void postNonmovingPrunedSegments(uint32_t pruned_segments, uint32_t free_segments)
+{
+ ACQUIRE_LOCK(&eventBufMutex);
+ postEventHeader(&eventBuf, EVENT_NONMOVING_PRUNED_SEGMENTS);
+ postWord32(&eventBuf, pruned_segments);
+ postWord32(&eventBuf, free_segments);
+ RELEASE_LOCK(&eventBufMutex);
+}
+
void closeBlockMarker (EventsBuf *ebuf)
{
if (ebuf->marker)
=====================================
rts/eventlog/EventLog.h
=====================================
@@ -196,6 +196,7 @@ void postConcUpdRemSetFlush(Capability *cap);
void postConcMarkEnd(StgWord32 marked_obj_count);
void postNonmovingHeapCensus(uint16_t blk_size,
const struct NonmovingAllocCensus *census);
+void postNonmovingPrunedSegments(uint32_t pruned_segments, uint32_t free_segments);
#if defined(TICKY_TICKY)
void postTickyCounterDefs(StgEntCounter *p);
=====================================
rts/gen_event_types.py
=====================================
@@ -132,6 +132,7 @@ event_types = [
EventType(205, 'CONC_SWEEP_END', [], 'End concurrent sweep phase'),
EventType(206, 'CONC_UPD_REM_SET_FLUSH', [CapNo], 'Update remembered set flushed'),
EventType(207, 'NONMOVING_HEAP_CENSUS', [Word16, Word32, Word32, Word32], 'Nonmoving heap census'),
+ EventType(208, 'NONMOVING_PRUNED_SEGMENTS', [Word32, Word32], 'Report the amount of segments pruned and remaining on the free list.'),
# Ticky-ticky profiling
EventType(210, 'TICKY_COUNTER_DEF', VariableLength, 'Ticky-ticky entry counter definition'),
=====================================
rts/include/rts/storage/Block.h
=====================================
@@ -323,6 +323,10 @@ bdescr *allocGroupOnNode(uint32_t node, W_ n);
//
bdescr *allocAlignedGroupOnNode(uint32_t node, W_ n);
+// Allocate an MBlock's worth of `n`-block-sized chunks aligned at an `n`-block boundary.
+// This returns a linked list of `bdescr` of length `BLOCKS_PER_MBLOCK / n`.
+bdescr *allocMBlockAlignedGroupOnNode(uint32_t node, W_ n);
+
EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node);
EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node)
{
=====================================
rts/include/stg/SMP.h
=====================================
@@ -82,13 +82,13 @@ EXTERN_INLINE StgWord cas_seq_cst_relaxed(StgVolatilePtr p, StgWord o, StgWord n
EXTERN_INLINE StgWord atomic_inc(StgVolatilePtr p, StgWord n);
/*
- * Atomic decrement
+ * Atomic subtraction by the provided quantity.
*
- * atomic_dec(p) {
- * return --(*p);
+ * atomic_dec(p, n) {
+ * return ((*p) -= n);
* }
*/
-EXTERN_INLINE StgWord atomic_dec(StgVolatilePtr p);
+EXTERN_INLINE StgWord atomic_dec(StgVolatilePtr p, StgWord n);
/*
* Busy-wait nop: this is a hint to the CPU that we are currently in a
@@ -480,9 +480,9 @@ atomic_inc(StgVolatilePtr p, StgWord incr)
}
EXTERN_INLINE StgWord
-atomic_dec(StgVolatilePtr p)
+atomic_dec(StgVolatilePtr p, StgWord decr)
{
- return __atomic_sub_fetch(p, 1, __ATOMIC_SEQ_CST);
+ return __atomic_sub_fetch(p, decr, __ATOMIC_SEQ_CST);
}
/*
@@ -634,9 +634,9 @@ atomic_inc(StgVolatilePtr p, StgWord incr)
}
INLINE_HEADER StgWord
-atomic_dec(StgVolatilePtr p)
+atomic_dec(StgVolatilePtr p, StgWord decr)
{
- return --(*p);
+ return ((*p) -= decr);
}
#endif
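The generalized primitive can be sketched on its own, assuming GCC/clang atomic builtins as in the threaded-RTS definition above (`StgWord` is simplified to `unsigned long` for illustration):

```c
#include <assert.h>

typedef unsigned long StgWord;  /* simplified stand-in for the RTS type */

/* Sketch of the generalized primitive: atomic_dec now takes the amount
 * to subtract, mirroring atomic_inc, instead of hard-coding 1. It
 * returns the new value, like __atomic_sub_fetch. */
static StgWord atomic_dec_sketch(volatile StgWord *p, StgWord decr)
{
    return __atomic_sub_fetch(p, decr, __ATOMIC_SEQ_CST);
}
```

This is what lets `nonmovingPruneFreeSegmentList` later in this patch subtract an entire free-list length from `n_free` in one atomic step.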
=====================================
rts/sm/BlockAlloc.c
=====================================
@@ -394,6 +394,53 @@ split_block_low (bdescr *bd, W_ n)
}
+// A variant of `split_block_high` where we keep both blocks.
+// The returned block has size `n`, which is split off `bd`.
+static bdescr *
+split_block_high_no_free (bdescr *bd, W_ n)
+{
+ ASSERT(bd->blocks > n);
+
+ bdescr* ret = bd + bd->blocks - n; // take n blocks off the end
+ ret->blocks = n;
+ ret->start = ret->free = bd->start + (bd->blocks - n)*BLOCK_SIZE_W;
+ ret->link = NULL;
+
+ bd->blocks -= n;
+
+ setup_tail(ret);
+ setup_tail(bd);
+
+ return ret;
+}
+
+// Allocate an MBlock's worth of `n`-block-sized chunks aligned at an `n`-block boundary.
+// This returns a linked list of `bdescr` of length `BLOCKS_PER_MBLOCK / n`.
+// We assume relevant locks are held.
+bdescr *
+allocMBlockAlignedGroupOnNode(uint32_t node, W_ n)
+{
+ bdescr *bd = allocGroupOnNode(node, BLOCKS_PER_MBLOCK);
+
+ // Free unaligned blocks, as we can't use these.
+ ASSERT(bd->blocks == BLOCKS_PER_MBLOCK);
+ bd = split_block_high(bd, bd->blocks - bd->blocks % n);
+ ASSERT(bd->blocks % n == 0);
+
+ bdescr *last = NULL;
+ bdescr *chunk = NULL;
+ // Chain the aligned groups together into a linked-list
+ while (bd->blocks > n) {
+ chunk = split_block_high_no_free(bd, n);
+ chunk->link = last;
+ last = chunk;
+ }
+ bd->link = chunk;
+
+ return bd;
+}
+
+
/* Find a fitting block for the allocation request in the given free list.
Returns:
- not NULL: when an exact match was found in the free list.
=====================================
rts/sm/GC.c
=====================================
@@ -1313,7 +1313,7 @@ dec_running (void)
ACQUIRE_LOCK(&gc_running_mutex);
#endif
- StgWord r = atomic_dec(&gc_running_threads);
+ StgWord r = atomic_dec(&gc_running_threads, 1);
#if defined(THREADED_RTS)
if (r == 0) {
=====================================
rts/sm/NonMoving.c
=====================================
@@ -248,6 +248,8 @@ static void nonmovingBumpEpoch(void) {
*
* - Note [Allocator sizes] goes into detail about our choice of allocator sizes.
*
+ * - Note [Segment allocation strategy] explains our segment allocation strategy.
+ *
* [ueno 2016]:
* Katsuhiro Ueno and Atsushi Ohori. 2016. A fully concurrent garbage
* collector for functional programs on multicore processors. SIGPLAN Not. 51,
@@ -560,6 +562,22 @@ static void nonmovingBumpEpoch(void) {
*
* See #23340
*
+ * Note [Segment allocation strategy]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * Non-moving segments must be aligned. In order to efficiently service these
+ * allocations, we allocate segments in bulk:
+ * we allocate an entire megablock's worth of segments at once.
+ * All unused segments are placed on the `nonmovingHeap.free` list.
+ *
+ * Symmetrically, we only de-allocate segments if all the segments in a megablock are freeable, i.e.,
+ * are on `nonmovingHeap.free`. We prune the free list in `nonmovingPruneFreeSegmentList`,
+ * called during concurrent sweep phase.
+ * Note that during pruning of the free list, free segments are not available for use by the
+ * mutator. This might lead to extra allocation of segments. But the risk is low as just after sweep
+ * there is usually a large amount of partially full segments, and pruning the free list is quite
+ * quick.
+ *
+ * See #24150
*/
memcount nonmoving_segment_live_words = 0;
@@ -578,19 +596,6 @@ static void nonmovingExitConcurrentWorker(void);
// Add a segment to the free list.
void nonmovingPushFreeSegment(struct NonmovingSegment *seg)
{
- // See Note [Live data accounting in nonmoving collector].
- if (RELAXED_LOAD(&nonmovingHeap.n_free) > NONMOVING_MAX_FREE) {
- bdescr *bd = Bdescr((StgPtr) seg);
- ACQUIRE_SM_LOCK;
- ASSERT(oldest_gen->n_blocks >= bd->blocks);
- ASSERT(oldest_gen->n_words >= BLOCK_SIZE_W * bd->blocks);
- oldest_gen->n_blocks -= bd->blocks;
- oldest_gen->n_words -= BLOCK_SIZE_W * bd->blocks;
- freeGroup(bd);
- RELEASE_SM_LOCK;
- return;
- }
-
SET_SEGMENT_STATE(seg, FREE);
while (true) {
struct NonmovingSegment *old = nonmovingHeap.free;
@@ -601,6 +606,110 @@ void nonmovingPushFreeSegment(struct NonmovingSegment *seg)
__sync_add_and_fetch(&nonmovingHeap.n_free, 1);
}
+static int
+cmp_segment_ptr (const void *x, const void *y)
+{
+    const struct NonmovingSegment *p1 = *(const struct NonmovingSegment**)x;
+    const struct NonmovingSegment *p2 = *(const struct NonmovingSegment**)y;
+ if (p1 > p2) return +1;
+ else if (p1 < p2) return -1;
+ else return 0;
+}
+
+// Prune the free list of segments that can be freed.
+// Segments can be freed if all segments from a mblock are on the free list.
+void nonmovingPruneFreeSegmentList(void)
+{
+ trace(TRACE_nonmoving_gc, "Pruning free segment list.");
+ // Atomically grab the entire free list.
+ struct NonmovingSegment *free;
+ size_t length;
+ while (true) {
+ free = ACQUIRE_LOAD(&nonmovingHeap.free);
+ length = ACQUIRE_LOAD(&nonmovingHeap.n_free);
+ if (cas((StgVolatilePtr) &nonmovingHeap.free,
+ (StgWord) free,
+ (StgWord) NULL) == (StgWord) free) {
+ atomic_dec((StgVolatilePtr) &nonmovingHeap.n_free, length);
+ break;
+ }
+ // Save the current free list so the sanity checker can see these segments.
+ nonmovingHeap.saved_free = free;
+ }
+
+ // Sort the free list by address.
+ struct NonmovingSegment **sorted = stgMallocBytes(sizeof(struct NonmovingSegment*) * length, "sorted free segment list");
+ for(size_t i = 0; i<length; i++) {
+ sorted[i] = free;
+ free = free->link;
+ }
+ // we should have reached the end of the free list
+ ASSERT(free == NULL);
+
+ qsort(sorted, length, sizeof(struct NonmovingSegment*), cmp_segment_ptr);
+
+ // Walk the sorted list and either:
+ // - free segments if the entire megablock is free
+ // - put it back on the free list
+ size_t new_length = 0;
+ size_t free_in_megablock = 0;
+ // iterate through segments by megablock
+ for(size_t i = 0; i<length; i+=free_in_megablock) {
+ // count of free segments in the current megablock
+ free_in_megablock = 1;
+ for(;i + free_in_megablock < length; free_in_megablock++) {
+ if (((W_)sorted[i] & ~MBLOCK_MASK) != ((W_)sorted[i + free_in_megablock] & ~MBLOCK_MASK))
+ break;
+ }
+ if (free_in_megablock < BLOCKS_PER_MBLOCK / NONMOVING_SEGMENT_BLOCKS) {
+ // the entire block isn't free so put it back on the list
+ for(size_t j = 0; j < free_in_megablock;j++){
+ struct NonmovingSegment *last = free;
+ free = sorted[i+j];
+ free->link = last;
+ new_length++;
+ }
+ } else {
+ // the megablock is free, so let's free all the segments.
+ ACQUIRE_SM_LOCK;
+ for(size_t j = 0; j < free_in_megablock;j++){
+ bdescr *bd = Bdescr((StgPtr)sorted[i+j]);
+ freeGroup(bd);
+ }
+ RELEASE_SM_LOCK;
+ }
+ }
+ stgFree(sorted);
+ // If we have any segments left over, then put them back on the free list.
+ if(free) {
+ struct NonmovingSegment* tail = free;
+ while(tail->link) {
+ tail = tail->link;
+ }
+ while (true) {
+ struct NonmovingSegment* rest = ACQUIRE_LOAD(&nonmovingHeap.free);
+ tail->link = rest;
+ if (cas((StgVolatilePtr) &nonmovingHeap.free,
+ (StgWord) rest,
+ (StgWord) free) == (StgWord) rest) {
+ __sync_add_and_fetch(&nonmovingHeap.n_free, new_length);
+ break;
+ }
+ }
+ }
+ size_t pruned_segments = length - new_length;
+ // See Note [Live data accounting in nonmoving collector].
+ oldest_gen->n_blocks -= pruned_segments * NONMOVING_SEGMENT_BLOCKS;
+ oldest_gen->n_words -= pruned_segments * NONMOVING_SEGMENT_SIZE;
+ nonmovingHeap.saved_free = NULL;
+ debugTrace(DEBUG_nonmoving_gc,
"Pruned %zu free segments, leaving %zu on the free segment list.",
+ pruned_segments, new_length);
+ traceNonmovingPrunedSegments(pruned_segments, new_length);
+ trace(TRACE_nonmoving_gc, "Finished pruning free segment list.");
+
+}
+
void nonmovingInitAllocator(struct NonmovingAllocator* alloc, uint16_t block_size)
{
*alloc = (struct NonmovingAllocator)
@@ -1216,6 +1325,7 @@ concurrent_marking:
nonmovingSweepStableNameTable();
nonmovingSweep();
+ nonmovingPruneFreeSegmentList();
ASSERT(nonmovingHeap.sweep_list == NULL);
debugTrace(DEBUG_nonmoving_gc, "Finished sweeping.");
traceConcSweepEnd();
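The megablock-membership test at the heart of the pruning loop above can be sketched in isolation (the 1 MiB size is an illustrative assumption; GHC derives `MBLOCK_MASK` in `rts/include/rts/storage/Block.h`):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the pruning loop's grouping test: two segment addresses
 * belong to the same megablock iff they agree on every bit above the
 * megablock size, i.e. after masking off the low MBLOCK_MASK bits. */
#define MBLOCK_SIZE_SKETCH ((uintptr_t)1 << 20)  /* assumed 1 MiB */
#define MBLOCK_MASK_SKETCH (MBLOCK_SIZE_SKETCH - 1)

static int same_megablock(uintptr_t a, uintptr_t b)
{
    return (a & ~MBLOCK_MASK_SKETCH) == (b & ~MBLOCK_MASK_SKETCH);
}
```

Because the free list is sorted by address first, all segments of one megablock are adjacent, so a single linear scan with this test finds every fully-free megablock.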
=====================================
rts/sm/NonMoving.h
=====================================
@@ -119,16 +119,16 @@ extern uint8_t nonmoving_alloca_dense_cnt;
// NONMOVING_SEGMENT_SIZE (in bytes)
extern uint8_t nonmoving_alloca_cnt;
-// maximum number of free segments to hold on to
-#define NONMOVING_MAX_FREE 16
-
struct NonmovingHeap {
struct NonmovingAllocator *allocators;
- // free segment list. This is a cache where we keep up to
- // NONMOVING_MAX_FREE segments to avoid thrashing the block allocator.
+ // free segment list. This is a cache where we keep segments
+ // belonging to megablocks that are only partially free.
// Note that segments in this list are still counted towards
// oldest_gen->n_blocks.
struct NonmovingSegment *free;
+ // saved free segment list, so the sanity checker can
+ // see the segments while the free list is being pruned.
+ struct NonmovingSegment *saved_free;
// how many segments in free segment list? accessed atomically.
unsigned int n_free;
@@ -172,6 +172,7 @@ void nonmovingCollect(StgWeak **dead_weaks,
bool concurrent);
void nonmovingPushFreeSegment(struct NonmovingSegment *seg);
+void nonmovingPruneFreeSegmentList(void);
INLINE_HEADER unsigned long log2_ceil(unsigned long x)
{
=====================================
rts/sm/NonMovingAllocate.c
=====================================
@@ -65,19 +65,42 @@ static struct NonmovingSegment *nonmovingAllocSegment(enum AllocLockMode mode, u
struct NonmovingSegment *ret;
ret = nonmovingPopFreeSegment();
- // Nothing in the free list, allocate a new segment...
+ // Nothing in the free list, allocate a new segment.
+ // We allocate a full megablock, and add spare segments to our free list.
if (ret == NULL) {
acquire_alloc_lock(mode);
- bdescr *bd = allocAlignedGroupOnNode(node, NONMOVING_SEGMENT_BLOCKS);
- // See Note [Live data accounting in nonmoving collector].
- oldest_gen->n_blocks += bd->blocks;
- oldest_gen->n_words += BLOCK_SIZE_W * bd->blocks;
+ // Another thread might have allocated while we were waiting for the lock.
+ ret = nonmovingPopFreeSegment();
+ if (ret != NULL) {
+ release_alloc_lock(mode);
+ // Check alignment
+ ASSERT(((uintptr_t)ret % NONMOVING_SEGMENT_SIZE) == 0);
+ return ret;
+ }
+
+ bdescr *bd = allocMBlockAlignedGroupOnNode(node, NONMOVING_SEGMENT_BLOCKS);
release_alloc_lock(mode);
- for (StgWord32 i = 0; i < bd->blocks; ++i) {
+ W_ alloc_blocks = BLOCKS_PER_MBLOCK - (BLOCKS_PER_MBLOCK % NONMOVING_SEGMENT_BLOCKS);
+
+ // See Note [Live data accounting in nonmoving collector].
+ oldest_gen->n_blocks += alloc_blocks;
+ oldest_gen->n_words += BLOCK_SIZE_W * alloc_blocks;
+
+ for (StgWord32 i = 0; i < alloc_blocks; ++i) {
initBdescr(&bd[i], oldest_gen, oldest_gen);
bd[i].flags = BF_NONMOVING;
}
+
+ // Push all but the last segment to the free segment list.
+ while(bd->link) {
+ bdescr *next_bd = bd->link;
+ bd->link = NULL;
+ nonmovingPushFreeSegment((struct NonmovingSegment *)bd->start);
+ bd = next_bd;
+ }
+
+ // Use the last segment to service the allocation.
ret = (struct NonmovingSegment *)bd->start;
}
=====================================
rts/sm/NonMovingMark.c
=====================================
@@ -919,7 +919,9 @@ static MarkQueueEnt markQueuePop (MarkQueue *q)
// MarkQueueEnt encoding always places the pointer to the object to be
// marked first.
prefetchForRead(&new.mark_closure.p->header.info);
+#if !defined(ASSERTS_ENABLED)
prefetchForRead(Bdescr((StgPtr) new.mark_closure.p));
+#endif
q->prefetch_queue[i] = new;
i = (i + 1) % MARK_PREFETCH_QUEUE_DEPTH;
}
=====================================
rts/sm/Sanity.c
=====================================
@@ -1238,6 +1238,7 @@ countNonMovingHeap(struct NonmovingHeap *heap)
}
ret += countNonMovingSegments(heap->sweep_list);
ret += countNonMovingSegments(heap->free);
+ ret += countNonMovingSegments(heap->saved_free);
return ret;
}
=====================================
rts/win32/ConsoleHandler.c
=====================================
@@ -5,6 +5,8 @@
* For the WINIO manager see base in the GHC.Event modules.
*/
#include "Rts.h"
+#include "MIOManager.h"
+#include "ThreadLabels.h"
#include <windows.h>
#include "ConsoleHandler.h"
#include "Schedule.h"
=====================================
rts/win32/ThrIOManager.c
=====================================
@@ -9,6 +9,7 @@
* ---------------------------------------------------------------------------*/
#include "Rts.h"
+#include "IOManager.h"
#include "ThrIOManager.h"
#include "MIOManager.h"
#include "rts/OSThreads.h"
=====================================
testsuite/driver/testlib.py
=====================================
@@ -713,6 +713,9 @@ def _find_so(lib, directory, in_place):
_find_so("Cabal-syntax-3.11.0.0", path-from-ghc-pkg, True) ==>
/builds/ghc/ghc/_build/install/lib/ghc-9.11.20240410/lib/x86_64-linux-ghc-9.11.20240410/libHSCabal-syntax-3.11.0.0-inplace-ghc9.11.20240410.so
+
+    For a release build the filename replaces "inplace" with a hash (765d in
+    this case): libHSCabal-3.12.0.0-765d-ghc9.11.20240508.so
"""
# produce the suffix for the CI operating system
@@ -722,11 +725,15 @@ def _find_so(lib, directory, in_place):
elif config.os == "darwin":
suffix = "dylib"
- # Most artfacts are of the form foo-inplace, except for the rts.
+    # Most artifacts are of the form foo-inplace, or foo-<hash> for release
+    # builds, except for the rts.
+ # "\d+(\.\d+)+" matches version numbers, such as 3.12.0.0 in the above example
+ # "\w+" matches the hash on release builds or "inplace", such as 765d in the example
+ # "ghc\S+" matches the ghc build name: ghc9.11.20240508
if in_place:
- to_match = r'libHS{}-\d+(\.\d+)+-inplace-\S+\.' + suffix
+ to_match = r'libHS{}-\d+(\.\d+)+-\w+-ghc\S+\.' + suffix
else:
- to_match = r'libHS{}-\d+(\.\d+)+\S+\.' + suffix
+ to_match = r'libHS{}-\d+(\.\d+)+-ghc\S+\.' + suffix
matches = []
# wrap this in some exception handling, hadrian test will error out because
=====================================
testsuite/tests/perf/size/all.T
=====================================
@@ -72,8 +72,7 @@ test('mtl_so' ,[req_dynamic_ghc, js_skip, windows_skip, collect_obje
test('os_string_so' ,[req_dynamic_ghc, js_skip, windows_skip, collect_object_size(size_acceptance_threshold, "os-string")] , static_stats, [] )
test('parsec_so' ,[req_dynamic_ghc, js_skip, windows_skip, collect_object_size(size_acceptance_threshold, "parsec")] , static_stats, [] )
test('process_so' ,[req_dynamic_ghc, js_skip, windows_skip, collect_object_size(size_acceptance_threshold, "process")] , static_stats, [] )
-# Disabled as extremely unstable
-#test('rts_so' ,[req_dynamic_ghc, js_skip, windows_skip, collect_object_size(size_acceptance_threshold, "rts", True)] , static_stats, [] )
+test('rts_so' ,[req_dynamic_ghc, js_skip, windows_skip, collect_object_size(size_acceptance_threshold, "rts", True)] , static_stats, [] )
test('template_haskell_so',[req_dynamic_ghc, js_skip, windows_skip, collect_object_size(size_acceptance_threshold, "template-haskell")] , static_stats, [] )
test('terminfo_so' ,[req_dynamic_ghc, js_skip, windows_skip, collect_object_size(size_acceptance_threshold, "terminfo")] , static_stats, [] )
test('text_so' ,[req_dynamic_ghc, js_skip, windows_skip, collect_object_size(size_acceptance_threshold, "text")] , static_stats, [] )
=====================================
testsuite/tests/rts/atomicinc.c
=====================================
@@ -11,7 +11,7 @@ int main(int argc, char *argv[])
CHECK(j == 1);
CHECK(i == 1);
- j = atomic_dec(&i);
+ j = atomic_dec(&i,1);
CHECK(j == 0);
CHECK(i == 0);
=====================================
testsuite/tests/simplStg/should_compile/T24806.hs
=====================================
@@ -0,0 +1,19 @@
+module T24806 ( go ) where
+
+data List a = Nil | Cons a !(List a) -- deriving Show
+
+data Tup2 a b = Tup2 !a !b
+
+-- All branches of go return either two properly tagged values *or* are bottom.
+-- This means we should see something like:
+--
+-- (T24806.$wgo, <TagTuple[TagProper, TagProper]>) =
+--
+-- in the dump output.
+-- See Note [Bottom functions are TagTagged] for details why.
+go :: List a1 -> List a2 -> Tup2 (List a2) (List a2)
+go Nil ys = Tup2 ys Nil
+go (Cons _ xs) ys = case ys of
+ Nil -> undefined
+ Cons y ys' -> case go xs ys' of
+ Tup2 s zs -> Tup2 s (Cons y zs)
\ No newline at end of file
=====================================
testsuite/tests/simplStg/should_compile/T24806.stderr
=====================================
@@ -0,0 +1,99 @@
+
+==================== CodeGenAnal STG: ====================
+lvl6 :: GHC.Prim.Addr#
+[GblId, Unf=OtherCon []] =
+ "T24806.hs"#;
+
+lvl4 :: GHC.Prim.Addr#
+[GblId, Unf=OtherCon []] =
+ "T24806"#;
+
+lvl2 :: GHC.Prim.Addr#
+[GblId, Unf=OtherCon []] =
+ "main"#;
+
+lvl :: GHC.Prim.Addr#
+[GblId, Unf=OtherCon []] =
+ "undefined"#;
+
+(T24806.$WTup2, <TagProper>) =
+ {} \r [(conrep, <TagDunno>) (conrep1, <TagDunno>)]
+ case conrep of (conrep2, <TagProper>) {
+ __DEFAULT ->
+ case conrep1 of (conrep3, <TagProper>) {
+ __DEFAULT -> T24806.Tup2 [conrep2 conrep3];
+ };
+ };
+
+(T24806.$WCons, <TagProper>) =
+ {} \r [(conrep, <TagDunno>) (conrep1, <TagDunno>)]
+ case conrep1 of (conrep2, <TagProper>) {
+ __DEFAULT -> T24806.Cons [conrep conrep2];
+ };
+
+(lvl1, <TagDunno>) = {} \u [] GHC.CString.unpackCString# lvl;
+
+(lvl3, <TagDunno>) = {} \u [] GHC.CString.unpackCString# lvl2;
+
+(lvl5, <TagDunno>) = {} \u [] GHC.CString.unpackCString# lvl4;
+
+(lvl7, <TagDunno>) = {} \u [] GHC.CString.unpackCString# lvl6;
+
+(lvl8, <TagProper>) = GHC.Types.I#! [17#];
+
+(lvl9, <TagProper>) = GHC.Types.I#! [12#];
+
+(lvl10, <TagProper>) = GHC.Types.I#! [21#];
+
+(lvl11, <TagProper>) =
+ GHC.Internal.Stack.Types.SrcLoc! [lvl3
+ lvl5
+ lvl7
+ lvl8
+ lvl9
+ lvl8
+ lvl10];
+
+(lvl12, <TagProper>) =
+ GHC.Internal.Stack.Types.PushCallStack! [lvl1
+ lvl11
+ GHC.Internal.Stack.Types.EmptyCallStack];
+
+(lvl13, <TagDunno>) = {} \u [] GHC.Internal.Err.undefined lvl12;
+
+(T24806.Tup2, <TagDunno>) =
+ {} \r [(eta, <TagDunno>) (eta, <TagDunno>)] T24806.Tup2 [eta eta];
+
+(T24806.Nil, <TagProper>) = T24806.Nil! [];
+
+Rec {
+(T24806.$wgo, <TagTuple[TagProper, TagProper]>) =
+ {} \r [(ds, <TagProper>) (ys, <TagProper>)]
+ case ds of (wild, <TagProper>) {
+ T24806.Nil ->
+ case ys of (conrep, <TagProper>) {
+ __DEFAULT -> (#,#) [conrep T24806.Nil];
+ };
+ T24806.Cons (ds1, <TagDunno>) (xs, <TagProper>) ->
+ case ys of (wild1, <TagProper>) {
+ T24806.Nil -> lvl13;
+ T24806.Cons (y, <TagDunno>) (ys', <TagProper>) ->
+ case T24806.$wgo xs ys' of (wild2, <TagProper>) {
+ (#,#) (ww, <TagProper>) (ww1, <TagProper>) ->
+ let { (sat, <TagProper>) = T24806.Cons! [y ww1];
+ } in (#,#) [ww sat];
+ };
+ };
+ };
+end Rec }
+
+(T24806.go, <TagProper>) =
+ {} \r [(ds, <TagDunno>) (ys, <TagDunno>)]
+ case T24806.$wgo ds ys of (wild, <TagProper>) {
+ (#,#) (ww, <TagProper>) (ww1, <TagProper>) -> T24806.Tup2 [ww ww1];
+ };
+
+(T24806.Cons, <TagDunno>) =
+ {} \r [(eta, <TagDunno>) (eta, <TagDunno>)] T24806.Cons [eta eta];
+
+
=====================================
testsuite/tests/simplStg/should_compile/all.T
=====================================
@@ -23,3 +23,5 @@ test('inferTags003', [ only_ways(['optasm']),
grep_errmsg(r'(call stg\_ap\_0)', [1])
], compile, ['-ddump-cmm -dno-typeable-binds -O'])
test('inferTags004', normal, compile, ['-O -ddump-stg-tags -dno-typeable-binds -dsuppress-uniques'])
+
+test('T24806', grep_errmsg('^\\(T24806\\.\\$wgo'), compile, ['-O -ddump-stg-tags -dno-typeable-binds -dsuppress-uniques'])
=====================================
utils/fs/fs.c
=====================================
@@ -140,7 +140,7 @@ static int setErrNoFromWin32Error (void);
This function should only be called when the creation of the fd actually
failed and you want to return -1 for the fd. */
static
-int setErrNoFromWin32Error () {
+int setErrNoFromWin32Error (void) {
switch (GetLastError()) {
case ERROR_SUCCESS:
errno = 0;
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/3f33796419f7bd97a57f29c6a7d31f00a1ae6ae2...fdf826a5f69400170deec238e4d61793b0261c2b