[Git][ghc/ghc][wip/andreask/9.10-backports-1] 4 commits: GHCi: fix improper location of ghci_history file

Andreas Klebinger (@AndreasK) gitlab at gitlab.haskell.org
Fri Dec 6 20:12:27 UTC 2024



Andreas Klebinger pushed to branch wip/andreask/9.10-backports-1 at Glasgow Haskell Compiler / GHC


Commits:
9bcf1f1c by ur4t at 2024-12-06T17:10:53+01:00
GHCi: fix improper location of ghci_history file

Fixes #24266

(cherry picked from commit 6f0a62db5dc79640433c61e83ea1427665304869)

- - - - -
82616fb0 by Andreas Klebinger at 2024-12-06T17:14:58+01:00
Mention .ghci_history XGD changes in changelog.

- - - - -
25d72bab by Cheng Shao at 2024-12-06T17:18:03+01:00
rts: do not prefetch mark_closure bdescr in non-moving gc when ASSERTS_ENABLED

This commit fixes a small an oversight in !12148: the prefetch logic
in non-moving GC may trap in debug RTS because it calls Bdescr() for
mark_closure which may be a static one. It's fine in non-debug RTS
because even invalid bdescr addresses are prefetched, they will not
cause segfaults, so this commit implements the most straightforward
fix: don't prefetch mark_closure bdescr when assertions are enabled.

(cherry picked from commit 8f74b38148f0d8b9341099cb0edda6f7698a4369)

- - - - -
8958612c by Teo Camarasu at 2024-12-06T17:20:07+01:00
rts: Allocate non-moving segments with megablocks

Non-moving segments are 8 blocks long and need to be aligned.
Previously we serviced allocations by grabbing 15 blocks, finding
an aligned 8 block group in it and returning the rest.
This proved to lead to high levels of fragmentation as a de-allocating a segment
caused an 8 block gap to form, and this could not be reused for allocation.

This patch introduces a segment allocator based around using entire
megablocks to service segment allocations in bulk.

When there are no free segments, we grab an entire megablock and fill it
with aligned segments. As the megablock is free, we can easily guarantee
alignment. Any unused segments are placed on a free list.

It only makes sense to free segments in bulk when all of the segments in
a megablock are freeable. After sweeping, we grab the free list, sort it,
and find all groups of segments where they cover the megablock and free
them.
This introduces a period of time when free segments are not available to
the mutator, but the risk that this would lead to excessive allocation
is low. Right after sweep, we should have an abundance of partially full
segments, and this pruning step is relatively quick.

In implementing this we drop the logic that kept NONMOVING_MAX_FREE
segments on the free list.

We also introduce an eventlog event to log the amount of pruned/retained
free segments.

See Note [Segment allocation strategy]

Resolves #24150

-------------------------
Metric Decrease:
    T13253
    T19695
-------------------------

(cherry picked from commit 5e36e7ced6a8aa183762e1df97da20b460e19bcf)

- - - - -


19 changed files:

- docs/users_guide/9.10.2-notes.rst
- docs/users_guide/eventlog-formats.rst
- ghc/GHCi/UI.hs
- rts/RtsStartup.c
- rts/Trace.c
- rts/Trace.h
- rts/eventlog/EventLog.c
- rts/eventlog/EventLog.h
- rts/gen_event_types.py
- rts/include/rts/storage/Block.h
- rts/include/stg/SMP.h
- rts/sm/BlockAlloc.c
- rts/sm/GC.c
- rts/sm/NonMoving.c
- rts/sm/NonMoving.h
- rts/sm/NonMovingAllocate.c
- rts/sm/NonMovingMark.c
- rts/sm/Sanity.c
- testsuite/tests/rts/atomicinc.c


Changes:

=====================================
docs/users_guide/9.10.2-notes.rst
=====================================
@@ -29,6 +29,7 @@ WebAssembly backend
 GHCi
 ~~~~
 
+- When using XDG directories ``.ghci_history`` now uses the ``$XDG_DATA_HOME`` variable as expected. (#24266)
 
 Runtime system
 ~~~~~~~~~~~~~~
@@ -38,6 +39,9 @@ Runtime system
   Users who have fine-tuned the :rts-flag:`-F ⟨factor⟩`, :rts-flag:`-Fd ⟨factor⟩`, or :rts-flag:`-O ⟨size⟩` flags,
   and use the non-moving GC, should see if adjustments are needed in light of this change.
 
+- Reduce fragmentation incurred by the nonmoving GC's segment allocator. In one application this reduced resident set size by 26%. See :ghc-ticket:`24150`.
+
+
 ``base`` library
 ~~~~~~~~~~~~~~~~
 


=====================================
docs/users_guide/eventlog-formats.rst
=====================================
@@ -855,6 +855,9 @@ A typical non-moving collection cycle will look something like the following:
 12. A :event-type:`NONMOVING_HEAP_CENSUS` event will be emitted describing the
     fragmentation state of the non-moving heap.
 
+13. A :event-type:`NONMOVING_PRUNED_SEGMENTS` event will be emitted showing
+    information about freeing of segments.
+
 
 .. event-type:: CONC_MARK_BEGIN
 
@@ -929,6 +932,17 @@ heap.
 
    Describes the occupancy of the *blk_sz* sub-heap.
 
+.. event-type:: NONMOVING_PRUNED_SEGMENTS
+
+   :tag: 208
+   :length: fixed
+   :field Word32: number of pruned segments.
+   :field Word32: number of segments remaining on the free list.
+
+   Report the amount of segments pruned and those remaining on the nonmoving
+   heap's segment free list. Segments will be retained on the free list until
+   the entire megablock containing them can be freed.
+
 .. _ticky-event-format:
 
 Ticky counters


=====================================
ghc/GHCi/UI.hs
=====================================
@@ -640,30 +640,27 @@ ghciLogAction lastErrLocations old_log_action
             _ -> return ()
         _ -> return ()
 
--- | Takes a file name and prefixes it with the appropriate
--- GHC appdir.
--- Uses ~/.ghc (getAppUserDataDirectory) if it exists
--- If it doesn't, then it uses $XDG_DATA_HOME/ghc
--- Earlier we always used to use ~/.ghc, but we want
--- to gradually move to $XDG_DATA_HOME to respect the XDG specification
---
--- As a migration strategy, we will only create new directories in
--- the appropriate XDG location. However, we will use the old directory
--- if it already exists.
-getAppDataFile :: FilePath -> IO (Maybe FilePath)
-getAppDataFile file = do
-    let new_path = tryIO (getXdgDirectory XdgConfig "ghc") >>= \case
-          Left _ -> pure Nothing
-          Right dir -> flip catchIO (const $ return Nothing) $ do
-            createDirectoryIfMissing False dir
-            pure $ Just $ dir </> file
-
-    e_old_path <- tryIO (getAppUserDataDirectory "ghc")
-    case e_old_path of
-      Right old_path -> doesDirectoryExist old_path >>= \case
-        True -> pure $ Just $ old_path </> file
-        False -> new_path
-      Left _ -> new_path
+-- | Takes a file name and prefixes it with the appropriate GHC appdir.
+-- ~/.ghc (getAppUserDataDirectory) is used if it exists, or XDG directories
+-- are used to respect the XDG specification.
+-- As a migration strategy, currently we will only create new directories in
+-- the appropriate XDG location.
+getAppDataFile :: XdgDirectory -> FilePath -> IO (Maybe FilePath)
+getAppDataFile xdgDir file = do
+  xdgAppDir <-
+    tryIO (getXdgDirectory xdgDir "ghc") >>= \case
+      Left _ -> pure Nothing
+      Right dir -> flip catchIO (const $ pure Nothing) $ do
+        createDirectoryIfMissing False dir
+        pure $ Just dir
+  appDir <-
+    tryIO (getAppUserDataDirectory "ghc") >>= \case
+      Right dir ->
+        doesDirectoryExist dir >>= \case
+          True -> pure $ Just dir
+          False -> pure xdgAppDir
+      Left _ -> pure xdgAppDir
+  pure $ appDir >>= \dir -> Just $ dir </> file
 
 runGHCi :: [(FilePath, Maybe UnitId, Maybe Phase)] -> Maybe [String] -> GHCi ()
 runGHCi paths maybe_exprs = do
@@ -671,13 +668,12 @@ runGHCi paths maybe_exprs = do
   let
    ignore_dot_ghci = gopt Opt_IgnoreDotGhci dflags
 
-   app_user_dir = liftIO $ getAppDataFile "ghci.conf"
+   appDataCfg = liftIO $ getAppDataFile XdgConfig "ghci.conf"
 
-   home_dir = do
-    either_dir <- liftIO $ tryIO (getEnv "HOME")
-    case either_dir of
-      Right home -> return (Just (home </> ".ghci"))
-      _ -> return Nothing
+   homeCfg = do
+    liftIO $ tryIO (getEnv "HOME") >>= \case
+      Right home -> pure $ Just $ home </> ".ghci"
+      _ -> pure Nothing
 
    canonicalizePath' :: FilePath -> IO (Maybe FilePath)
    canonicalizePath' fp = liftM Just (canonicalizePath fp)
@@ -711,7 +707,7 @@ runGHCi paths maybe_exprs = do
     then pure []
     else do
       userCfgs <- do
-        paths <- catMaybes <$> sequence [ app_user_dir, home_dir ]
+        paths <- catMaybes <$> sequence [ appDataCfg, homeCfg ]
         checkedPaths <- liftIO $ filterM checkFileAndDirPerms paths
         liftIO . fmap (nub . catMaybes) $ mapM canonicalizePath' checkedPaths
 
@@ -798,12 +794,12 @@ runGHCiInput f = do
     dflags <- getDynFlags
     let ghciHistory = gopt Opt_GhciHistory dflags
     let localGhciHistory = gopt Opt_LocalGhciHistory dflags
-    currentDirectory <- liftIO $ getCurrentDirectory
+    currentDirectory <- liftIO getCurrentDirectory
 
     histFile <- case (ghciHistory, localGhciHistory) of
-      (True, True) -> return (Just (currentDirectory </> ".ghci_history"))
-      (True, _) -> liftIO $ getAppDataFile "ghci_history"
-      _ -> return Nothing
+      (True, True) -> pure $ Just $ currentDirectory </> ".ghci_history"
+      (True, _) -> liftIO $ getAppDataFile XdgData "ghci_history"
+      _ -> pure Nothing
 
     runInputT
         (setComplete ghciCompleteWord $ defaultSettings {historyFile = histFile})


=====================================
rts/RtsStartup.c
=====================================
@@ -454,7 +454,7 @@ hs_exit_(bool wait_foreign)
     uint32_t g, i;
 
     // N.B. atomic_dec returns the new value.
-    StgInt init_count = (StgInt)atomic_dec(&hs_init_count);
+    StgInt init_count = (StgInt)atomic_dec(&hs_init_count, 1);
     if (init_count > 0) {
         // ignore until it's the last one
         return;


=====================================
rts/Trace.c
=====================================
@@ -926,6 +926,12 @@ void traceNonmovingHeapCensus(uint16_t blk_size,
         postNonmovingHeapCensus(blk_size, census);
 }
 
+void traceNonmovingPrunedSegments(uint32_t pruned_segments, uint32_t free_segments)
+{
+    if (eventlog_enabled && TRACE_nonmoving_gc)
+        postNonmovingPrunedSegments(pruned_segments, free_segments);
+}
+
 void traceThreadStatus_ (StgTSO *tso USED_IF_DEBUG)
 {
 #if defined(DEBUG)


=====================================
rts/Trace.h
=====================================
@@ -331,6 +331,7 @@ void traceConcSweepEnd(void);
 void traceConcUpdRemSetFlush(Capability *cap);
 void traceNonmovingHeapCensus(uint16_t blk_size,
                               const struct NonmovingAllocCensus *census);
+void traceNonmovingPrunedSegments(uint32_t pruned_segments, uint32_t free_segments);
 
 void traceIPE(const InfoProvEnt *ipe);
 void flushTrace(void);


=====================================
rts/eventlog/EventLog.c
=====================================
@@ -1154,6 +1154,15 @@ void postNonmovingHeapCensus(uint16_t blk_size,
     RELEASE_LOCK(&eventBufMutex);
 }
 
+void postNonmovingPrunedSegments(uint32_t pruned_segments, uint32_t free_segments)
+{
+    ACQUIRE_LOCK(&eventBufMutex);
+    postEventHeader(&eventBuf, EVENT_NONMOVING_PRUNED_SEGMENTS);
+    postWord32(&eventBuf, pruned_segments);
+    postWord32(&eventBuf, free_segments);
+    RELEASE_LOCK(&eventBufMutex);
+}
+
 void closeBlockMarker (EventsBuf *ebuf)
 {
     if (ebuf->marker)


=====================================
rts/eventlog/EventLog.h
=====================================
@@ -196,6 +196,7 @@ void postConcUpdRemSetFlush(Capability *cap);
 void postConcMarkEnd(StgWord32 marked_obj_count);
 void postNonmovingHeapCensus(uint16_t blk_size,
                              const struct NonmovingAllocCensus *census);
+void postNonmovingPrunedSegments(uint32_t pruned_segments, uint32_t free_segments);
 
 #if defined(TICKY_TICKY)
 void postTickyCounterDefs(StgEntCounter *p);


=====================================
rts/gen_event_types.py
=====================================
@@ -132,6 +132,7 @@ event_types = [
     EventType(205, 'CONC_SWEEP_END',               [],                    'End concurrent sweep phase'),
     EventType(206, 'CONC_UPD_REM_SET_FLUSH',       [CapNo],               'Update remembered set flushed'),
     EventType(207, 'NONMOVING_HEAP_CENSUS',        [Word16, Word32, Word32, Word32], 'Nonmoving heap census'),
+    EventType(208, 'NONMOVING_PRUNED_SEGMENTS',    [Word32, Word32],      'Report the amount of segments pruned and remaining on the free list.'),
 
     # Ticky-ticky profiling
     EventType(210, 'TICKY_COUNTER_DEF',            VariableLength,        'Ticky-ticky entry counter definition'),


=====================================
rts/include/rts/storage/Block.h
=====================================
@@ -323,6 +323,10 @@ bdescr *allocGroupOnNode(uint32_t node, W_ n);
 //
 bdescr *allocAlignedGroupOnNode(uint32_t node, W_ n);
 
+// Allocate a MBlock worth of `n` block sized chunks aligned at `n`-block boundry.
+// This returns a linked list of `bdescr` of length `BLOCKS_PER_MBLOCK / n`.
+bdescr *allocMBlockAlignedGroupOnNode(uint32_t node, W_ n);
+
 EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node);
 EXTERN_INLINE bdescr* allocBlockOnNode(uint32_t node)
 {


=====================================
rts/include/stg/SMP.h
=====================================
@@ -82,13 +82,13 @@ EXTERN_INLINE StgWord cas_seq_cst_relaxed(StgVolatilePtr p, StgWord o, StgWord n
 EXTERN_INLINE StgWord atomic_inc(StgVolatilePtr p, StgWord n);
 
 /*
- * Atomic decrement
+ * Atomic subtraction by the provided quantity.
  *
- * atomic_dec(p) {
- *   return --(*p);
+ * atomic_dec(p, n) {
+ *   return ((*p) -= n);
  * }
  */
-EXTERN_INLINE StgWord atomic_dec(StgVolatilePtr p);
+EXTERN_INLINE StgWord atomic_dec(StgVolatilePtr p, StgWord n);
 
 /*
  * Busy-wait nop: this is a hint to the CPU that we are currently in a
@@ -479,9 +479,9 @@ atomic_inc(StgVolatilePtr p, StgWord incr)
 }
 
 EXTERN_INLINE StgWord
-atomic_dec(StgVolatilePtr p)
+atomic_dec(StgVolatilePtr p, StgWord decr)
 {
-    return __atomic_sub_fetch(p, 1, __ATOMIC_SEQ_CST);
+    return __atomic_sub_fetch(p, decr, __ATOMIC_SEQ_CST);
 }
 
 /*
@@ -628,9 +628,9 @@ atomic_inc(StgVolatilePtr p, StgWord incr)
 }
 
 INLINE_HEADER StgWord
-atomic_dec(StgVolatilePtr p)
+atomic_dec(StgVolatilePtr p, StgWord decr)
 {
-    return --(*p);
+    return ((*p) -= decr);
 }
 #endif
 


=====================================
rts/sm/BlockAlloc.c
=====================================
@@ -394,6 +394,53 @@ split_block_low (bdescr *bd, W_ n)
 }
 
 
+// A variant of `split_block_high` where we keep both blocks.
+// The returned block has size `n`, which is split off `bd`.
+static bdescr *
+split_block_high_no_free (bdescr *bd, W_ n)
+{
+    ASSERT(bd->blocks > n);
+
+    bdescr* ret = bd + bd->blocks - n; // take n blocks off the end
+    ret->blocks = n;
+    ret->start = ret->free = bd->start + (bd->blocks - n)*BLOCK_SIZE_W;
+    ret->link = NULL;
+
+    bd->blocks -= n;
+
+    setup_tail(ret);
+    setup_tail(bd);
+
+    return ret;
+}
+
+// Allocate a MBlock worth of `n` block sized chunks aligned at `n`-block boundry.
+// This returns a linked list of `bdescr` of length `BLOCKS_PER_MBLOCK / n`.
+// We assume relevant locks are held.
+bdescr *
+allocMBlockAlignedGroupOnNode(uint32_t node, W_ n)
+{
+    bdescr *bd = allocGroupOnNode(node, BLOCKS_PER_MBLOCK);
+
+    // Free unaligned blocks, as we can't use these.
+    ASSERT(bd->blocks == BLOCKS_PER_MBLOCK);
+    bd = split_block_high(bd, bd->blocks - bd->blocks % n);
+    ASSERT(bd->blocks % n == 0);
+
+    bdescr *last = NULL;
+    bdescr *chunk = NULL;
+    // Chain the aligned groups together into a linked-list
+    while (bd->blocks > n) {
+      chunk = split_block_high_no_free(bd, n);
+      chunk->link = last;
+      last = chunk;
+    }
+    bd->link = chunk;
+
+    return bd;
+}
+
+
 /* Find a fitting block for the allocation request in the given free list.
    Returns:
      - not NULL: when an exact match was found in the free list.


=====================================
rts/sm/GC.c
=====================================
@@ -1318,7 +1318,7 @@ dec_running (void)
     ACQUIRE_LOCK(&gc_running_mutex);
 #endif
 
-    StgWord r = atomic_dec(&gc_running_threads);
+    StgWord r = atomic_dec(&gc_running_threads, 1);
 
 #if defined(THREADED_RTS)
     if (r == 0) {


=====================================
rts/sm/NonMoving.c
=====================================
@@ -248,6 +248,8 @@ static void nonmovingBumpEpoch(void) {
  *
  *  - Note [Allocator sizes] goes into detail about our choice of allocator sizes.
  *
+ *  - Note [Segment allocation strategy] explains our segment allocation strategy.
+ *
  * [ueno 2016]:
  *   Katsuhiro Ueno and Atsushi Ohori. 2016. A fully concurrent garbage
  *   collector for functional programs on multicore processors. SIGPLAN Not. 51,
@@ -560,6 +562,22 @@ static void nonmovingBumpEpoch(void) {
  *
  * See #23340
  *
+ * Note [Segment allocation strategy]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * Non-moving segments must be aligned. In order, to efficiently service these
+ * allocations, we allocate segments in bulk
+ * We allocate an entire megablocks worth of segments at once.
+ * All unused segments are placed on the `nonmovingHeap.free` list.
+ *
+ * Symmetrically we only de-allocate segments if all the segments in a megablock are free-able, ie,
+ * are on `nonmovingHeap.free`. We prune the free list in `nonmovingPruneFreeSegmentList`,
+ * called during concurrent sweep phase.
+ * Note that during pruning of the free list, free segments are not available for use by the
+ * mutator. This might lead to extra allocation of segments. But the risk is low as just after sweep
+ * there is usually a large amount of partially full segments, and pruning the free list is quite
+ * quick.
+ *
+ * See #24150
  */
 
 memcount nonmoving_segment_live_words = 0;
@@ -578,19 +596,6 @@ static void nonmovingExitConcurrentWorker(void);
 // Add a segment to the free list.
 void nonmovingPushFreeSegment(struct NonmovingSegment *seg)
 {
-    // See Note [Live data accounting in nonmoving collector].
-    if (RELAXED_LOAD(&nonmovingHeap.n_free) > NONMOVING_MAX_FREE) {
-        bdescr *bd = Bdescr((StgPtr) seg);
-        ACQUIRE_SM_LOCK;
-        ASSERT(oldest_gen->n_blocks >= bd->blocks);
-        ASSERT(oldest_gen->n_words >= BLOCK_SIZE_W * bd->blocks);
-        oldest_gen->n_blocks -= bd->blocks;
-        oldest_gen->n_words  -= BLOCK_SIZE_W * bd->blocks;
-        freeGroup(bd);
-        RELEASE_SM_LOCK;
-        return;
-    }
-
     SET_SEGMENT_STATE(seg, FREE);
     while (true) {
         struct NonmovingSegment *old = nonmovingHeap.free;
@@ -601,6 +606,110 @@ void nonmovingPushFreeSegment(struct NonmovingSegment *seg)
     __sync_add_and_fetch(&nonmovingHeap.n_free, 1);
 }
 
+static int
+cmp_segment_ptr (const void *x, const void *y)
+{
+    const struct NonMovingSegment *p1 = *(const struct NonMovingSegment**)x;
+    const struct NonMovingSegment *p2 = *(const struct NonMovingSegment**)y;
+    if (p1 > p2) return +1;
+    else if (p1 < p2) return -1;
+    else return 0;
+}
+
+// Prune the free list of segments that can be freed.
+// Segments can be freed if all segments from a mblock are on the free list.
+void nonmovingPruneFreeSegmentList(void)
+{
+  trace(TRACE_nonmoving_gc, "Pruning free segment list.");
+  // Atomically grab the entire free list.
+  struct NonmovingSegment *free;
+  size_t length;
+  while (true) {
+    free = ACQUIRE_LOAD(&nonmovingHeap.free);
+    length = ACQUIRE_LOAD(&nonmovingHeap.n_free);
+    if (cas((StgVolatilePtr) &nonmovingHeap.free,
+            (StgWord) free,
+            (StgWord) NULL) == (StgWord) free) {
+        atomic_dec((StgVolatilePtr) &nonmovingHeap.n_free, length);
+        break;
+    }
+    // Save the current free list so the sanity checker can see these segments.
+    nonmovingHeap.saved_free = free;
+  }
+
+  // Sort the free list by address.
+  struct NonmovingSegment **sorted = stgMallocBytes(sizeof(struct NonmovingSegment*) * length, "sorted free segment list");
+  for(size_t i = 0; i<length; i++) {
+    sorted[i] = free;
+    free = free->link;
+  }
+  // we should have reached the end of the free list
+  ASSERT(free == NULL);
+
+  qsort(sorted, length, sizeof(struct NonmovingSegment*), cmp_segment_ptr);
+
+  // Walk the sorted list and either:
+  // - free segments if the entire megablock is free
+  // - put it back on the free list
+  size_t new_length = 0;
+  size_t free_in_megablock = 0;
+  // iterate through segments by megablock
+  for(size_t i = 0; i<length; i+=free_in_megablock) {
+    // count of free segments in the current megablock
+    free_in_megablock = 1;
+    for(;i + free_in_megablock < length; free_in_megablock++) {
+      if (((W_)sorted[i] & ~MBLOCK_MASK) != ((W_)sorted[i + free_in_megablock] & ~MBLOCK_MASK))
+        break;
+    }
+    if (free_in_megablock < BLOCKS_PER_MBLOCK / NONMOVING_SEGMENT_BLOCKS) {
+      // the entire block isn't free so put it back on the list
+      for(size_t j = 0; j < free_in_megablock;j++){
+        struct NonmovingSegment *last = free;
+        free = sorted[i+j];
+        free->link = last;
+        new_length++;
+      }
+    } else {
+      // the megablock is free, so let's free all the segments.
+      ACQUIRE_SM_LOCK;
+      for(size_t j = 0; j < free_in_megablock;j++){
+        bdescr *bd = Bdescr((StgPtr)sorted[i+j]);
+        freeGroup(bd);
+      }
+      RELEASE_SM_LOCK;
+    }
+  }
+  stgFree(sorted);
+  // If we have any segments left over, then put them back on the free list.
+  if(free) {
+    struct NonmovingSegment* tail = free;
+    while(tail->link) {
+      tail = tail->link;
+    }
+    while (true) {
+      struct NonmovingSegment* rest = ACQUIRE_LOAD(&nonmovingHeap.free);
+      tail->link = rest;
+      if (cas((StgVolatilePtr) &nonmovingHeap.free,
+              (StgWord) rest,
+              (StgWord) free) == (StgWord) rest) {
+          __sync_add_and_fetch(&nonmovingHeap.n_free, new_length);
+          break;
+      }
+    }
+  }
+  size_t pruned_segments = length - new_length;
+  // See Note [Live data accounting in nonmoving collector].
+  oldest_gen->n_blocks -= pruned_segments * NONMOVING_SEGMENT_BLOCKS;
+  oldest_gen->n_words  -= pruned_segments * NONMOVING_SEGMENT_SIZE;
+  nonmovingHeap.saved_free = NULL;
+  debugTrace(DEBUG_nonmoving_gc,
+            "Pruned %d free segments, leaving %d on the free segment list.",
+            pruned_segments, new_length);
+  traceNonmovingPrunedSegments(pruned_segments, new_length);
+  trace(TRACE_nonmoving_gc, "Finished pruning free segment list.");
+
+}
+
 void nonmovingInitAllocator(struct NonmovingAllocator* alloc, uint16_t block_size)
 {
   *alloc = (struct NonmovingAllocator)
@@ -1216,6 +1325,7 @@ concurrent_marking:
     nonmovingSweepStableNameTable();
 
     nonmovingSweep();
+    nonmovingPruneFreeSegmentList();
     ASSERT(nonmovingHeap.sweep_list == NULL);
     debugTrace(DEBUG_nonmoving_gc, "Finished sweeping.");
     traceConcSweepEnd();


=====================================
rts/sm/NonMoving.h
=====================================
@@ -119,16 +119,16 @@ extern uint8_t nonmoving_alloca_dense_cnt;
 // NONMOVING_SEGMENT_SIZE (in bytes)
 extern uint8_t nonmoving_alloca_cnt;
 
-// maximum number of free segments to hold on to
-#define NONMOVING_MAX_FREE 16
-
 struct NonmovingHeap {
     struct NonmovingAllocator *allocators;
-    // free segment list. This is a cache where we keep up to
-    // NONMOVING_MAX_FREE segments to avoid thrashing the block allocator.
+    // free segment list. This is a cache where we keep segments
+    // belonging to megablocks that are only partially free.
     // Note that segments in this list are still counted towards
     // oldest_gen->n_blocks.
     struct NonmovingSegment *free;
+    // saved free segment list, so the sanity checker can
+    // see the segments while the free list is being pruned.
+    struct NonmovingSegment *saved_free;
     // how many segments in free segment list? accessed atomically.
     unsigned int n_free;
 
@@ -172,6 +172,7 @@ void nonmovingCollect(StgWeak **dead_weaks,
                       bool concurrent);
 
 void nonmovingPushFreeSegment(struct NonmovingSegment *seg);
+void nonmovingPruneFreeSegmentList(void);
 
 INLINE_HEADER unsigned long log2_ceil(unsigned long x)
 {


=====================================
rts/sm/NonMovingAllocate.c
=====================================
@@ -65,19 +65,42 @@ static struct NonmovingSegment *nonmovingAllocSegment(enum AllocLockMode mode, u
     struct NonmovingSegment *ret;
     ret = nonmovingPopFreeSegment();
 
-    // Nothing in the free list, allocate a new segment...
+    // Nothing in the free list, allocate a new segment.
+    // We allocate a full megablock, and add spare segments to our free list.
     if (ret == NULL) {
         acquire_alloc_lock(mode);
-        bdescr *bd = allocAlignedGroupOnNode(node, NONMOVING_SEGMENT_BLOCKS);
-        // See Note [Live data accounting in nonmoving collector].
-        oldest_gen->n_blocks += bd->blocks;
-        oldest_gen->n_words  += BLOCK_SIZE_W * bd->blocks;
+        // Another thread might have allocated while we were waiting for the lock.
+        ret = nonmovingPopFreeSegment();
+        if (ret != NULL) {
+          release_alloc_lock(mode);
+          // Check alignment
+          ASSERT(((uintptr_t)ret % NONMOVING_SEGMENT_SIZE) == 0);
+          return ret;
+        }
+
+        bdescr *bd = allocMBlockAlignedGroupOnNode(node, NONMOVING_SEGMENT_BLOCKS);
         release_alloc_lock(mode);
 
-        for (StgWord32 i = 0; i < bd->blocks; ++i) {
+        W_ alloc_blocks = BLOCKS_PER_MBLOCK - (BLOCKS_PER_MBLOCK % NONMOVING_SEGMENT_BLOCKS);
+
+        // See Note [Live data accounting in nonmoving collector].
+        oldest_gen->n_blocks += alloc_blocks;
+        oldest_gen->n_words  += BLOCK_SIZE_W * alloc_blocks;
+
+        for (StgWord32 i = 0; i < alloc_blocks; ++i) {
             initBdescr(&bd[i], oldest_gen, oldest_gen);
             bd[i].flags = BF_NONMOVING;
         }
+
+        // Push all but the last segment to the free segment list.
+        while(bd->link) {
+          bdescr *next_bd = bd->link;
+          bd->link = NULL;
+          nonmovingPushFreeSegment((struct NonmovingSegment *)bd->start);
+          bd = next_bd;
+        }
+
+        // Use the last segment to service the allocation.
         ret = (struct NonmovingSegment *)bd->start;
     }
 


=====================================
rts/sm/NonMovingMark.c
=====================================
@@ -919,7 +919,9 @@ static MarkQueueEnt markQueuePop (MarkQueue *q)
         // MarkQueueEnt encoding always places the pointer to the object to be
         // marked first.
         prefetchForRead(&new.mark_closure.p->header.info);
+#if !defined(ASSERTS_ENABLED)
         prefetchForRead(Bdescr((StgPtr) new.mark_closure.p));
+#endif
         q->prefetch_queue[i] = new;
         i = (i + 1) % MARK_PREFETCH_QUEUE_DEPTH;
     }


=====================================
rts/sm/Sanity.c
=====================================
@@ -1238,6 +1238,7 @@ countNonMovingHeap(struct NonmovingHeap *heap)
     }
     ret += countNonMovingSegments(heap->sweep_list);
     ret += countNonMovingSegments(heap->free);
+    ret += countNonMovingSegments(heap->saved_free);
     return ret;
 }
 


=====================================
testsuite/tests/rts/atomicinc.c
=====================================
@@ -11,7 +11,7 @@ int main(int argc, char *argv[])
     CHECK(j == 1);
     CHECK(i == 1);
 
-    j = atomic_dec(&i);
+    j = atomic_dec(&i,1);
     CHECK(j == 0);
     CHECK(i == 0);
 



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/6986a1ef635b1a42761b50fdf490b0ba99f99f94...8958612c274531bfe8fd3a033061586610f2c574

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/6986a1ef635b1a42761b50fdf490b0ba99f99f94...8958612c274531bfe8fd3a033061586610f2c574
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20241206/7cc62893/attachment-0001.html>


More information about the ghc-commits mailing list