[Git][ghc/ghc][wip/T23397] rts: use live words to estimate heap size

Teo Camarasu (@teo) gitlab at gitlab.haskell.org
Thu Feb 1 13:36:32 UTC 2024



Teo Camarasu pushed to branch wip/T23397 at Glasgow Haskell Compiler / GHC


Commits:
3e52b5df by Teo Camarasu at 2024-02-01T13:36:16+00:00
rts: use live words to estimate heap size

We use live words rather than live blocks to determine the size of the
heap for determining memory retention.

Most of the time these two metrics align, but they can come apart in
normal usage when using the nonmoving collector.

The nonmoving collector leads to a lot of partially occupied blocks. So,
using live words is more accurate.

They can also come apart when the heap is suffering from high levels
fragmentation caused by small pinned objects, but in this case, the
block size is the more accurate metric. Since this case is best avoided
anyway. It is ok to accept the trade-off that we might try (and
probably) fail to return more memory in this case.

See also the Note [Statistics for retaining memory]

Resolves #23397

- - - - -


5 changed files:

- docs/users_guide/9.10.1-notes.rst
- rts/sm/GC.c
- rts/sm/Storage.c
- rts/sm/Storage.h
- + testsuite/tests/rts/T23397.hs


Changes:

=====================================
docs/users_guide/9.10.1-notes.rst
=====================================
@@ -140,6 +140,11 @@ Runtime system
   See :ghc-ticket:`23340`.
   :rts-flag:`--nonmoving-dense-allocator-count=⟨count⟩` has been added to fine-tune this behaviour.
 
+- Memory return logic now uses live bytes rather than live blocks to measure the size of the heap.
+  This primarily affects the non-moving GC, which should now be more willing to return memory to the OS.
+  Users who have fine-tuned the :rts-flag:`-F ⟨factor⟩`, :rts-flag:`-Fd ⟨factor⟩`, or :rts-flag:`-O ⟨size⟩` flags,
+  and use the non-moving GC, should see if adjustments are needed in light of this change.
+
 - Add a :rts-flag:`--no-automatic-time-samples` flag which stops time profiling samples being automatically started on
   startup. Time profiling can be controlled manually using functions in ``GHC.Profiling``.
 


=====================================
rts/sm/GC.c
=====================================
@@ -1002,10 +1002,15 @@ GarbageCollect (struct GcConfig config,
       need_copied_live = 0;
       need_uncopied_live = 0;
       for (i = 0; i < RtsFlags.GcFlags.generations; i++) {
-          need_copied_live += genLiveCopiedBlocks(&generations[i]);
-          need_uncopied_live += genLiveUncopiedBlocks(&generations[i]);
+          need_copied_live += genLiveCopiedWords(&generations[i]);
+          need_uncopied_live += genLiveUncopiedWords(&generations[i]);
       }
 
+      // Convert the live words into live blocks
+      // See Note [Statistics for retaining memory]
+      need_copied_live = BLOCK_ROUND_UP(need_copied_live) / BLOCK_SIZE_W;
+      need_uncopied_live = BLOCK_ROUND_UP(need_uncopied_live) / BLOCK_SIZE_W;
+
       debugTrace(DEBUG_gc, "(before) copied_live: %d; uncopied_live: %d", need_copied_live, need_uncopied_live );
 
 
@@ -1029,7 +1034,7 @@ GarbageCollect (struct GcConfig config,
 
       ASSERT(need_uncopied_live + need_copied_live >= RtsFlags.GcFlags.minOldGenSize );
 
-      debugTrace(DEBUG_gc, "(after) copyied_live: %d; uncopied_live: %d", need_copied_live, need_uncopied_live );
+      debugTrace(DEBUG_gc, "(after) copied_live: %d; uncopied_live: %d", need_copied_live, need_uncopied_live );
 
       need_prealloc = 0;
       for (i = 0; i < n_nurseries; i++) {
@@ -1067,7 +1072,7 @@ GarbageCollect (struct GcConfig config,
 
       W_ scaled_needed = ((scaled_factor + unavoidable_copied_need_factor) * need_copied_live)
                        + ((scaled_factor + unavoidable_uncopied_need_factor) * need_uncopied_live);
-      debugTrace(DEBUG_gc, "factors_2: %f %d", ((scaled_factor + unavoidable_copied_need_factor) * need_copied_live), ((scaled_factor + unavoidable_uncopied_need_factor) * need_uncopied_live));
+      debugTrace(DEBUG_gc, "factors_2: %f %f", ((scaled_factor + unavoidable_copied_need_factor) * need_copied_live), ((scaled_factor + unavoidable_uncopied_need_factor) * need_uncopied_live));
       need = need_prealloc + scaled_needed;
 
       /* Also, if user set heap size, do not drop below it.
@@ -2422,3 +2427,39 @@ bool doIdleGCWork(Capability *cap STG_UNUSED, bool all)
  *
 
 */
+
+/* Note [Statistics for retaining memory]
+*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*
+* At the end of GC, we want to determine the size of the heap in order to
+* determine the amount of memory we wish to return to the OS, or if we want
+* to increase the heap size to the minimum.
+*
+* There's two promising candidates for this metric: live words, and live blocks.
+*
+* Measuring live blocks is promising because blocks are the smallest unit
+* that the storage manager can (de)allocate.
+* Most of the time live words and live blocks are very similar.
+*
+* But the two metrics can come apart when the heap is dominated
+* by small pinned objects, or when using the non-moving collector.
+*
+* In both cases, this happens because objects cannot be copied, so
+* block occupancy can fall as objects in a block become garbage.
+* In situations like this, using live blocks to determine memory
+* retention behaviour can lead to us being overly conservative.
+*
+* Instead we use live words rounded up to the block size to measure
+* heap size. This gives us a more accurate picture of the heap.
+*
+* This works particularly well with the nonmoving collector as we
+* can reuse the space taken up by dead heap objects. This choice is less good
+* for fragmentation caused by a few pinned objects retaining blocks.
+* In that case, the block can only be reused if it is deallocated in its entirety.
+* And therefore using live blocks would be more accurate in this case.
+* We assume that this is relatively rare and when it does happen,
+* this fragmentation is a problem that should be addressed in its own right.
+*
+* See: #23397
+*
+*/


=====================================
rts/sm/Storage.c
=====================================
@@ -1642,11 +1642,36 @@ W_ countOccupied (bdescr *bd)
     return words;
 }
 
-// Returns the total number of live blocks
+// Returns the total number of live words
 W_ genLiveWords (generation *gen)
 {
-    return (gen->live_estimate ? gen->live_estimate : gen->n_words) +
-        gen->n_large_words + gen->n_compact_blocks * BLOCK_SIZE_W;
+    return genLiveCopiedWords(gen) + genLiveUncopiedWords(gen);
+}
+
+// The number of live words which will be copied by the copying collector.
+W_ genLiveCopiedWords (generation *gen)
+{
+  if (gen == oldest_gen && RtsFlags.GcFlags.useNonmoving){
+    // the non-moving generation doesn't contain any copied data
+    return 0;
+  } else {
+    return gen->live_estimate ? gen->live_estimate : gen->n_words;
+  }
+}
+
+// The number of live words which will not be copied by the copying collector
+// This includes data living in non-moving collector segments, compact blocks and large/pinned blocks.
+W_ genLiveUncopiedWords(generation *gen)
+{
+  W_ nonmoving_blocks = 0;
+  // The nonmoving heap contains some blocks that live outside the regular generation structure.
+  if (gen == oldest_gen && RtsFlags.GcFlags.useNonmoving){
+    nonmoving_blocks =
+        (gen->live_estimate ? gen->live_estimate : gen->n_words)
+      + nonmoving_large_words
+      + nonmoving_compact_words;
+  }
+  return gen->n_large_words + gen->n_compact_blocks * BLOCK_SIZE_W + nonmoving_blocks;
 }
 
 // The number of live blocks which will be copied by the copying collector.


=====================================
rts/sm/Storage.h
=====================================
@@ -111,6 +111,8 @@ StgWord gcThreadLiveWords  (uint32_t i, uint32_t g);
 StgWord gcThreadLiveBlocks (uint32_t i, uint32_t g);
 
 StgWord genLiveWords  (generation *gen);
+StgWord genLiveCopiedWords (generation *gen);
+StgWord genLiveUncopiedWords (generation *gen);
 StgWord genLiveBlocks (generation *gen);
 StgWord genLiveCopiedBlocks (generation *gen);
 StgWord genLiveUncopiedBlocks (generation *gen);


=====================================
testsuite/tests/rts/T23397.hs
=====================================
@@ -0,0 +1,79 @@
+{-# LANGUAGE MagicHash, UnboxedTuples, NumericUnderscores, ForeignFunctionInterface #-}
+
+module Main where
+
+import GHC.Exts
+import GHC.IO
+import System.Mem
+import System.Environment
+import Debug.Trace
+import Control.Monad
+import GHC.Stats
+import Data.Word
+import GHC.Stack (HasCallStack)
+import Control.DeepSeq
+
+foreign import ccall "performBlockingMajorGC" performBlockingMajorGC :: IO ()
+
+-- Ensure that when using the non-moving collector that
+-- our memory return behaviour reflects the amount of live data, not the live blocks.
+-- The nature of the non-moving collector means that often we have lots of unused live blocks
+-- that surpass the live data.
+-- So, going off live blocks would mean we hold onto too much memory.
+--
+-- In this test, we create a situation where live data is a small subset of live blocks.
+-- Then we allocate some data into the oldest generation to increase its size and then
+-- examine how much memory is returned when it is freed.
+--
+-- In order to make this easier to reproduce in a test we use a high -F value
+main :: IO ()
+main = do
+  [sn] <- getArgs
+  let n = read sn
+  -- create a list of Integers and promote it to the oldest gen
+  xs <- evaluate $ force $ [0..(n :: Integer)]
+  performBlockingMajorGC
+  performBlockingMajorGC
+  let
+    prune (x:xs) = x:prune (drop 1000 xs)
+    prune [] = []
+  -- now delete most of it. This should make the heap like swiss cheese
+  xs' <- evaluate $ force $ prune xs
+  performBlockingMajorGC
+  performBlockingMajorGC
+  -- allocate a bunch
+  xss <- traverse newByteArray $ replicate (fromIntegral n * 8) 60
+  performBlockingMajorGC
+  performBlockingMajorGC
+  evaluate xss
+  -- and free it
+  performBlockingMajorGC
+  performBlockingMajorGC
+  performBlockingMajorGC
+  evaluate xs'
+  stats <- getRTSStats
+  performBlockingMajorGC
+  performBlockingMajorGC
+  -- now let's check that our megablock usage is reflective of our live data
+  stats' <- getRTSStats
+  let live = fromIntegral . gcdetails_live_bytes $ gc stats
+  let penultimate_mem_usage = fromIntegral . gcdetails_mem_in_use_bytes $ gc stats
+  let ultimate_mem_usage = fromIntegral . gcdetails_mem_in_use_bytes $ gc stats'
+  let mem_usage = penultimate_mem_usage - ultimate_mem_usage -- discount memory that we can't free
+  let scale = 8 + 1.2
+
+  -- print live
+  -- print $ floor penultimate_mem_usage
+  -- print $ floor mem_usage
+  -- print $ live * scale
+  unless (live * scale < mem_usage) $
+    error $ "expected " ++ show live ++ " bytes memory used but got " ++ show mem_usage ++ " instead"
+
+  pure ()
+
+data BA = BA ByteArray#
+
+newByteArray :: Int -> IO BA
+newByteArray (I# sz#) = IO $ \s -> case newByteArray# sz# s of
+    (# s', k #) -> case unsafeFreezeByteArray# k s' of
+                    (# s'', ba# #) -> (# s'', BA ba# #)



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/3e52b5df7c9b9e7f5ced5c79b9f59b66c905c3bd

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/3e52b5df7c9b9e7f5ced5c79b9f59b66c905c3bd
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240201/9f080502/attachment-0001.html>


More information about the ghc-commits mailing list