[Git][ghc/ghc][wip/ghc-9.8] 7 commits: Define FFI_GO_CLOSURES

Ben Gamari (@bgamari) gitlab at gitlab.haskell.org
Mon Jul 10 12:25:29 UTC 2023



Ben Gamari pushed to branch wip/ghc-9.8 at Glasgow Haskell Compiler / GHC


Commits:
6d5c0112 by Ben Gamari at 2023-07-10T08:23:45-04:00
Define FFI_GO_CLOSURES

The libffi shipped with Apple's Xcode toolchain does not contain a
definition of the FFI_GO_CLOSURES macro, despite containing references
to said macro. Work around this by defining the macro, following the
model of a similar workaround in OpenJDK [1].

[1] https://github.com/openjdk/jdk17u-dev/pull/741/files

(cherry picked from commit 8b35e8caafeeccbf06b7faa70e807028a3f0ff43)
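
A minimal sketch of the wrapper's shape (the full header, rts/include/rts/ghc_ffi.h,
appears in the diff below):

    /* Sketch only; see the new rts/include/rts/ghc_ffi.h in the diff for the real thing. */
    #pragma once

    #if defined(darwin_HOST_OS)
    #if !defined(FFI_GO_CLOSURES)
    #define FFI_GO_CLOSURES 0   /* value chosen following OpenJDK's workaround */
    #endif
    #endif

    #include "ffi.h"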

- - - - -
0c763c3b by Ben Gamari at 2023-07-10T08:23:54-04:00
base: Fix incorrect CPP guard

This was guarded on `darwin_HOST_OS` instead of `defined(darwin_HOST_OS)`.

(cherry picked from commit d7ef1704aeba451bd3e0efbdaaab2638ee1f0bc8)
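
For context (illustration only, not part of the patch): a bare #if silently treats an
undefined identifier as 0, which happens to work but trips -Wundef on platforms where
the macro is not defined, whereas the defined() form is warning-clean everywhere:

    /* Illustrative comparison of the two guard styles. */
    #if darwin_HOST_OS            /* undefined identifier -> 0; warns under -Wundef */
    #include <crt_externs.h>
    #endif

    #if defined(darwin_HOST_OS)   /* explicit test; no warning either way */
    #include <crt_externs.h>
    #endif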

- - - - -
b92f2660 by Ben Gamari at 2023-07-10T08:24:02-04:00
rts/Trace: Ensure that debugTrace arguments are used

As debugTrace is a macro, we must take care that its arguments still appear
used to the compiler even when DEBUG is off, lest we see unused-variable warnings.

(cherry picked from commit 7c7d1f66d35f73a2faa898a33aa80cd276159dc2)
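
The Trace.h change below achieves this by compiling the trace call unconditionally and
guarding it with a compile-time constant: the arguments are always seen (and hence
"used") by the compiler, while the call itself is dead-code-eliminated in non-DEBUG
builds. A minimal sketch of the pattern, taken from the diff:

    #if defined(DEBUG)
    #define DEBUG_RTS 1
    #else
    #define DEBUG_RTS 0
    #endif

    #define debugTrace(class, msg, ...)             \
        if (DEBUG_RTS && RTS_UNLIKELY(class)) {     \
            trace_(msg, ##__VA_ARGS__);             \
        }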

- - - - -
1b2af02d by Ben Gamari at 2023-07-10T08:24:07-04:00
rts: Various warnings fixes

(cherry picked from commit cb92051e3d85575ff6abd753c9b135930cc50cf8)

- - - - -
e90c114d by Ben Gamari at 2023-07-10T08:24:17-04:00
hadrian: Ignore warnings in unix and semaphore-compat

(cherry picked from commit dec81dd1fd0475dde4929baae625d155387300bb)

- - - - -
63c05986 by Matthew Pickering at 2023-07-10T08:24:21-04:00
hadrian: Fix dependencies of docs:* rule

For the docs:* rule we need to actually build the dependent packages rather than
just their haddocks. Therefore we depend on the .conf files of the packages we are
trying to build documentation for, as well as on the .haddock files.

Fixes #23472

(cherry picked from commit d7f6448aa06bbf26173a06ee5c624f5b734786c5)

- - - - -
80211b50 by Ben Gamari at 2023-07-10T08:24:31-04:00
rts: Ensure that pinned allocations respect block size

Previously, it was possible for pinned, aligned allocation requests to
allocate beyond the end of the pinned accumulator block. Specifically,
we failed to account for the padding needed to achieve the requested
alignment in the "large object" check. With large alignment requests,
this can result in the allocator using the capability's pinned object
accumulator block to service a request which is larger than
`PINNED_EMPTY_SIZE`.

To fix this we reorganize `allocatePinned` to consistently account for
the alignment padding in all large object checks. This is a bit subtle
as we must handle the case of a small allocation request filling the
accumulator block, as well as large requests.

Fixes #23400.

(cherry picked from commit fd8c57694a00f6359bd66365f1284388c869ac60)
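
Schematically, the corrected small/large decision looks like this (names as in the
Storage.c diff below; a sketch, not the full function):

    W_ off_w = ALIGN_WITH_OFF_W(bd->free, alignment, align_off);

    /* before: padding ignored, so a padded request could still be classed as
     * "small" and overflow the pinned accumulator block (#23400) */
    if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) { /* large-object path */ }

    /* after: the alignment padding counts towards the size used for the check */
    if (n + off_w < LARGE_OBJECT_THRESHOLD/sizeof(W_)) { /* small path */ }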

- - - - -


17 changed files:

- compiler/GHC/Driver/CodeOutput.hs
- hadrian/src/Flavour.hs
- hadrian/src/Rules/Documentation.hs
- libraries/base/include/HsBase.h
- libraries/ghci/GHCi/FFI.hsc
- rts/Interpreter.c
- rts/Schedule.c
- rts/Sparks.c
- rts/Trace.h
- rts/TraverseHeap.c
- rts/adjustor/LibffiAdjustor.c
- + rts/include/rts/ghc_ffi.h
- rts/rts.cabal.in
- rts/sm/GC.c
- rts/sm/NonMoving.c
- rts/sm/NonMovingMark.c
- rts/sm/Storage.c


Changes:

=====================================
compiler/GHC/Driver/CodeOutput.hs
=====================================
@@ -295,7 +295,7 @@ outputForeignStubs logger tmpfs dflags unit_state mod location stubs
 
             -- wrapper code mentions the ffi_arg type, which comes from ffi.h
             ffi_includes
-              | platformMisc_libFFI $ platformMisc dflags = "#include <ffi.h>\n"
+              | platformMisc_libFFI $ platformMisc dflags = "#include \"rts/ghc_ffi.h\"\n"
               | otherwise = ""
 
         stub_h_file_exists


=====================================
hadrian/src/Flavour.hs
=====================================
@@ -128,9 +128,13 @@ werror =
         ? notStage0
         ? mconcat
           [ arg "-Werror"
-          , flag CrossCompiling
-              ? package unix
+            -- unix has many unused imports
+          , package unix
               ? mconcat [arg "-Wwarn=unused-imports", arg "-Wwarn=unused-top-binds"]
+            -- semaphore-compat relies on sem_getvalue as provided by unix, which is
+            -- not implemented on Darwin and therefore throws a deprecation warning
+          , package semaphoreCompat
+              ? mconcat [arg "-Wwarn=deprecations"]
           ]
     , builder Ghc
         ? package rts


=====================================
hadrian/src/Rules/Documentation.hs
=====================================
@@ -258,6 +258,15 @@ buildPackageDocumentation = do
         need [ takeDirectory file  -/- "haddock-prologue.txt"]
         haddocks <- haddockDependencies context
 
+        -- Build Haddock documentation
+        -- TODO: Pass the correct way from Rules via Context.
+        dynamicPrograms <- dynamicGhcPrograms =<< flavour
+        let haddockWay = if dynamicPrograms then dynamic else vanilla
+
+        -- Build the dependencies of the package we are going to build documentation for
+        dep_pkgs <- sequence [pkgConfFile (context { way = haddockWay, Context.package = p})
+                             | (p, _) <- haddocks]
+
         -- `ghc-prim` has a source file for 'GHC.Prim' which is generated just
         -- for Haddock. We need to 'union' (instead of '++') to avoid passing
         -- 'GHC.PrimopWrappers' (which unfortunately shows up in both
@@ -266,12 +275,8 @@ buildPackageDocumentation = do
         vanillaSrcs <- hsSources context
         let srcs = vanillaSrcs `union` generatedSrcs
 
-        need $ srcs ++ (map snd haddocks)
+        need $ srcs ++ (map snd haddocks) ++ dep_pkgs
 
-        -- Build Haddock documentation
-        -- TODO: Pass the correct way from Rules via Context.
-        dynamicPrograms <- dynamicGhcPrograms =<< flavour
-        let haddockWay = if dynamicPrograms then dynamic else vanilla
         statsFilesDir <- haddockStatsFilesDir
         createDirectory statsFilesDir
         build $ target (context {way = haddockWay}) (Haddock BuildPackage) srcs [file]


=====================================
libraries/base/include/HsBase.h
=====================================
@@ -540,7 +540,7 @@ INLINE int __hscore_open(char *file, int how, mode_t mode) {
 }
 #endif
 
-#if darwin_HOST_OS
+#if defined(darwin_HOST_OS)
 // You should not access _environ directly on Darwin in a bundle/shared library.
 // See #2458 and http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man7/environ.7.html
 #include <crt_externs.h>


=====================================
libraries/ghci/GHCi/FFI.hsc
=====================================
@@ -22,6 +22,14 @@
 -}
 
 #if !defined(javascript_HOST_ARCH)
+-- See Note [FFI_GO_CLOSURES workaround] in ghc_ffi.h
+-- We can't include ghc_ffi.h here as we must build with stage0
+#if defined(darwin_HOST_OS)
+#if !defined(FFI_GO_CLOSURES)
+#define FFI_GO_CLOSURES 0
+#endif
+#endif
+
 #include <ffi.h>
 #endif
 


=====================================
rts/Interpreter.c
=====================================
@@ -39,7 +39,7 @@
 #endif
 #endif
 
-#include "ffi.h"
+#include "rts/ghc_ffi.h"
 
 /* --------------------------------------------------------------------------
  * The bytecode interpreter


=====================================
rts/Schedule.c
=====================================
@@ -1160,9 +1160,11 @@ scheduleHandleHeapOverflow( Capability *cap, StgTSO *t )
             barf("allocation of %ld bytes too large (GHC should have complained at compile-time)", (long)cap->r.rHpAlloc);
         }
 
+#if defined(DEBUG)
         debugTrace(DEBUG_sched,
                    "--<< thread %ld (%s) stopped: requesting a large block (size %ld)\n",
                    (long)t->id, what_next_strs[t->what_next], blocks);
+#endif
 
         // don't do this if the nursery is (nearly) full, we'll GC first.
         if (cap->r.rCurrentNursery->link != NULL ||
@@ -1231,9 +1233,11 @@ scheduleHandleYield( Capability *cap, StgTSO *t, uint32_t prev_what_next )
     // Shortcut if we're just switching evaluators: just run the thread.  See
     // Note [avoiding threadPaused] in Interpreter.c.
     if (t->what_next != prev_what_next) {
+#if defined(DEBUG)
         debugTrace(DEBUG_sched,
                    "--<< thread %ld (%s) stopped to switch evaluators",
                    (long)t->id, what_next_strs[t->what_next]);
+#endif
         return true;
     }
 
@@ -1806,7 +1810,7 @@ scheduleDoGC (Capability **pcap, Task *task USED_IF_THREADS,
                 }
             }
         }
-        debugTrace(DEBUG_sched, "%d idle caps", n_idle_caps);
+        debugTrace(DEBUG_sched, "%d idle caps, %d failed grabs", n_idle_caps, n_failed_trygrab_idles);
 
         for (i=0; i < n_capabilities; i++) {
             NONATOMIC_ADD(&getCapability(i)->idle, 1);
@@ -2643,7 +2647,6 @@ void
 scheduleWaitThread (StgTSO* tso, /*[out]*/HaskellObj* ret, Capability **pcap)
 {
     Task *task;
-    DEBUG_ONLY( StgThreadID id );
     Capability *cap;
 
     cap = *pcap;
@@ -2662,8 +2665,9 @@ scheduleWaitThread (StgTSO* tso, /*[out]*/HaskellObj* ret, Capability **pcap)
 
     appendToRunQueue(cap,tso);
 
-    DEBUG_ONLY( id = tso->id );
-    debugTrace(DEBUG_sched, "new bound thread (%" FMT_StgThreadID ")", id);
+    DEBUG_ONLY(
+        debugTrace(DEBUG_sched, "new bound thread (%" FMT_StgThreadID ")", (StgThreadID) tso->id);
+    );
 
     // As the TSO is bound and on the run queue, schedule() will run the TSO.
     cap = schedule(cap,task);
@@ -2671,7 +2675,7 @@ scheduleWaitThread (StgTSO* tso, /*[out]*/HaskellObj* ret, Capability **pcap)
     ASSERT(task->incall->rstat != NoStatus);
     ASSERT_FULL_CAPABILITY_INVARIANTS(cap,task);
 
-    debugTrace(DEBUG_sched, "bound thread (%" FMT_StgThreadID ") finished", id);
+    debugTrace(DEBUG_sched, "bound thread (%" FMT_StgThreadID ") finished", (StgThreadID) tso->id);
     *pcap = cap;
 }
 
@@ -2793,9 +2797,6 @@ exitScheduler (bool wait_foreign USED_IF_THREADS)
 
     shutdownCapabilities(task, wait_foreign);
 
-    // debugBelch("n_failed_trygrab_idles = %d, n_idle_caps = %d\n",
-    //            n_failed_trygrab_idles, n_idle_caps);
-
     exitMyTask();
 }
 


=====================================
rts/Sparks.c
=====================================
@@ -119,11 +119,10 @@ pruneSparkQueue (bool nonmovingMarkFinished, Capability *cap)
 {
     SparkPool *pool;
     StgClosurePtr spark, tmp, *elements;
-    uint32_t n, pruned_sparks; // stats only
+    uint32_t pruned_sparks; // stats only
     StgInt botInd,oldBotInd,currInd; // indices in array (always < size)
     const StgInfoTable *info;
 
-    n = 0;
     pruned_sparks = 0;
 
     pool = cap->sparks;
@@ -217,7 +216,6 @@ pruneSparkQueue (bool nonmovingMarkFinished, Capability *cap)
               if (closure_SHOULD_SPARK(tmp)) {
                   elements[botInd] = tmp; // keep entry (new address)
                   botInd++;
-                  n++;
               } else {
                   pruned_sparks++; // discard spark
                   cap->spark_stats.fizzled++;
@@ -247,7 +245,6 @@ pruneSparkQueue (bool nonmovingMarkFinished, Capability *cap)
                   if (closure_SHOULD_SPARK(spark)) {
                       elements[botInd] = spark; // keep entry (new address)
                       botInd++;
-                      n++;
                   } else {
                       pruned_sparks++; // discard spark
                       cap->spark_stats.fizzled++;
@@ -265,7 +262,6 @@ pruneSparkQueue (bool nonmovingMarkFinished, Capability *cap)
                   // isAlive() also ignores static closures (see GCAux.c)
                   elements[botInd] = spark; // keep entry (new address)
                   botInd++;
-                  n++;
               } else {
                   pruned_sparks++; // discard spark
                   cap->spark_stats.fizzled++;


=====================================
rts/Trace.h
=====================================
@@ -235,26 +235,25 @@ void traceThreadLabel_(Capability *cap,
                        char       *label,
                        size_t      len);
 
+
+#if defined(DEBUG)
+#define DEBUG_RTS 1
+#else
+#define DEBUG_RTS 0
+#endif
+
 /*
  * Emit a debug message (only when DEBUG is defined)
  */
-#if defined(DEBUG)
 #define debugTrace(class, msg, ...)             \
-    if (RTS_UNLIKELY(class)) {                  \
+    if (DEBUG_RTS && RTS_UNLIKELY(class)) {     \
         trace_(msg, ##__VA_ARGS__);             \
     }
-#else
-#define debugTrace(class, str, ...) /* nothing */
-#endif
 
-#if defined(DEBUG)
-#define debugTraceCap(class, cap, msg, ...)      \
-    if (RTS_UNLIKELY(class)) {                  \
+#define debugTraceCap(class, cap, msg, ...)     \
+    if (DEBUG_RTS && RTS_UNLIKELY(class)) {     \
         traceCap_(cap, msg, ##__VA_ARGS__);     \
     }
-#else
-#define debugTraceCap(class, cap, str, ...) /* nothing */
-#endif
 
 /*
  * Emit a message/event describing the state of a thread


=====================================
rts/TraverseHeap.c
=====================================
@@ -48,7 +48,7 @@ static void debug(const char *s, ...)
     va_end(ap);
 }
 #else
-#define debug(...)
+static void debug(const char *s STG_UNUSED, ...) {}
 #endif
 
 // number of blocks allocated for one stack


=====================================
rts/adjustor/LibffiAdjustor.c
=====================================
@@ -11,7 +11,7 @@
 #include "Hash.h"
 #include "Adjustor.h"
 
-#include "ffi.h"
+#include "rts/ghc_ffi.h"
 #include <string.h>
 
 // Note that ffi_alloc_prep_closure is a non-standard libffi closure


=====================================
rts/include/rts/ghc_ffi.h
=====================================
@@ -0,0 +1,28 @@
+/*
+ * <ffi.h> wrapper working around #23586.
+ *
+ * (c) The University of Glasgow 2023
+ *
+ */
+
+#pragma once
+
+/*
+ * Note [FFI_GO_CLOSURES workaround]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * Apple ships a broken libffi with Xcode which lacks a definition of
+ * FFI_GO_CLOSURES despite having references to said macro. Work around this
+ * for now to avoid -Wundef warnings.
+ *
+ * We choose the value zero here by following the model of OpenJDK.
+ * See https://github.com/openjdk/jdk17u-dev/pull/741/files.
+ *
+ * See #23568.
+ */
+#if defined(darwin_HOST_OS)
+#if !defined(FFI_GO_CLOSURES)
+#define FFI_GO_CLOSURES 0
+#endif
+#endif
+
+#include "ffi.h"


=====================================
rts/rts.cabal.in
=====================================
@@ -237,6 +237,7 @@ library
                         rts/EventLogConstants.h
                         rts/EventTypes.h
                         -- ^ generated
+                        rts/ghc_ffi.h
                         rts/Adjustor.h
                         rts/ExecPage.h
                         rts/BlockSignals.h


=====================================
rts/sm/GC.c
=====================================
@@ -691,6 +691,7 @@ GarbageCollect (struct GcConfig config,
         }
         copied +=  mut_list_size;
 
+#if defined(DEBUG)
         debugTrace(DEBUG_gc,
                    "mut_list_size: %lu (%d vars, %d arrays, %d MVARs, %d TVARs, %d TVAR_WATCH_QUEUEs, %d TREC_CHUNKs, %d TREC_HEADERs, %d others)",
                    (unsigned long)(mut_list_size * sizeof(W_)),
@@ -702,6 +703,7 @@ GarbageCollect (struct GcConfig config,
                    mutlist_scav_stats.n_TREC_CHUNK,
                    mutlist_scav_stats.n_TREC_HEADER,
                    mutlist_scav_stats.n_OTHERS);
+#endif
     }
 
     bdescr *next, *prev;


=====================================
rts/sm/NonMoving.c
=====================================
@@ -901,14 +901,12 @@ static void nonmovingMark_(MarkQueue *mark_queue, StgWeak **dead_weaks, StgTSO *
     // updated their snapshot pointers and move them to the sweep list.
     for (int alloca_idx = 0; alloca_idx < NONMOVING_ALLOCA_CNT; ++alloca_idx) {
         struct NonmovingSegment *filled = nonmovingHeap.allocators[alloca_idx].saved_filled;
-        uint32_t n_filled = 0;
         if (filled) {
             struct NonmovingSegment *seg = filled;
             while (true) {
                 // Set snapshot
                 nonmovingSegmentInfo(seg)->next_free_snap = seg->next_free;
                 SET_SEGMENT_STATE(seg, FILLED_SWEEPING);
-                n_filled++;
                 if (seg->link) {
                     seg = seg->link;
                 } else {
@@ -1161,24 +1159,20 @@ void assert_in_nonmoving_heap(StgPtr p)
         }
 
         // Search active segments
-        int seg_idx = 0;
         struct NonmovingSegment *seg = alloca->active;
         while (seg) {
             if (p >= (P_)seg && p < (((P_)seg) + NONMOVING_SEGMENT_SIZE_W)) {
                 return;
             }
-            seg_idx++;
             seg = seg->link;
         }
 
         // Search filled segments
-        seg_idx = 0;
         seg = alloca->filled;
         while (seg) {
             if (p >= (P_)seg && p < (((P_)seg) + NONMOVING_SEGMENT_SIZE_W)) {
                 return;
             }
-            seg_idx++;
             seg = seg->link;
         }
     }


=====================================
rts/sm/NonMovingMark.c
=====================================
@@ -268,7 +268,7 @@ void nonmovingMarkInit(void) {
 #endif
 }
 
-#if defined(THREADED_RTS) && defined(DEBUG)
+#if defined(THREADED_RTS)
 static uint32_t markQueueLength(MarkQueue *q);
 #endif
 static void init_mark_queue_(MarkQueue *queue);
@@ -985,7 +985,7 @@ void freeMarkQueue (MarkQueue *queue)
     freeChain_lock(queue->blocks);
 }
 
-#if defined(THREADED_RTS) && defined(DEBUG)
+#if defined(THREADED_RTS)
 static uint32_t
 markQueueLength (MarkQueue *q)
 {


=====================================
rts/sm/Storage.c
=====================================
@@ -53,7 +53,7 @@
 
 #include <string.h>
 
-#include "ffi.h"
+#include "rts/ghc_ffi.h"
 
 /*
  * All these globals require sm_mutex to access in THREADED_RTS mode.
@@ -1231,6 +1231,74 @@ allocateMightFail (Capability *cap, W_ n)
  */
 #define MEMSET_SLOP_W(p, val, len_w) memset(p, val, (len_w) * sizeof(W_))
 
+/**
+ * Finish the capability's current pinned object accumulator block
+ * (cap->pinned_object_block), if any, and start a new one.
+ */
+static bdescr *
+start_new_pinned_block(Capability *cap)
+{
+    bdescr *bd = cap->pinned_object_block;
+
+    // stash the old block on cap->pinned_object_blocks.  On the
+    // next GC cycle these objects will be moved to
+    // g0->large_objects.
+    if (bd != NULL) {
+        // add it to the allocation stats when the block is full
+        finishedNurseryBlock(cap, bd);
+        dbl_link_onto(bd, &cap->pinned_object_blocks);
+    }
+
+    // We need to find another block.  We could just allocate one,
+    // but that means taking a global lock and we really want to
+    // avoid that (benchmarks that allocate a lot of pinned
+    // objects scale really badly if we do this).
+    //
+    // See Note [Sources of Block Level Fragmentation]
+    // for a more complete history of this section.
+    bd = cap->pinned_object_empty;
+    if (bd == NULL) {
+        // The pinned block list is empty: allocate a fresh block (we can't fail
+        // here).
+        ACQUIRE_SM_LOCK;
+        bd = allocNursery(cap->node, NULL, PINNED_EMPTY_SIZE);
+        RELEASE_SM_LOCK;
+    }
+
+    // Bump up the nursery pointer to avoid the pathological situation
+    // where a program is *only* allocating pinned objects.
+    // T4018 fails without this safety.
+    // This has the effect of counting a full pinned block in the same way
+    // as a full nursery block, so GCs will be triggered at the same interval
+    // if you are only allocating pinned data compared to normal allocations
+    // via allocate().
+    bdescr *nbd = cap->r.rCurrentNursery->link;
+    if (nbd != NULL){
+      newNurseryBlock(nbd);
+      cap->r.rCurrentNursery->link = nbd->link;
+      if (nbd->link != NULL) {
+          nbd->link->u.back = cap->r.rCurrentNursery;
+        }
+      dbl_link_onto(nbd, &cap->r.rNursery->blocks);
+      // Important for accounting purposes
+      if (cap->r.rCurrentAlloc){
+        finishedNurseryBlock(cap, cap->r.rCurrentAlloc);
+      }
+      cap->r.rCurrentAlloc = nbd;
+    }
+
+    cap->pinned_object_empty = bd->link;
+    newNurseryBlock(bd);
+    if (bd->link != NULL) {
+      bd->link->u.back = cap->pinned_object_empty;
+    }
+    initBdescr(bd, g0, g0);
+
+    cap->pinned_object_block = bd;
+    bd->flags  = BF_PINNED | BF_LARGE | BF_EVACUATED;
+    return bd;
+}
+
 /* ---------------------------------------------------------------------------
    Allocate a fixed/pinned object.
 
@@ -1258,135 +1326,76 @@ allocateMightFail (Capability *cap, W_ n)
 StgPtr
 allocatePinned (Capability *cap, W_ n /*words*/, W_ alignment /*bytes*/, W_ align_off /*bytes*/)
 {
-    StgPtr p;
-    bdescr *bd;
-
     // Alignment and offset have to be a power of two
-    ASSERT(alignment && !(alignment & (alignment - 1)));
-    ASSERT(alignment >= sizeof(W_));
-
-    ASSERT(!(align_off & (align_off - 1)));
+    CHECK(alignment && !(alignment & (alignment - 1)));
+    CHECK(!(align_off & (align_off - 1)));
+    // We don't support sub-word alignments
+    CHECK(alignment >= sizeof(W_));
+
+    bdescr *bd = cap->pinned_object_block;
+    if (bd == NULL) {
+        bd = start_new_pinned_block(cap);
+    }
 
     const StgWord alignment_w = alignment / sizeof(W_);
+    W_ off_w = ALIGN_WITH_OFF_W(bd->free, alignment, align_off);
+
+    // If the request is smaller than LARGE_OBJECT_THRESHOLD then
+    // allocate into the pinned object accumulator.
+    if (n + off_w < LARGE_OBJECT_THRESHOLD/sizeof(W_)) {
+        // If the current pinned object block isn't large enough to hold the new
+        // object, get a new one.
+        if ((bd->free + off_w + n) > (bd->start + BLOCK_SIZE_W)) {
+            bd = start_new_pinned_block(cap);
+
+            // The pinned_object_block remains attached to the capability
+            // until it is full, even if a GC occurs.  We want this
+            // behaviour because otherwise the unallocated portion of the
+            // block would be forever slop, and under certain workloads
+            // (allocating a few ByteStrings per GC) we accumulate a lot
+            // of slop.
+            //
+            // So, the pinned_object_block is initially marked
+            // BF_EVACUATED so the GC won't touch it.  When it is full,
+            // we place it on the large_objects list, and at the start of
+            // the next GC the BF_EVACUATED flag will be cleared, and the
+            // block will be promoted as usual (if anything in it is
+            // live).
+
+            off_w = ALIGN_WITH_OFF_W(bd->free, alignment, align_off);
+        }
 
-    // If the request is for a large object, then allocate()
-    // will give us a pinned object anyway.
-    if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) {
-        // For large objects we don't bother optimizing the number of words
-        // allocated for alignment reasons. Here we just allocate the maximum
-        // number of extra words we could possibly need to satisfy the alignment
-        // constraint.
-        p = allocateMightFail(cap, n + alignment_w - 1);
-        if (p == NULL) {
-            return NULL;
-        } else {
-            Bdescr(p)->flags |= BF_PINNED;
-            W_ off_w = ALIGN_WITH_OFF_W(p, alignment, align_off);
+        // N.B. it is important that we account for the alignment padding
+        // when determining large-object-ness, lest we may over-fill the
+        // block. See #23400.
+        if (n + off_w < LARGE_OBJECT_THRESHOLD/sizeof(W_)) {
+            StgPtr p = bd->free;
             MEMSET_SLOP_W(p, 0, off_w);
+            n += off_w;
             p += off_w;
-            MEMSET_SLOP_W(p + n, 0, alignment_w - off_w - 1);
+            bd->free += n;
+            ASSERT(bd->free <= bd->start + bd->blocks * BLOCK_SIZE_W);
+            accountAllocation(cap, n);
             return p;
         }
     }
 
-    bd = cap->pinned_object_block;
-
-    W_ off_w = 0;
-
-    if(bd)
-        off_w = ALIGN_WITH_OFF_W(bd->free, alignment, align_off);
-
-    // If we don't have a block of pinned objects yet, or the current
-    // one isn't large enough to hold the new object, get a new one.
-    if (bd == NULL || (bd->free + off_w + n) > (bd->start + BLOCK_SIZE_W)) {
-
-        // stash the old block on cap->pinned_object_blocks.  On the
-        // next GC cycle these objects will be moved to
-        // g0->large_objects.
-        if (bd != NULL) {
-            // add it to the allocation stats when the block is full
-            finishedNurseryBlock(cap, bd);
-            dbl_link_onto(bd, &cap->pinned_object_blocks);
-        }
-
-        // We need to find another block.  We could just allocate one,
-        // but that means taking a global lock and we really want to
-        // avoid that (benchmarks that allocate a lot of pinned
-        // objects scale really badly if we do this).
-        //
-        // See Note [Sources of Block Level Fragmentation]
-        // for a more complete history of this section.
-        bd = cap->pinned_object_empty;
-        if (bd == NULL) {
-            // The pinned block list is empty: allocate a fresh block (we can't fail
-            // here).
-            ACQUIRE_SM_LOCK;
-            bd = allocNursery(cap->node, NULL, PINNED_EMPTY_SIZE);
-            RELEASE_SM_LOCK;
-        }
-
-        // Bump up the nursery pointer to avoid the pathological situation
-        // where a program is *only* allocating pinned objects.
-        // T4018 fails without this safety.
-        // This has the effect of counting a full pinned block in the same way
-        // as a full nursery block, so GCs will be triggered at the same interval
-        // if you are only allocating pinned data compared to normal allocations
-        // via allocate().
-        bdescr * nbd;
-        nbd = cap->r.rCurrentNursery->link;
-        if (nbd != NULL){
-          newNurseryBlock(nbd);
-          cap->r.rCurrentNursery->link = nbd->link;
-          if (nbd->link != NULL) {
-              nbd->link->u.back = cap->r.rCurrentNursery;
-            }
-          dbl_link_onto(nbd, &cap->r.rNursery->blocks);
-          // Important for accounting purposes
-          if (cap->r.rCurrentAlloc){
-            finishedNurseryBlock(cap, cap->r.rCurrentAlloc);
-          }
-          cap->r.rCurrentAlloc = nbd;
-        }
-
-
-        cap->pinned_object_empty = bd->link;
-        newNurseryBlock(bd);
-        if (bd->link != NULL) {
-          bd->link->u.back = cap->pinned_object_empty;
-        }
-        initBdescr(bd, g0, g0);
-
-        cap->pinned_object_block = bd;
-        bd->flags  = BF_PINNED | BF_LARGE | BF_EVACUATED;
-
-        // The pinned_object_block remains attached to the capability
-        // until it is full, even if a GC occurs.  We want this
-        // behaviour because otherwise the unallocated portion of the
-        // block would be forever slop, and under certain workloads
-        // (allocating a few ByteStrings per GC) we accumulate a lot
-        // of slop.
-        //
-        // So, the pinned_object_block is initially marked
-        // BF_EVACUATED so the GC won't touch it.  When it is full,
-        // we place it on the large_objects list, and at the start of
-        // the next GC the BF_EVACUATED flag will be cleared, and the
-        // block will be promoted as usual (if anything in it is
-        // live).
-
-        off_w = ALIGN_WITH_OFF_W(bd->free, alignment, align_off);
+    // Otherwise handle the request as a large object
+    // For large objects we don't bother optimizing the number of words
+    // allocated for alignment reasons. Here we just allocate the maximum
+    // number of extra words we could possibly need to satisfy the alignment
+    // constraint.
+    StgPtr p = allocateMightFail(cap, n + alignment_w - 1);
+    if (p == NULL) {
+        return NULL;
+    } else {
+        Bdescr(p)->flags |= BF_PINNED;
+        off_w = ALIGN_WITH_OFF_W(p, alignment, align_off);
+        MEMSET_SLOP_W(p, 0, off_w);
+        p += off_w;
+        MEMSET_SLOP_W(p + n, 0, alignment_w - off_w - 1);
+        return p;
     }
-
-    p = bd->free;
-
-    MEMSET_SLOP_W(p, 0, off_w);
-
-    n += off_w;
-    p += off_w;
-    bd->free += n;
-
-    accountAllocation(cap, n);
-
-    return p;
 }
 
 /* -----------------------------------------------------------------------------



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/0b61479aa3861b3b5337ef0eda5c197f5e817abd...80211b508b3b7e1973fbb1d8425acd23d6bd4d07
