[Git][ghc/ghc][wip/win32-m32] 14 commits: Optimisations in Data.Foldable (T17867)

Ben Gamari gitlab at gitlab.haskell.org
Tue Dec 1 01:18:07 UTC 2020



Ben Gamari pushed to branch wip/win32-m32 at Glasgow Haskell Compiler / GHC


Commits:
49ebe369 by chessai at 2020-11-30T19:47:40-05:00
Optimisations in Data.Foldable (T17867)

This PR concerns the following functions from `Data.Foldable`:
* minimum
* maximum
* sum
* product
* minimumBy
* maximumBy

- Default implementations of these functions now use `foldl'` or `foldMap'`.
- All have been marked with INLINEABLE to make room for further optimisations.

- - - - -
4d79ef65 by chessai at 2020-11-30T19:47:40-05:00
Apply suggestion to libraries/base/Data/Foldable.hs
- - - - -
6af074ce by chessai at 2020-11-30T19:47:40-05:00
Apply suggestion to libraries/base/Data/Foldable.hs
- - - - -
ab334262 by Viktor Dukhovni at 2020-11-30T19:48:17-05:00
dirty MVAR after mutating TSO queue head

While the original head and tail of the TSO queue may be in the same
generation as the MVAR, interior elements of the queue could be younger
after a GC run and may then be exposed by putMVar operation that updates
the queue head.

Resolves #18919

- - - - -
5eb163f3 by Ben Gamari at 2020-11-30T19:48:53-05:00
rts/linker: Don't allow shared libraries to be loaded multiple times

- - - - -
490aa14d by Ben Gamari at 2020-11-30T19:48:53-05:00
rts/linker: Initialise CCSs from native shared objects

- - - - -
6ac3db5f by Ben Gamari at 2020-11-30T19:48:53-05:00
rts/linker: Move shared library loading logic into Elf.c

- - - - -
b6698d73 by GHC GitLab CI at 2020-11-30T19:48:53-05:00
rts/linker: Don't declare dynamic objects with image_mapped

This previously resulted in warnings due to spurious unmap failures.

- - - - -
b94a65af by jneira at 2020-11-30T19:49:31-05:00
Include tried paths in findToolDir error

- - - - -
5f343e52 by Ben Gamari at 2020-11-30T20:18:01-05:00
rts: Introduce mmapAnonForLinker

Previously most of the uses of mmapForLinker were mapping anonymous
memory, resulting in a great deal of unnecessary repetition. Factor this
out into a new helper.

Also fixes a few places where error checking was missing or suboptimal.

- - - - -
5cec6c99 by Ben Gamari at 2020-11-30T20:18:01-05:00
rts/linker: Introduce munmapForLinker

Consolidates munmap calls to ensure consistent error handling.

- - - - -
af713c50 by Ben Gamari at 2020-11-30T20:18:01-05:00
rts/Linker: Introduce Windows implementations for mmapForLinker, et al.

- - - - -
52d40ae1 by Ben Gamari at 2020-11-30T20:18:01-05:00
rts/m32: Introduce NEEDS_M32 macro

Instead of relying on RTS_LINKER_USE_MMAP

- - - - -
ac2c9225 by Ben Gamari at 2020-11-30T20:18:01-05:00
rts/linker: Use m32 to allocate symbol extras in PEi386

- - - - -


17 changed files:

- compiler/GHC/SysTools/BaseDir.hs
- libraries/base/Data/Foldable.hs
- rts/Linker.c
- rts/LinkerInternals.h
- rts/PrimOps.cmm
- rts/Profiling.c
- rts/Threads.c
- rts/linker/Elf.c
- rts/linker/Elf.h
- rts/linker/LoadArchive.c
- rts/linker/M32Alloc.c
- rts/linker/M32Alloc.h
- rts/linker/MachO.c
- rts/linker/PEi386.c
- rts/linker/PEi386Types.h
- rts/linker/SymbolExtras.c
- rts/linker/elf_got.c


Changes:

=====================================
compiler/GHC/SysTools/BaseDir.hs
=====================================
@@ -185,17 +185,19 @@ findToolDir
   :: FilePath -- ^ topdir
   -> IO (Maybe FilePath)
 #if defined(mingw32_HOST_OS) && !defined(USE_INPLACE_MINGW_TOOLCHAIN)
-findToolDir top_dir = go 0 (top_dir </> "..")
+findToolDir top_dir = go 0 (top_dir </> "..") []
   where maxDepth = 3
-        go :: Int -> FilePath -> IO (Maybe FilePath)
-        go k path
+        go :: Int -> FilePath -> [FilePath] -> IO (Maybe FilePath)
+        go k path tried
           | k == maxDepth = throwGhcExceptionIO $
-              InstallationError "could not detect mingw toolchain"
+              InstallationError $ "could not detect mingw toolchain in the following paths: " ++ show tried
           | otherwise = do
-              oneLevel <- doesDirectoryExist (path </> "mingw")
+              let try = path </> "mingw"
+              let tried = tried ++ [try]
+              oneLevel <- doesDirectoryExist try
               if oneLevel
                 then return (Just path)
-                else go (k+1) (path </> "..")
+                else go (k+1) (path </> "..") tried
 #else
 findToolDir _ = return Nothing
 #endif


=====================================
libraries/base/Data/Foldable.hs
=====================================
@@ -507,7 +507,8 @@ class Foldable t where
     -- @since 4.8.0.0
     maximum :: forall a . Ord a => t a -> a
     maximum = fromMaybe (errorWithoutStackTrace "maximum: empty structure") .
-       getMax . foldMap (Max #. (Just :: a -> Maybe a))
+       getMax . foldMap' (Max #. (Just :: a -> Maybe a))
+    {-# INLINEABLE maximum #-}
 
     -- | The least element of a non-empty structure.
     --
@@ -529,7 +530,8 @@ class Foldable t where
     -- @since 4.8.0.0
     minimum :: forall a . Ord a => t a -> a
     minimum = fromMaybe (errorWithoutStackTrace "minimum: empty structure") .
-       getMin . foldMap (Min #. (Just :: a -> Maybe a))
+       getMin . foldMap' (Min #. (Just :: a -> Maybe a))
+    {-# INLINEABLE minimum #-}
 
     -- | The 'sum' function computes the sum of the numbers of a structure.
     --
@@ -554,7 +556,8 @@ class Foldable t where
     --
     -- @since 4.8.0.0
     sum :: Num a => t a -> a
-    sum = getSum #. foldMap Sum
+    sum = getSum #. foldMap' Sum
+    {-# INLINEABLE sum #-}
 
     -- | The 'product' function computes the product of the numbers of a
     -- structure.
@@ -580,7 +583,8 @@ class Foldable t where
     --
     -- @since 4.8.0.0
     product :: Num a => t a -> a
-    product = getProduct #. foldMap Product
+    product = getProduct #. foldMap' Product
+    {-# INLINEABLE product #-}
 
 -- instances for Prelude types
 
@@ -1111,10 +1115,15 @@ all p = getAll #. foldMap (All #. p)
 
 -- See Note [maximumBy/minimumBy space usage]
 maximumBy :: Foldable t => (a -> a -> Ordering) -> t a -> a
-maximumBy cmp = foldl1 max'
-  where max' x y = case cmp x y of
-                        GT -> x
-                        _  -> y
+maximumBy cmp = fromMaybe (errorWithoutStackTrace "maximumBy: empty structure")
+  . foldl' max' Nothing
+  where
+    max' mx y = Just $! case mx of
+      Nothing -> y
+      Just x -> case cmp x y of
+        GT -> x
+        _ -> y
+{-# INLINEABLE maximumBy #-}
 
 -- | The least element of a non-empty structure with respect to the
 -- given comparison function.
@@ -1128,10 +1137,15 @@ maximumBy cmp = foldl1 max'
 
 -- See Note [maximumBy/minimumBy space usage]
 minimumBy :: Foldable t => (a -> a -> Ordering) -> t a -> a
-minimumBy cmp = foldl1 min'
-  where min' x y = case cmp x y of
-                        GT -> y
-                        _  -> x
+minimumBy cmp = fromMaybe (errorWithoutStackTrace "minimumBy: empty structure")
+  . foldl' min' Nothing
+  where
+    min' mx y = Just $! case mx of
+      Nothing -> y
+      Just x -> case cmp x y of
+        GT -> y
+        _ -> x
+{-# INLINEABLE minimumBy #-}
 
 -- | 'notElem' is the negation of 'elem'.
 --
@@ -1268,12 +1282,6 @@ proportional to the size of the data structure. For the common case of lists,
 this could be particularly bad (see #10830).
 
 For the common case of lists, switching the implementations of maximumBy and
-minimumBy to foldl1 solves the issue, as GHC's strictness analysis can then
-make these functions only use O(1) stack space. It is perhaps not the optimal
-way to fix this problem, as there are other conceivable data structures
-(besides lists) which might benefit from specialized implementations for
-maximumBy and minimumBy (see
-https://gitlab.haskell.org/ghc/ghc/issues/10830#note_129843 for a further
-discussion). But using foldl1 is at least always better than using foldr1, so
-GHC has chosen to adopt that approach for now.
+minimumBy to foldl1 solves the issue, assuming GHC's strictness analysis can then
+make these functions only use O(1) stack space. As of base 4.16, we have switched to employing foldl' over foldl1, not relying on GHC's optimiser. See https://gitlab.haskell.org/ghc/ghc/-/issues/17867 for more context.
 -}


=====================================
rts/Linker.c
=====================================
@@ -45,6 +45,8 @@
 #include <sys/types.h>
 #endif
 
+#include <fcntl.h>
+#include <unistd.h>
 #include <inttypes.h>
 #include <stdlib.h>
 #include <string.h>
@@ -63,7 +65,6 @@
 #  include "linker/Elf.h"
 #  include <regex.h>    // regex is already used by dlopen() so this is OK
                         // to use here without requiring an additional lib
-#  include <link.h>
 #elif defined(OBJFORMAT_PEi386)
 #  include "linker/PEi386.h"
 #  include <windows.h>
@@ -170,8 +171,6 @@ Mutex linker_mutex;
 /* Generic wrapper function to try and Resolve and RunInit oc files */
 int ocTryLoad( ObjectCode* oc );
 
-static void freeNativeCode_ELF (ObjectCode *nc);
-
 /* Link objects into the lower 2Gb on x86_64 and AArch64.  GHC assumes the
  * small memory model on this architecture (see gcc docs,
  * -mcmodel=small).
@@ -398,7 +397,7 @@ static void *dl_prog_handle;
 static regex_t re_invalid;
 static regex_t re_realso;
 #if defined(THREADED_RTS)
-static Mutex dl_mutex; // mutex to protect dlopen/dlerror critical section
+Mutex dl_mutex; // mutex to protect dlopen/dlerror critical section
 #endif
 #endif
 
@@ -1024,7 +1023,38 @@ resolveSymbolAddr (pathchar* buffer, int size,
 #endif /* OBJFORMAT_PEi386 */
 }
 
-#if RTS_LINKER_USE_MMAP
+#if defined(mingw32_HOST_OS)
+
+//
+// Returns NULL on failure.
+//
+void *
+mmapAnonForLinker (size_t bytes)
+{
+  return VirtualAlloc(NULL, bytes, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
+}
+
+void
+munmapForLinker (void *addr, size_t bytes, const char *caller)
+{
+  if (VirtualFree(addr, 0, MEM_RELEASE) == 0) {
+    sysErrorBelch("munmapForLinker: %s: Failed to unmap %zd bytes at %p",
+                  caller, bytes, addr);
+  }
+}
+
+void
+mmapForLinkerMarkExecutable(void *start, size_t len)
+{
+  DWORD old;
+  if (VirtualProtect(start, len, PAGE_EXECUTE_READ, &old) == 0) {
+    sysErrorBelch("mmapForLinkerMarkExecutable: failed to protect %zd bytes at %p",
+                  len, start);
+    ASSERT(false);
+  }
+}
+
+#elif RTS_LINKER_USE_MMAP
 //
 // Returns NULL on failure.
 //
@@ -1083,7 +1113,7 @@ mmap_again:
                fixed = MAP_FIXED;
                goto mmap_again;
 #else
-               errorBelch("loadObj: failed to mmap() memory below 2Gb; "
+               errorBelch("mmapForLinker: failed to mmap() memory below 2Gb; "
                           "asked for %lu bytes at %p. "
                           "Try specifying an address with +RTS -xm<addr> -RTS",
                           size, map_addr);
@@ -1143,6 +1173,24 @@ mmap_again:
    return result;
 }
 
+/*
+ * Map read/write pages in low memory. Returns NULL on failure.
+ */
+void *
+mmapAnonForLinker (size_t bytes)
+{
+  return mmapForLinker (bytes, PROT_READ|PROT_WRITE, MAP_ANONYMOUS, -1, 0);
+}
+
+void munmapForLinker (void *addr, size_t bytes, const char *caller)
+{
+  int r = munmap(addr, bytes);
+  if (r == -1) {
+    // Should we abort here?
+    sysErrorBelch("munmap: %s", caller);
+  }
+}
+
 /* Note [Memory protection in the linker]
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  * For many years the linker would simply map all of its memory
@@ -1158,8 +1206,9 @@ mmap_again:
  * Note that the m32 allocator handles protection of its allocations. For this
  * reason the caller to m32_alloc() must tell the allocator whether the
  * allocation needs to be executable. The caller must then ensure that they
- * call m32_flush() after they are finished filling the region, which will
- * cause the allocator to change the protection bits to PROT_READ|PROT_EXEC.
+ * call m32_allocator_flush() after they are finished filling the region, which
+ * will cause the allocator to change the protection bits to
+ * PROT_READ|PROT_EXEC.
  *
  */
 
@@ -1228,7 +1277,7 @@ freePreloadObjectFile (ObjectCode *oc)
 #else
 
     if (RTS_LINKER_USE_MMAP && oc->imageMapped) {
-        munmap(oc->image, oc->fileSize);
+        munmapForLinker(oc->image, oc->fileSize, "freePreloadObjectFile");
     }
     else {
         stgFree(oc->image);
@@ -1276,13 +1325,15 @@ void freeObjectCode (ObjectCode *oc)
                 switch(oc->sections[i].alloc){
 #if RTS_LINKER_USE_MMAP
                 case SECTION_MMAP:
-                    munmap(oc->sections[i].mapped_start,
-                           oc->sections[i].mapped_size);
+                    munmapForLinker(
+                        oc->sections[i].mapped_start,
+                        oc->sections[i].mapped_size,
+                        "freeObjectCode");
                     break;
+#endif
                 case SECTION_M32:
                     // Freed by m32_allocator_free
                     break;
-#endif
                 case SECTION_MALLOC:
                     IF_DEBUG(zero_on_gc,
                         memset(oc->sections[i].start,
@@ -1325,7 +1376,7 @@ void freeObjectCode (ObjectCode *oc)
     ocDeinit_ELF(oc);
 #endif
 
-#if RTS_LINKER_USE_MMAP == 1
+#if defined(NEED_M32)
     m32_allocator_free(oc->rx_m32);
     m32_allocator_free(oc->rw_m32);
 #endif
@@ -1403,7 +1454,7 @@ mkOc( ObjectType type, pathchar *path, char *image, int imageSize,
    oc->mark              = object_code_mark_bit;
    oc->dependencies      = allocHashSet();
 
-#if RTS_LINKER_USE_MMAP
+#if defined(NEED_M32)
    oc->rw_m32 = m32_allocator_new(false);
    oc->rx_m32 = m32_allocator_new(true);
 #endif
@@ -1740,7 +1791,7 @@ int ocTryLoad (ObjectCode* oc) {
 
     // We have finished loading and relocating; flush the m32 allocators to
     // setup page protections.
-#if RTS_LINKER_USE_MMAP
+#if defined(NEED_M32)
     m32_allocator_flush(oc->rx_m32);
     m32_allocator_flush(oc->rw_m32);
 #endif
@@ -1869,7 +1920,7 @@ HsInt purgeObj (pathchar *path)
     return r;
 }
 
-static OStatus getObjectLoadStatus_ (pathchar *path)
+OStatus getObjectLoadStatus_ (pathchar *path)
 {
     for (ObjectCode *o = objects; o; o = o->next) {
        if (0 == pathcmp(o->fileName, path)) {
@@ -1959,126 +2010,6 @@ addSection (Section *s, SectionKind kind, SectionAlloc alloc,
                        size, kind ));
 }
 
-
-#  if defined(OBJFORMAT_ELF)
-static int loadNativeObjCb_(struct dl_phdr_info *info,
-    size_t _size GNUC3_ATTRIBUTE(__unused__), void *data) {
-  ObjectCode* nc = (ObjectCode*) data;
-
-  // This logic mimicks _dl_addr_inside_object from glibc
-  // For reference:
-  // int
-  // internal_function
-  // _dl_addr_inside_object (struct link_map *l, const ElfW(Addr) addr)
-  // {
-  //   int n = l->l_phnum;
-  //   const ElfW(Addr) reladdr = addr - l->l_addr;
-  //
-  //   while (--n >= 0)
-  //     if (l->l_phdr[n].p_type == PT_LOAD
-  //         && reladdr - l->l_phdr[n].p_vaddr >= 0
-  //         && reladdr - l->l_phdr[n].p_vaddr < l->l_phdr[n].p_memsz)
-  //       return 1;
-  //   return 0;
-  // }
-
-  if ((void*) info->dlpi_addr == nc->l_addr) {
-    int n = info->dlpi_phnum;
-    while (--n >= 0) {
-      if (info->dlpi_phdr[n].p_type == PT_LOAD) {
-        NativeCodeRange* ncr =
-          stgMallocBytes(sizeof(NativeCodeRange), "loadNativeObjCb_");
-        ncr->start = (void*) ((char*) nc->l_addr + info->dlpi_phdr[n].p_vaddr);
-        ncr->end = (void*) ((char*) ncr->start + info->dlpi_phdr[n].p_memsz);
-
-        ncr->next = nc->nc_ranges;
-        nc->nc_ranges = ncr;
-      }
-    }
-  }
-  return 0;
-}
-
-static void copyErrmsg(char** errmsg_dest, char* errmsg) {
-  if (errmsg == NULL) errmsg = "loadNativeObj_ELF: unknown error";
-  *errmsg_dest = stgMallocBytes(strlen(errmsg)+1, "loadNativeObj_ELF");
-  strcpy(*errmsg_dest, errmsg);
-}
-
-// need dl_mutex
-static void freeNativeCode_ELF (ObjectCode *nc) {
-  dlclose(nc->dlopen_handle);
-
-  NativeCodeRange *ncr = nc->nc_ranges;
-  while (ncr) {
-    NativeCodeRange* last_ncr = ncr;
-    ncr = ncr->next;
-    stgFree(last_ncr);
-  }
-}
-
-static void * loadNativeObj_ELF (pathchar *path, char **errmsg)
-{
-   ObjectCode* nc;
-   void *hdl, *retval;
-
-   IF_DEBUG(linker, debugBelch("loadNativeObj_ELF %" PATH_FMT "\n", path));
-
-   retval = NULL;
-   ACQUIRE_LOCK(&dl_mutex);
-
-   nc = mkOc(DYNAMIC_OBJECT, path, NULL, 0, true, NULL, 0);
-
-   foreignExportsLoadingObject(nc);
-   hdl = dlopen(path, RTLD_NOW|RTLD_LOCAL);
-   foreignExportsFinishedLoadingObject();
-   if (hdl == NULL) {
-     /* dlopen failed; save the message in errmsg */
-     copyErrmsg(errmsg, dlerror());
-     goto dlopen_fail;
-   }
-
-   struct link_map *map;
-   if (dlinfo(hdl, RTLD_DI_LINKMAP, &map) == -1) {
-     /* dlinfo failed; save the message in errmsg */
-     copyErrmsg(errmsg, dlerror());
-     goto dlinfo_fail;
-   }
-
-   nc->l_addr = (void*) map->l_addr;
-   nc->dlopen_handle = hdl;
-   hdl = NULL; // pass handle ownership to nc
-
-   dl_iterate_phdr(loadNativeObjCb_, nc);
-   if (!nc->nc_ranges) {
-     copyErrmsg(errmsg, "dl_iterate_phdr failed to find obj");
-     goto dl_iterate_phdr_fail;
-   }
-
-   insertOCSectionIndices(nc);
-
-   nc->next_loaded_object = loaded_objects;
-   loaded_objects = nc;
-
-   retval = nc->dlopen_handle;
-   goto success;
-
-dl_iterate_phdr_fail:
-   // already have dl_mutex
-   freeNativeCode_ELF(nc);
-dlinfo_fail:
-   if (hdl) dlclose(hdl);
-dlopen_fail:
-success:
-
-   RELEASE_LOCK(&dl_mutex);
-   IF_DEBUG(linker, debugBelch("loadNativeObj_ELF result=%p\n", retval));
-
-   return retval;
-}
-
-#  endif
-
 #define UNUSED(x) (void)(x)
 
 void * loadNativeObj (pathchar *path, char **errmsg)
@@ -2166,7 +2097,7 @@ void freeSegments (ObjectCode *oc)
                 continue;
             } else {
 #if RTS_LINKER_USE_MMAP
-                CHECKM(0 == munmap(s->start, s->size), "freeSegments: failed to unmap memory");
+                munmapForLinker(s->start, s->size, "freeSegments");
 #else
                 stgFree(s->start);
 #endif


=====================================
rts/LinkerInternals.h
=====================================
@@ -20,8 +20,34 @@ void printLoadedObjects(void);
 
 #include "BeginPrivate.h"
 
+/* Which object file format are we targeting? */
+#if defined(linux_HOST_OS) || defined(solaris2_HOST_OS) \
+|| defined(linux_android_HOST_OS) \
+|| defined(freebsd_HOST_OS) || defined(kfreebsdgnu_HOST_OS) \
+|| defined(dragonfly_HOST_OS) || defined(netbsd_HOST_OS) \
+|| defined(openbsd_HOST_OS) || defined(gnu_HOST_OS)
+#  define OBJFORMAT_ELF
+#elif defined(mingw32_HOST_OS)
+#  define OBJFORMAT_PEi386
+#elif defined(darwin_HOST_OS) || defined(ios_HOST_OS)
+#  define OBJFORMAT_MACHO
+#endif
+
 typedef void SymbolAddr;
 typedef char SymbolName;
+typedef struct _ObjectCode ObjectCode;
+typedef struct _Section    Section;
+
+#if defined(OBJFORMAT_ELF)
+#  include "linker/ElfTypes.h"
+#elif defined(OBJFORMAT_PEi386)
+#  include "linker/PEi386Types.h"
+#elif defined(OBJFORMAT_MACHO)
+#  include "linker/MachOTypes.h"
+#else
+#  error "Unknown OBJECT_FORMAT for HOST_OS"
+#endif
+
 
 /* Hold extended information about a symbol in case we need to resolve it at a
    late stage.  */
@@ -102,26 +128,24 @@ typedef enum {
  * and always refer to it with the 'struct' qualifier.
  */
 
-typedef
-   struct _Section {
-      void*    start;              /* actual start of section in memory */
-      StgWord  size;               /* actual size of section in memory */
-      SectionKind kind;
-      SectionAlloc alloc;
-
-      /*
-       * The following fields are relevant for SECTION_MMAP sections only
-       */
-      StgWord mapped_offset;      /* offset from the image of mapped_start */
-      void* mapped_start;         /* start of mmap() block */
-      StgWord mapped_size;        /* size of mmap() block */
-
-      /* A customizable type to augment the Section type.
-       * See Note [No typedefs for customizable types]
-       */
-      struct SectionFormatInfo* info;
-   }
-   Section;
+struct _Section {
+  void*    start;              /* actual start of section in memory */
+  StgWord  size;               /* actual size of section in memory */
+  SectionKind kind;
+  SectionAlloc alloc;
+
+  /*
+   * The following fields are relevant for SECTION_MMAP sections only
+   */
+  StgWord mapped_offset;      /* offset from the image of mapped_start */
+  void* mapped_start;         /* start of mmap() block */
+  StgWord mapped_size;        /* size of mmap() block */
+
+  /* A customizable type to augment the Section type.
+   * See Note [No typedefs for customizable types]
+   */
+  struct SectionFormatInfo* info;
+};
 
 typedef
    struct _ProddableBlock {
@@ -145,6 +169,14 @@ typedef struct _Segment {
 #define NEED_SYMBOL_EXTRAS 1
 #endif
 
+/*
+ * We use the m32 allocator for symbol extras on Windows and other mmap-using
+ * platforms.
+ */
+#if RTS_LINKER_USE_MMAP || defined(mingw32_HOST_ARCH)
+#define NEED_M32 1
+#endif
+
 /* Jump Islands are sniplets of machine code required for relative
  * address relocations on the PowerPC, x86_64 and ARM.
  */
@@ -175,7 +207,7 @@ typedef enum {
 /* Top-level structure for an object module.  One of these is allocated
  * for each object file in use.
  */
-typedef struct _ObjectCode {
+struct _ObjectCode {
     OStatus    status;
     pathchar  *fileName;
     int        fileSize;     /* also mapped image size when using mmap() */
@@ -276,7 +308,7 @@ typedef struct _ObjectCode {
        require extra information.*/
     StrHashTable *extraInfos;
 
-#if RTS_LINKER_USE_MMAP == 1
+#if defined(NEED_M32)
     /* The m32 allocators used for allocating small sections and symbol extras
      * during loading. We have two: one for (writeable) data and one for
      * (read-only/executable) code. */
@@ -295,7 +327,7 @@ typedef struct _ObjectCode {
 
     /* virtual memory ranges of loaded code */
     NativeCodeRange *nc_ranges;
-} ObjectCode;
+};
 
 #define OC_INFORMATIVE_FILENAME(OC)             \
     ( (OC)->archiveMemberName ?                 \
@@ -306,6 +338,10 @@ typedef struct _ObjectCode {
 
 #if defined(THREADED_RTS)
 extern Mutex linker_mutex;
+
+#if defined(OBJFORMAT_ELF) || defined(OBJFORMAT_MACHO)
+extern Mutex dl_mutex;
+#endif
 #endif
 
 /* Type of the initializer */
@@ -334,8 +370,10 @@ void exitLinker( void );
 void freeObjectCode (ObjectCode *oc);
 SymbolAddr* loadSymbol(SymbolName *lbl, RtsSymbolInfo *pinfo);
 
+void *mmapAnonForLinker (size_t bytes);
 void *mmapForLinker (size_t bytes, uint32_t prot, uint32_t flags, int fd, int offset);
 void mmapForLinkerMarkExecutable (void *start, size_t len);
+void munmapForLinker (void *addr, size_t bytes, const char *caller);
 
 void addProddableBlock ( ObjectCode* oc, void* start, int size );
 void checkProddableBlock (ObjectCode *oc, void *addr, size_t size );
@@ -388,6 +426,7 @@ resolveSymbolAddr (pathchar* buffer, int size,
 #endif
 
 HsInt isAlreadyLoaded( pathchar *path );
+OStatus getObjectLoadStatus_ (pathchar *path);
 HsInt loadOc( ObjectCode* oc );
 ObjectCode* mkOc( ObjectType type, pathchar *path, char *image, int imageSize,
                   bool mapped, pathchar *archiveMemberName,
@@ -403,24 +442,6 @@ void freeSegments(ObjectCode *oc);
 #define MAP_ANONYMOUS MAP_ANON
 #endif
 
-/* Which object file format are we targeting? */
-#if defined(linux_HOST_OS) || defined(solaris2_HOST_OS) \
-|| defined(linux_android_HOST_OS) \
-|| defined(freebsd_HOST_OS) || defined(kfreebsdgnu_HOST_OS) \
-|| defined(dragonfly_HOST_OS) || defined(netbsd_HOST_OS) \
-|| defined(openbsd_HOST_OS) || defined(gnu_HOST_OS)
-#  define OBJFORMAT_ELF
-#  include "linker/ElfTypes.h"
-#elif defined(mingw32_HOST_OS)
-#  define OBJFORMAT_PEi386
-#  include "linker/PEi386Types.h"
-#elif defined(darwin_HOST_OS) || defined(ios_HOST_OS)
-#  define OBJFORMAT_MACHO
-#  include "linker/MachOTypes.h"
-#else
-#error "Unknown OBJECT_FORMAT for HOST_OS"
-#endif
-
 /* In order to simplify control flow a bit, some references to mmap-related
    definitions are blocked off by a C-level if statement rather than a CPP-level
    #if statement. Since those are dead branches when !RTS_LINKER_USE_MMAP, we


=====================================
rts/PrimOps.cmm
=====================================
@@ -1827,9 +1827,16 @@ loop:
     // There are readMVar/takeMVar(s) waiting: wake up the first one
 
     tso = StgMVarTSOQueue_tso(q);
-    StgMVar_head(mvar) = StgMVarTSOQueue_link(q);
-    if (StgMVar_head(mvar) == stg_END_TSO_QUEUE_closure) {
+    q = StgMVarTSOQueue_link(q);
+    StgMVar_head(mvar) = q;
+    if (q == stg_END_TSO_QUEUE_closure) {
         StgMVar_tail(mvar) = stg_END_TSO_QUEUE_closure;
+    } else {
+        if (info == stg_MVAR_CLEAN_info) {
+            // Resolve #18919.
+            ccall dirty_MVAR(BaseReg "ptr", mvar "ptr",
+                             StgMVar_value(mvar) "ptr");
+        }
     }
 
     ASSERT(StgTSO_block_info(tso) == mvar);
@@ -1854,10 +1861,8 @@ loop:
 
     // If it was a readMVar, then we can still do work,
     // so loop back. (XXX: This could take a while)
-    if (why_blocked == BlockedOnMVarRead) {
-        q = StgMVarTSOQueue_link(q);
+    if (why_blocked == BlockedOnMVarRead)
         goto loop;
-    }
 
     ASSERT(why_blocked == BlockedOnMVar);
 
@@ -1912,9 +1917,16 @@ loop:
     // There are takeMVar(s) waiting: wake up the first one
 
     tso = StgMVarTSOQueue_tso(q);
-    StgMVar_head(mvar) = StgMVarTSOQueue_link(q);
-    if (StgMVar_head(mvar) == stg_END_TSO_QUEUE_closure) {
+    q = StgMVarTSOQueue_link(q);
+    StgMVar_head(mvar) = q;
+    if (q == stg_END_TSO_QUEUE_closure) {
         StgMVar_tail(mvar) = stg_END_TSO_QUEUE_closure;
+    } else {
+        if (info == stg_MVAR_CLEAN_info) {
+            // Resolve #18919.
+            ccall dirty_MVAR(BaseReg "ptr", mvar "ptr",
+                             StgMVar_value(mvar) "ptr");
+        }
     }
 
     ASSERT(StgTSO_block_info(tso) == mvar);
@@ -1939,10 +1951,8 @@ loop:
 
     // If it was a readMVar, then we can still do work,
     // so loop back. (XXX: This could take a while)
-    if (why_blocked == BlockedOnMVarRead) {
-        q = StgMVarTSOQueue_link(q);
+    if (why_blocked == BlockedOnMVarRead)
         goto loop;
-    }
 
     ASSERT(why_blocked == BlockedOnMVar);
 


=====================================
rts/Profiling.c
=====================================
@@ -54,7 +54,7 @@ FILE *prof_file;
 // List of all cost centres. Used for reporting.
 CostCentre      *CC_LIST  = NULL;
 // All cost centre stacks temporarily appear here, to be able to make CCS_MAIN a
-// parent of all cost centres stacks (done in initProfiling2()).
+// parent of all cost centres stacks (done in refreshProfilingCCSs()).
 static CostCentreStack *CCS_LIST = NULL;
 
 #if defined(THREADED_RTS)


=====================================
rts/Threads.c
=====================================
@@ -803,9 +803,14 @@ loop:
 
     // There are takeMVar(s) waiting: wake up the first one
     tso = q->tso;
-    mvar->head = q->link;
-    if (mvar->head == (StgMVarTSOQueue*)&stg_END_TSO_QUEUE_closure) {
+    mvar->head = q = q->link;
+    if (q == (StgMVarTSOQueue*)&stg_END_TSO_QUEUE_closure) {
         mvar->tail = (StgMVarTSOQueue*)&stg_END_TSO_QUEUE_closure;
+    } else {
+        if (info == &stg_MVAR_CLEAN_info) {
+            // Resolve #18919.
+            dirty_MVAR(&cap->r, (StgClosure*)mvar, mvar->value);
+        }
     }
 
     ASSERT(tso->block_info.closure == (StgClosure*)mvar);
@@ -829,10 +834,8 @@ loop:
 
     // If it was a readMVar, then we can still do work,
     // so loop back. (XXX: This could take a while)
-    if (why_blocked == BlockedOnMVarRead) {
-        q = ((StgMVarTSOQueue*)q)->link;
+    if (why_blocked == BlockedOnMVarRead)
         goto loop;
-    }
 
     ASSERT(why_blocked == BlockedOnMVar);
 


=====================================
rts/linker/Elf.c
=====================================
@@ -15,16 +15,22 @@
 
 #include "RtsUtils.h"
 #include "RtsSymbolInfo.h"
+#include "CheckUnload.h"
+#include "LinkerInternals.h"
 #include "linker/Elf.h"
 #include "linker/CacheFlush.h"
 #include "linker/M32Alloc.h"
 #include "linker/SymbolExtras.h"
+#include "ForeignExports.h"
+#include "Profiling.h"
 #include "sm/OSMem.h"
 #include "GetEnv.h"
 #include "linker/util.h"
 #include "linker/elf_util.h"
 
+#include <link.h>
 #include <stdlib.h>
+#include <unistd.h>
 #include <string.h>
 #if defined(HAVE_SYS_STAT_H)
 #include <sys/stat.h>
@@ -709,7 +715,11 @@ ocGetNames_ELF ( ObjectCode* oc )
                * address might be out of range for sections that are mmaped.
                */
               alloc = SECTION_MMAP;
-              start = mmapForLinker(size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS, -1, 0);
+              start = mmapAnonForLinker(size);
+              if (start == NULL) {
+                barf("failed to mmap memory for bss. "
+                     "errno = %d", errno);
+              }
               mapped_start = start;
               mapped_offset = 0;
               mapped_size = roundUpToPage(size);
@@ -751,9 +761,9 @@ ocGetNames_ELF ( ObjectCode* oc )
           unsigned nstubs = numberOfStubsForSection(oc, i);
           unsigned stub_space = STUB_SIZE * nstubs;
 
-          void * mem = mmapForLinker(size+stub_space, PROT_READ | PROT_WRITE, MAP_ANON, -1, 0);
+          void * mem = mmapAnonForLinker(size+stub_space);
 
-          if( mem == MAP_FAILED ) {
+          if( mem == NULL ) {
               barf("failed to mmap allocated memory to load section %d. "
                    "errno = %d", i, errno);
           }
@@ -860,11 +870,10 @@ ocGetNames_ELF ( ObjectCode* oc )
       }
       void * common_mem = NULL;
       if(common_size > 0) {
-          common_mem = mmapForLinker(common_size,
-                            PROT_READ | PROT_WRITE,
-                            MAP_ANON | MAP_PRIVATE,
-                            -1, 0);
-          CHECK(common_mem != NULL);
+          common_mem = mmapAnonForLinker(common_size);
+          if (common_mem == NULL) {
+            barf("ocGetNames_ELF: Failed to allocate memory for SHN_COMMONs");
+          }
       }
 
       //TODO: we ignore local symbols anyway right? So we can use the
@@ -1969,6 +1978,143 @@ int ocRunInit_ELF( ObjectCode *oc )
    return 1;
 }
 
+/*
+ * Shared object loading
+ */
+
+static int loadNativeObjCb_(struct dl_phdr_info *info,
+    size_t _size GNUC3_ATTRIBUTE(__unused__), void *data) {
+  ObjectCode* nc = (ObjectCode*) data;
+
+  // This logic mimicks _dl_addr_inside_object from glibc
+  // For reference:
+  // int
+  // internal_function
+  // _dl_addr_inside_object (struct link_map *l, const ElfW(Addr) addr)
+  // {
+  //   int n = l->l_phnum;
+  //   const ElfW(Addr) reladdr = addr - l->l_addr;
+  //
+  //   while (--n >= 0)
+  //     if (l->l_phdr[n].p_type == PT_LOAD
+  //         && reladdr - l->l_phdr[n].p_vaddr >= 0
+  //         && reladdr - l->l_phdr[n].p_vaddr < l->l_phdr[n].p_memsz)
+  //       return 1;
+  //   return 0;
+  // }
+
+  if ((void*) info->dlpi_addr == nc->l_addr) {
+    int n = info->dlpi_phnum;
+    while (--n >= 0) {
+      if (info->dlpi_phdr[n].p_type == PT_LOAD) {
+        NativeCodeRange* ncr =
+          stgMallocBytes(sizeof(NativeCodeRange), "loadNativeObjCb_");
+        ncr->start = (void*) ((char*) nc->l_addr + info->dlpi_phdr[n].p_vaddr);
+        ncr->end = (void*) ((char*) ncr->start + info->dlpi_phdr[n].p_memsz);
+
+        ncr->next = nc->nc_ranges;
+        nc->nc_ranges = ncr;
+      }
+    }
+  }
+  return 0;
+}
+
+static void copyErrmsg(char** errmsg_dest, char* errmsg) {
+  if (errmsg == NULL) errmsg = "loadNativeObj_ELF: unknown error";
+  *errmsg_dest = stgMallocBytes(strlen(errmsg)+1, "loadNativeObj_ELF");
+  strcpy(*errmsg_dest, errmsg);
+}
+
+// need dl_mutex
+void freeNativeCode_ELF (ObjectCode *nc) {
+  dlclose(nc->dlopen_handle);
+
+  NativeCodeRange *ncr = nc->nc_ranges;
+  while (ncr) {
+    NativeCodeRange* last_ncr = ncr;
+    ncr = ncr->next;
+    stgFree(last_ncr);
+  }
+}
+
+void * loadNativeObj_ELF (pathchar *path, char **errmsg)
+{
+   ObjectCode* nc;
+   void *hdl, *retval;
+
+   IF_DEBUG(linker, debugBelch("loadNativeObj_ELF %" PATH_FMT "\n", path));
+
+   retval = NULL;
+   ACQUIRE_LOCK(&dl_mutex);
+
+   /* Loading the same object multiple times will lead to chaos
+    * as we will have two ObjectCodes but one underlying dlopen
+    * handle. Fail if this happens.
+    */
+   if (getObjectLoadStatus_(path) != OBJECT_NOT_LOADED) {
+     copyErrmsg(errmsg, "loadNativeObj_ELF: Already loaded");
+     goto dlopen_fail;
+   }
+
+   nc = mkOc(DYNAMIC_OBJECT, path, NULL, 0, false, NULL, 0);
+
+   foreignExportsLoadingObject(nc);
+   hdl = dlopen(path, RTLD_NOW|RTLD_LOCAL);
+   foreignExportsFinishedLoadingObject();
+   if (hdl == NULL) {
+     /* dlopen failed; save the message in errmsg */
+     copyErrmsg(errmsg, dlerror());
+     goto dlopen_fail;
+   }
+
+   struct link_map *map;
+   if (dlinfo(hdl, RTLD_DI_LINKMAP, &map) == -1) {
+     /* dlinfo failed; save the message in errmsg */
+     copyErrmsg(errmsg, dlerror());
+     goto dlinfo_fail;
+   }
+
+   nc->l_addr = (void*) map->l_addr;
+   nc->dlopen_handle = hdl;
+   hdl = NULL; // pass handle ownership to nc
+
+   dl_iterate_phdr(loadNativeObjCb_, nc);
+   if (!nc->nc_ranges) {
+     copyErrmsg(errmsg, "dl_iterate_phdr failed to find obj");
+     goto dl_iterate_phdr_fail;
+   }
+
+   insertOCSectionIndices(nc);
+
+   nc->next_loaded_object = loaded_objects;
+   loaded_objects = nc;
+
+   retval = nc->dlopen_handle;
+
+#if defined(PROFILING)
+  // collect any new cost centres that were defined in the loaded object.
+  refreshProfilingCCSs();
+#endif
+
+   goto success;
+
+dl_iterate_phdr_fail:
+   // already have dl_mutex
+   freeNativeCode_ELF(nc);
+dlinfo_fail:
+   if (hdl) dlclose(hdl);
+dlopen_fail:
+success:
+
+   RELEASE_LOCK(&dl_mutex);
+
+   IF_DEBUG(linker, debugBelch("loadNativeObj_ELF result=%p\n", retval));
+
+   return retval;
+}
+
+
 /*
  * PowerPC & X86_64 ELF specifics
  */


=====================================
rts/linker/Elf.h
=====================================
@@ -14,5 +14,7 @@ int ocGetNames_ELF       ( ObjectCode* oc );
 int ocResolve_ELF        ( ObjectCode* oc );
 int ocRunInit_ELF        ( ObjectCode* oc );
 int ocAllocateExtras_ELF ( ObjectCode *oc );
+void freeNativeCode_ELF  ( ObjectCode *nc );
+void *loadNativeObj_ELF  ( pathchar *path, char **errmsg );
 
 #include "EndPrivate.h"


=====================================
rts/linker/LoadArchive.c
=====================================
@@ -489,7 +489,7 @@ static HsInt loadArchive_ (pathchar *path)
 
 #if defined(darwin_HOST_OS) || defined(ios_HOST_OS)
             if (RTS_LINKER_USE_MMAP)
-                image = mmapForLinker(memberSize, PROT_READ | PROT_WRITE, MAP_ANONYMOUS, -1, 0);
+                image = mmapAnonForLinker(memberSize);
             else {
                 /* See loadObj() */
                 misalignment = machoGetMisalignment(f);
@@ -549,7 +549,7 @@ while reading filename from `%" PATH_FMT "'", path);
             }
             DEBUG_LOG("Found GNU-variant file index\n");
 #if RTS_LINKER_USE_MMAP
-            gnuFileIndex = mmapForLinker(memberSize + 1, PROT_READ | PROT_WRITE, MAP_ANONYMOUS, -1, 0);
+            gnuFileIndex = mmapAnonForLinker(memberSize + 1);
 #else
             gnuFileIndex = stgMallocBytes(memberSize + 1, "loadArchive(image)");
 #endif
@@ -613,7 +613,7 @@ fail:
         stgFree(fileName);
     if (gnuFileIndex != NULL) {
 #if RTS_LINKER_USE_MMAP
-        munmap(gnuFileIndex, gnuFileIndexSize + 1);
+        munmapForLinker(gnuFileIndex, gnuFileIndexSize + 1, "loadArchive_");
 #else
         stgFree(gnuFileIndex);
 #endif


=====================================
rts/linker/M32Alloc.c
=====================================
@@ -24,25 +24,25 @@ Note [Compile Time Trickery]
 
 This file implements two versions of each of the `m32_*` functions. At the top
 of the file there is the real implementation (compiled in when
-`RTS_LINKER_USE_MMAP` is true) and a dummy implementation that exists only to
+`NEED_M32` is true) and a dummy implementation that exists only to
 satisfy the compiler and which should never be called. If any of these dummy
 implementations are called the program will abort.
 
 The rationale for this is to allow the calling code to be written without using
-the C pre-processor (CPP) `#if` hackery. The value of `RTS_LINKER_USE_MMAP` is
-known at compile time, code like:
+the C pre-processor (CPP) `#if` hackery. The value of `NEED_M32` is
+known at compile time, allowing code like:
 
-    if (RTS_LINKER_USE_MMAP)
+    if (NEED_M32)
         m32_allocator_init();
 
-will be compiled to call to `m32_allocator_init` if  `RTS_LINKER_USE_MMAP` is
-true and will be optimised away to nothing if `RTS_LINKER_USE_MMAP` is false.
-However, regardless of the value of `RTS_LINKER_USE_MMAP` the compiler will
+will be compiled to call to `m32_allocator_init` if  `NEED_M32` is
+true and will be optimised away to nothing if `NEED_M32` is false.
+However, regardless of the value of `NEED_M32` the compiler will
 still check the call for syntax and correct function parameter types.
 
 */
 
-#if RTS_LINKER_USE_MMAP == 1
+#if defined(NEED_M32)
 
 /*
 
@@ -216,25 +216,6 @@ struct m32_page_t *m32_free_page_pool = NULL;
 unsigned int m32_free_page_pool_size = 0;
 // TODO
 
-/**
- * Wrapper for `unmap` that handles error cases.
- * This is the real implementation. There is another dummy implementation below.
- * See the note titled "Compile Time Trickery" at the top of this file.
- */
-static void
-munmapForLinker (void * addr, size_t size)
-{
-   IF_DEBUG(linker,
-            debugBelch("m32_alloc: Unmapping %zu bytes at %p\n",
-                       size, addr));
-
-   int r = munmap(addr,size);
-   if (r == -1) {
-      // Should we abort here?
-      sysErrorBelch("munmap");
-   }
-}
-
 /**
  * Free a page or, if possible, place it in the free page pool.
  */
@@ -246,7 +227,7 @@ m32_release_page(struct m32_page_t *page)
     m32_free_page_pool = page;
     m32_free_page_pool_size ++;
   } else {
-    munmapForLinker((void *) page, getPageSize());
+    munmapForLinker((void *) page, getPageSize(), "m32_release_page");
   }
 }
 
@@ -263,8 +244,8 @@ m32_alloc_page(void)
      * pages.
      */
     const size_t pgsz = getPageSize();
-    char *chunk = mmapForLinker(pgsz * M32_MAP_PAGES, PROT_READ | PROT_WRITE, MAP_ANONYMOUS, -1, 0);
-    if (chunk > (char *) 0xffffffff) {
+    uint8_t *chunk = mmapAnonForLinker(pgsz * M32_MAP_PAGES);
+    if (chunk > (uint8_t *) 0xffffffff) {
       barf("m32_alloc_page: failed to get allocation in lower 32-bits");
     }
 
@@ -309,7 +290,7 @@ m32_allocator_unmap_list(struct m32_page_t *head)
 {
   while (head != NULL) {
     struct m32_page_t *next = m32_filled_page_get_next(head);
-    munmapForLinker((void *) head, head->filled_page.size);
+    munmapForLinker((void *) head, head->filled_page.size, "m32_allocator_unmap_list");
     head = next;
   }
 }
@@ -327,7 +308,7 @@ void m32_allocator_free(m32_allocator *alloc)
   const size_t pgsz = getPageSize();
   for (int i=0; i < M32_MAX_PAGES; i++) {
     if (alloc->pages[i]) {
-      munmapForLinker(alloc->pages[i], pgsz);
+      munmapForLinker(alloc->pages[i], pgsz, "m32_allocator_free");
     }
   }
 
@@ -407,7 +388,14 @@ m32_alloc(struct m32_allocator_t *alloc, size_t size, size_t alignment)
    if (m32_is_large_object(size,alignment)) {
       // large object
       size_t alsize = ROUND_UP(sizeof(struct m32_page_t), alignment);
-      struct m32_page_t *page = mmapForLinker(alsize+size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS,-1,0);
+      struct m32_page_t *page = mmapAnonForLinker(alsize+size);
+      if (page == NULL) {
+          sysErrorBelch("m32_alloc: Failed to map pages for %zd bytes", size);
+          return NULL;
+      } else if (page > (struct m32_page_t *) 0xffffffff) {
+          debugBelch("m32_alloc: warning: Allocation of %zd bytes resulted in pages above 4GB (%p)",
+                     size, page);
+      }
       page->filled_page.size = alsize + size;
       m32_allocator_push_filled_list(&alloc->unprotected_list, (struct m32_page_t *) page);
       return (char*) page + alsize;
@@ -460,7 +448,7 @@ m32_alloc(struct m32_allocator_t *alloc, size_t size, size_t alignment)
    return (char*)page + ROUND_UP(sizeof(struct m32_page_t),alignment);
 }
 
-#elif RTS_LINKER_USE_MMAP == 0
+#else
 
 // The following implementations of these functions should never be called. If
 // they are, there is a bug at the call site.
@@ -491,8 +479,4 @@ m32_alloc(m32_allocator *alloc STG_UNUSED,
     barf("%s: RTS_LINKER_USE_MMAP is %d", __func__, RTS_LINKER_USE_MMAP);
 }
 
-#else
-
-#error RTS_LINKER_USE_MMAP should be either `0` or `1`.
-
 #endif


=====================================
rts/linker/M32Alloc.h
=====================================
@@ -8,19 +8,17 @@
 
 #pragma once
 
-#if RTS_LINKER_USE_MMAP == 1
-#include <fcntl.h>
-#include <sys/mman.h>
-
-#if defined(HAVE_UNISTD_H)
-#include <unistd.h>
-#endif
-
+/*
+ * We use the m32 allocator for symbol extras on Windows and other mmap-using
+ * platforms.
+ */
+#if RTS_LINKER_USE_MMAP || defined(mingw32_HOST_OS)
+#define NEED_M32 1
 #endif
 
 #include "BeginPrivate.h"
 
-#if RTS_LINKER_USE_MMAP
+#if defined(NEED_M32)
 #define M32_NO_RETURN    /* Nothing */
 #else
 #define M32_NO_RETURN    GNUC3_ATTRIBUTE(__noreturn__)


=====================================
rts/linker/MachO.c
=====================================
@@ -507,11 +507,8 @@ makeGot(ObjectCode * oc) {
 
     if(got_slots > 0) {
         oc->info->got_size =  got_slots * sizeof(void*);
-        oc->info->got_start = mmapForLinker(oc->info->got_size,
-                                   PROT_READ | PROT_WRITE,
-                                   MAP_ANON | MAP_PRIVATE,
-                                   -1, 0);
-        if( oc->info->got_start == MAP_FAILED ) {
+        oc->info->got_start = mmapAnonForLinker(oc->info->got_size);
+        if( oc->info->got_start == NULL ) {
             barf("MAP_FAILED. errno=%d", errno );
             return EXIT_FAILURE;
         }
@@ -528,7 +525,7 @@ makeGot(ObjectCode * oc) {
 
 void
 freeGot(ObjectCode * oc) {
-    munmap(oc->info->got_start, oc->info->got_size);
+    munmapForLinker(oc->info->got_start, oc->info->got_size, "freeGot");
     oc->info->got_start = NULL;
     oc->info->got_size = 0;
 }
@@ -1113,7 +1110,7 @@ ocBuildSegments_MachO(ObjectCode *oc)
         return 1;
     }
 
-    mem = mmapForLinker(size_compound, PROT_READ | PROT_WRITE, MAP_ANON, -1, 0);
+    mem = mmapAnonForLinker(size_compound);
     if (NULL == mem) return 0;
 
     IF_DEBUG(linker, debugBelch("ocBuildSegments: allocating %d segments\n", n_activeSegments));


=====================================
rts/linker/PEi386.c
=====================================
@@ -1788,42 +1788,28 @@ ocGetNames_PEi386 ( ObjectCode* oc )
 bool
 ocAllocateExtras_PEi386 ( ObjectCode* oc )
 {
-   /* If the ObjectCode was unloaded we don't need a trampoline, it's likely
-      an import library so we're discarding it earlier.  */
-   if (!oc->info)
-     return false;
+    /* If the ObjectCode was unloaded we don't need a trampoline, it's likely
+       an import library so we're discarding it earlier.  */
+    if (!oc->info)
+      return false;
 
-   const int mask = default_alignment - 1;
-   size_t origin  = oc->info->trampoline;
-   oc->symbol_extras
-     = (SymbolExtra*)((uintptr_t)(oc->info->image + origin + mask) & ~mask);
-   oc->first_symbol_extra = 0;
-   COFF_HEADER_INFO *info = oc->info->ch_info;
-   oc->n_symbol_extras    = info->numberOfSymbols;
+    // These are allocated on-demand from m32 by makeSymbolExtra_PEi386
+    oc->first_symbol_extra = 0;
+    oc->n_symbol_extras    = 0;
+    oc->symbol_extras      = NULL;
 
-   return true;
+    return true;
 }
 
 static size_t
-makeSymbolExtra_PEi386( ObjectCode* oc, uint64_t index, size_t s, char* symbol )
+makeSymbolExtra_PEi386( ObjectCode* oc, uint64_t index STG_UNUSED, size_t s, char* symbol STG_UNUSED )
 {
-    unsigned int curr_thunk;
-    SymbolExtra *extra;
-    curr_thunk = oc->first_symbol_extra + index;
-    if (index >= oc->n_symbol_extras) {
-      IF_DEBUG(linker, debugBelch("makeSymbolExtra first:%d, num:%lu, member:%" PATH_FMT ", index:%llu\n", curr_thunk, oc->n_symbol_extras, oc->archiveMemberName, index));
-      barf("Can't allocate thunk for `%s' in `%" PATH_FMT "' with member `%" PATH_FMT "'", symbol, oc->fileName, oc->archiveMemberName);
-    }
-
-    extra = oc->symbol_extras + curr_thunk;
+    SymbolExtra *extra = m32_alloc(oc->rx_m32, sizeof(SymbolExtra), 8);
 
-    if (!extra->addr)
-    {
-        // jmp *-14(%rip)
-        static uint8_t jmp[] = { 0xFF, 0x25, 0xF2, 0xFF, 0xFF, 0xFF };
-        extra->addr = (uint64_t)s;
-        memcpy(extra->jumpIsland, jmp, 6);
-    }
+    // jmp *-14(%rip)
+    static uint8_t jmp[] = { 0xFF, 0x25, 0xF2, 0xFF, 0xFF, 0xFF };
+    extra->addr = (uint64_t)s;
+    memcpy(extra->jumpIsland, jmp, 6);
 
     return (size_t)extra->jumpIsland;
 }


=====================================
rts/linker/PEi386Types.h
=====================================
@@ -7,10 +7,6 @@
 #include <stdint.h>
 #include <stdio.h>
 
-/* Some forward declares.  */
-struct Section;
-
-
 struct SectionFormatInfo {
     char* name;
     size_t alignment;


=====================================
rts/linker/SymbolExtras.c
=====================================
@@ -81,11 +81,11 @@ int ocAllocateExtras(ObjectCode* oc, int count, int first, int bssSize)
       // symbol_extras is aligned to a page boundary so it can be mprotect'd.
       bssSize = roundUpToPage(bssSize);
       size_t allocated_size = n + bssSize + extras_size;
-      void *new = mmapForLinker(allocated_size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS, -1, 0);
+      void *new = mmapAnonForLinker(allocated_size);
       if (new) {
           memcpy(new, oc->image, oc->fileSize);
           if (oc->imageMapped) {
-              munmap(oc->image, n);
+              munmapForLinker(oc->image, n, "ocAllocateExtras");
           }
           oc->image = new;
           oc->imageMapped = true;


=====================================
rts/linker/elf_got.c
=====================================
@@ -48,11 +48,8 @@ makeGot(ObjectCode * oc) {
     }
     if(got_slots > 0) {
         oc->info->got_size = got_slots * sizeof(void *);
-         void * mem = mmapForLinker(oc->info->got_size,
-                           PROT_READ | PROT_WRITE,
-                           MAP_ANON | MAP_PRIVATE,
-                           -1, 0);
-        if (mem == MAP_FAILED) {
+        void * mem = mmapAnonForLinker(oc->info->got_size);
+        if (mem == NULL) {
             errorBelch("MAP_FAILED. errno=%d", errno);
             return EXIT_FAILURE;
         }
@@ -147,7 +144,7 @@ verifyGot(ObjectCode * oc) {
 
 void
 freeGot(ObjectCode * oc) {
-//    munmap(oc->info->got_start, oc->info->got_size);
+//    munmapForLinker(oc->info->got_start, oc->info->got_size, "freeGot);
     oc->info->got_start = 0x0;
     oc->info->got_size = 0;
 }



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/61c3b2c8586eb299f95ea053fcf08e0548ff2a55...ac2c9225af79962e11cef9a0bf613365acc6da29

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/61c3b2c8586eb299f95ea053fcf08e0548ff2a55...ac2c9225af79962e11cef9a0bf613365acc6da29
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20201130/7fec6f6f/attachment-0001.html>


More information about the ghc-commits mailing list