[Git][ghc/ghc][wip/backports-9.8-2] 2 commits: linker: Avoid linear search when looking up Haskell symbols via dlsym

Ben Gamari (@bgamari) gitlab at gitlab.haskell.org
Thu Oct 10 17:35:30 UTC 2024



Ben Gamari pushed to branch wip/backports-9.8-2 at Glasgow Haskell Compiler / GHC


Commits:
ef4903f2 by Alexis King at 2024-10-10T13:35:13-04:00
linker: Avoid linear search when looking up Haskell symbols via dlsym

See the primary Note [Looking up symbols in the relevant objects] for a
more in-depth explanation.

When dynamically loading a Haskell symbol (typical when running a splice or
GHCi expression), before this commit we would search for the symbol in
all dynamic libraries that were loaded. However, this could be very
inefficient when too many packages are loaded (which can happen if there are
many package dependencies) because the time to lookup the would be
linear in the number of packages loaded.

This commit drastically improves symbol loading performance by
introducing a mapping from units to the handles of corresponding loaded
dlls. These handles are returned by dlopen when we load a dll, and can
then be used to look up in a specific dynamic library.

Looking up a given Name is now much more precise because we can get
lookup its unit in the mapping and lookup the symbol solely in the
handles of the dynamic libraries loaded for that unit.

In one measurement, the wait time before the expression was executed
went from +-38 seconds down to +-2s.

This commit also includes Note [Symbols may not be found in pkgs_loaded],
explaining the fallback to the old behaviour in case no dll can be found
in the unit mapping for a given Name.

Fixes #23415

Co-authored-by: Rodrigo Mesquita (@alt-romes)
(cherry picked from commit e008a19a7f9e8f22aada0b4e1049744f49d39aad)

- - - - -
0e14a98d by Ben Gamari at 2024-10-10T13:35:13-04:00
hadrian: Update bootstrap plans

- - - - -


14 changed files:

- compiler/GHC/ByteCode/Linker.hs
- compiler/GHC/Driver/Plugins.hs
- compiler/GHC/Linker/Loader.hs
- compiler/GHC/Linker/MacOS.hs
- compiler/GHC/Linker/Types.hs
- compiler/GHC/Runtime/Interpreter.hs
- hadrian/bootstrap/generate_bootstrap_plans
- libraries/ghci/GHCi/Message.hs
- libraries/ghci/GHCi/ObjLink.hs
- libraries/ghci/GHCi/Run.hs
- rts/Linker.c
- rts/RtsSymbols.c
- rts/include/rts/Linker.h
- testsuite/tests/rts/linker/T2615.hs


Changes:

=====================================
compiler/GHC/ByteCode/Linker.hs
=====================================
@@ -25,6 +25,7 @@ import GHCi.ResolvedBCO
 import GHCi.BreakArray
 
 import GHC.Builtin.PrimOps
+import GHC.Builtin.PrimOps.Ids
 import GHC.Builtin.Names
 
 import GHC.Unit.Types
@@ -40,6 +41,8 @@ import GHC.Utils.Outputable
 
 import GHC.Types.Name
 import GHC.Types.Name.Env
+import qualified GHC.Types.Id as Id
+import GHC.Types.Unique.DFM
 
 import Language.Haskell.Syntax.Module.Name
 
@@ -54,32 +57,33 @@ import GHC.Exts
 
 linkBCO
   :: Interp
+  -> PkgsLoaded
   -> LinkerEnv
   -> NameEnv Int
   -> RemoteRef BreakArray
   -> UnlinkedBCO
   -> IO ResolvedBCO
-linkBCO interp le bco_ix breakarray
+linkBCO interp pkgs_loaded le bco_ix breakarray
            (UnlinkedBCO _ arity insns bitmap lits0 ptrs0) = do
   -- fromIntegral Word -> Word64 should be a no op if Word is Word64
   -- otherwise it will result in a cast to longlong on 32bit systems.
-  lits <- mapM (fmap fromIntegral . lookupLiteral interp le) (ssElts lits0)
-  ptrs <- mapM (resolvePtr interp le bco_ix breakarray) (ssElts ptrs0)
+  lits <- mapM (fmap fromIntegral . lookupLiteral interp pkgs_loaded le) (ssElts lits0)
+  ptrs <- mapM (resolvePtr interp pkgs_loaded le bco_ix breakarray) (ssElts ptrs0)
   return (ResolvedBCO isLittleEndian arity insns bitmap
               (listArray (0, fromIntegral (sizeSS lits0)-1) lits)
               (addListToSS emptySS ptrs))
 
-lookupLiteral :: Interp -> LinkerEnv -> BCONPtr -> IO Word
-lookupLiteral interp le ptr = case ptr of
+lookupLiteral :: Interp -> PkgsLoaded -> LinkerEnv -> BCONPtr -> IO Word
+lookupLiteral interp pkgs_loaded le ptr = case ptr of
   BCONPtrWord lit -> return lit
   BCONPtrLbl  sym -> do
     Ptr a# <- lookupStaticPtr interp sym
     return (W# (int2Word# (addr2Int# a#)))
   BCONPtrItbl nm -> do
-    Ptr a# <- lookupIE interp (itbl_env le) nm
+    Ptr a# <- lookupIE interp pkgs_loaded (itbl_env le) nm
     return (W# (int2Word# (addr2Int# a#)))
   BCONPtrAddr nm -> do
-    Ptr a# <- lookupAddr interp (addr_env le) nm
+    Ptr a# <- lookupAddr interp pkgs_loaded (addr_env le) nm
     return (W# (int2Word# (addr2Int# a#)))
   BCONPtrStr _ ->
     -- should be eliminated during assembleBCOs
@@ -93,19 +97,19 @@ lookupStaticPtr interp addr_of_label_string = do
     Nothing  -> linkFail "GHC.ByteCode.Linker: can't find label"
                   (unpackFS addr_of_label_string)
 
-lookupIE :: Interp -> ItblEnv -> Name -> IO (Ptr ())
-lookupIE interp ie con_nm =
+lookupIE :: Interp -> PkgsLoaded -> ItblEnv -> Name -> IO (Ptr ())
+lookupIE interp pkgs_loaded ie con_nm =
   case lookupNameEnv ie con_nm of
     Just (_, ItblPtr a) -> return (fromRemotePtr (castRemotePtr a))
     Nothing -> do -- try looking up in the object files.
        let sym_to_find1 = nameToCLabel con_nm "con_info"
-       m <- lookupSymbol interp sym_to_find1
+       m <- lookupHsSymbol interp pkgs_loaded con_nm "con_info"
        case m of
           Just addr -> return addr
           Nothing
              -> do -- perhaps a nullary constructor?
                    let sym_to_find2 = nameToCLabel con_nm "static_info"
-                   n <- lookupSymbol interp sym_to_find2
+                   n <- lookupHsSymbol interp pkgs_loaded con_nm "static_info"
                    case n of
                       Just addr -> return addr
                       Nothing   -> linkFail "GHC.ByteCode.Linker.lookupIE"
@@ -113,35 +117,36 @@ lookupIE interp ie con_nm =
                                        unpackFS sym_to_find2)
 
 -- see Note [Generating code for top-level string literal bindings] in GHC.StgToByteCode
-lookupAddr :: Interp -> AddrEnv -> Name -> IO (Ptr ())
-lookupAddr interp ae addr_nm = do
+lookupAddr :: Interp -> PkgsLoaded -> AddrEnv -> Name -> IO (Ptr ())
+lookupAddr interp pkgs_loaded ae addr_nm = do
   case lookupNameEnv ae addr_nm of
     Just (_, AddrPtr ptr) -> return (fromRemotePtr ptr)
     Nothing -> do -- try looking up in the object files.
       let sym_to_find = nameToCLabel addr_nm "bytes"
                           -- see Note [Bytes label] in GHC.Cmm.CLabel
-      m <- lookupSymbol interp sym_to_find
+      m <- lookupHsSymbol interp pkgs_loaded addr_nm "bytes"
       case m of
         Just ptr -> return ptr
         Nothing -> linkFail "GHC.ByteCode.Linker.lookupAddr"
                      (unpackFS sym_to_find)
 
-lookupPrimOp :: Interp -> PrimOp -> IO (RemotePtr ())
-lookupPrimOp interp primop = do
+lookupPrimOp :: Interp -> PkgsLoaded -> PrimOp -> IO (RemotePtr ())
+lookupPrimOp interp pkgs_loaded primop = do
   let sym_to_find = primopToCLabel primop "closure"
-  m <- lookupSymbol interp (mkFastString sym_to_find)
+  m <- lookupHsSymbol interp pkgs_loaded (Id.idName $ primOpId primop) "closure"
   case m of
     Just p -> return (toRemotePtr p)
     Nothing -> linkFail "GHC.ByteCode.Linker.lookupCE(primop)" sym_to_find
 
 resolvePtr
   :: Interp
+  -> PkgsLoaded
   -> LinkerEnv
   -> NameEnv Int
   -> RemoteRef BreakArray
   -> BCOPtr
   -> IO ResolvedBCOPtr
-resolvePtr interp le bco_ix breakarray ptr = case ptr of
+resolvePtr interp pkgs_loaded le bco_ix breakarray ptr = case ptr of
   BCOPtrName nm
     | Just ix <- lookupNameEnv bco_ix nm
     -> return (ResolvedBCORef ix) -- ref to another BCO in this group
@@ -153,20 +158,42 @@ resolvePtr interp le bco_ix breakarray ptr = case ptr of
     -> assertPpr (isExternalName nm) (ppr nm) $
        do
           let sym_to_find = nameToCLabel nm "closure"
-          m <- lookupSymbol interp sym_to_find
+          m <- lookupHsSymbol interp pkgs_loaded nm "closure"
           case m of
             Just p -> return (ResolvedBCOStaticPtr (toRemotePtr p))
             Nothing -> linkFail "GHC.ByteCode.Linker.lookupCE" (unpackFS sym_to_find)
 
   BCOPtrPrimOp op
-    -> ResolvedBCOStaticPtr <$> lookupPrimOp interp op
+    -> ResolvedBCOStaticPtr <$> lookupPrimOp interp pkgs_loaded op
 
   BCOPtrBCO bco
-    -> ResolvedBCOPtrBCO <$> linkBCO interp le bco_ix breakarray bco
+    -> ResolvedBCOPtrBCO <$> linkBCO interp pkgs_loaded le bco_ix breakarray bco
 
   BCOPtrBreakArray
     -> return (ResolvedBCOPtrBreakArray breakarray)
 
+-- | Look up the address of a Haskell symbol in the currently
+-- loaded units.
+--
+-- See Note [Looking up symbols in the relevant objects].
+lookupHsSymbol :: Interp -> PkgsLoaded -> Name -> String -> IO (Maybe (Ptr ()))
+lookupHsSymbol interp pkgs_loaded nm sym_suffix = do
+  massertPpr (isExternalName nm) (ppr nm)
+  let sym_to_find = nameToCLabel nm sym_suffix
+      pkg_id = moduleUnitId $ nameModule nm
+      loaded_dlls = maybe [] loaded_pkg_hs_dlls $ lookupUDFM pkgs_loaded pkg_id
+
+      go (dll:dlls) = do
+        mb_ptr <- lookupSymbolInDLL interp dll sym_to_find
+        case mb_ptr of
+          Just ptr -> pure (Just ptr)
+          Nothing -> go dlls
+      go [] =
+        -- See Note [Symbols may not be found in pkgs_loaded] in GHC.Linker.Types
+        lookupSymbol interp sym_to_find
+
+  go loaded_dlls
+
 linkFail :: String -> String -> IO a
 linkFail who what
    = throwGhcExceptionIO (ProgramError $


=====================================
compiler/GHC/Driver/Plugins.hs
=====================================
@@ -405,12 +405,12 @@ loadExternalPluginLib :: FilePath -> IO ()
 loadExternalPluginLib path = do
   -- load library
   loadDLL path >>= \case
-    Just errmsg -> pprPanic "loadExternalPluginLib"
-                    (vcat [ text "Can't load plugin library"
-                          , text "  Library path: " <> text path
-                          , text "  Error       : " <> text errmsg
-                          ])
-    Nothing -> do
+    Left errmsg -> pprPanic "loadExternalPluginLib"
+                     (vcat [ text "Can't load plugin library"
+                           , text "  Library path: " <> text path
+                           , text "  Error       : " <> text errmsg
+                           ])
+    Right _ -> do
       -- resolve objects
       resolveObjs >>= \case
         True -> return ()


=====================================
compiler/GHC/Linker/Loader.hs
=====================================
@@ -56,6 +56,7 @@ import GHC.Tc.Utils.Monad
 import GHC.Runtime.Interpreter
 import GHCi.RemoteTypes
 import GHC.Iface.Load
+import GHCi.Message (LoadedDLL)
 
 import GHC.ByteCode.Linker
 import GHC.ByteCode.Asm
@@ -145,7 +146,7 @@ emptyLoaderState = LoaderState
   --
   -- The linker's symbol table is populated with RTS symbols using an
   -- explicit list.  See rts/Linker.c for details.
-  where init_pkgs = unitUDFM rtsUnitId (LoadedPkgInfo rtsUnitId [] [] emptyUniqDSet)
+  where init_pkgs = unitUDFM rtsUnitId (LoadedPkgInfo rtsUnitId [] [] [] emptyUniqDSet)
 
 extendLoadedEnv :: Interp -> [(Name,ForeignHValue)] -> IO ()
 extendLoadedEnv interp new_bindings =
@@ -194,8 +195,8 @@ loadDependencies
   -> SrcSpan
   -> [Module]
   -> IO (LoaderState, SuccessFlag, [Linkable], PkgsLoaded) -- ^ returns the set of linkables required
+-- When called, the loader state must have been initialized (see `initLoaderState`)
 loadDependencies interp hsc_env pls span needed_mods = do
---   initLoaderState (hsc_dflags hsc_env) dl
    let opts = initLinkDepsOpts hsc_env
 
    -- Find what packages and linkables are required
@@ -485,25 +486,25 @@ preloadLib interp hsc_env lib_paths framework_paths pls lib_spec = do
     DLL dll_unadorned -> do
       maybe_errstr <- loadDLL interp (platformSOName platform dll_unadorned)
       case maybe_errstr of
-         Nothing -> maybePutStrLn logger "done"
-         Just mm | platformOS platform /= OSDarwin ->
+         Right _ -> maybePutStrLn logger "done"
+         Left mm | platformOS platform /= OSDarwin ->
            preloadFailed mm lib_paths lib_spec
-         Just mm | otherwise -> do
+         Left mm | otherwise -> do
            -- As a backup, on Darwin, try to also load a .so file
            -- since (apparently) some things install that way - see
            -- ticket #8770.
            let libfile = ("lib" ++ dll_unadorned) <.> "so"
            err2 <- loadDLL interp libfile
            case err2 of
-             Nothing -> maybePutStrLn logger "done"
-             Just _  -> preloadFailed mm lib_paths lib_spec
+             Right _ -> maybePutStrLn logger "done"
+             Left _  -> preloadFailed mm lib_paths lib_spec
       return pls
 
     DLLPath dll_path -> do
       do maybe_errstr <- loadDLL interp dll_path
          case maybe_errstr of
-            Nothing -> maybePutStrLn logger "done"
-            Just mm -> preloadFailed mm lib_paths lib_spec
+            Right _ -> maybePutStrLn logger "done"
+            Left mm -> preloadFailed mm lib_paths lib_spec
          return pls
 
     Framework framework ->
@@ -588,7 +589,7 @@ loadExpr interp hsc_env span root_ul_bco = do
         let le = linker_env pls
             nobreakarray = error "no break array"
             bco_ix = mkNameEnv [(unlinkedBCOName root_ul_bco, 0)]
-        resolved <- linkBCO interp le bco_ix nobreakarray root_ul_bco
+        resolved <- linkBCO interp (pkgs_loaded pls) le bco_ix nobreakarray root_ul_bco
         [root_hvref] <- createBCOs interp [resolved]
         fhv <- mkFinalizedHValue interp root_hvref
         return (pls, fhv)
@@ -651,7 +652,7 @@ loadDecls interp hsc_env span cbc at CompiledByteCode{..} = do
                        , addr_env = plusNameEnv (addr_env le) bc_strs }
 
           -- Link the necessary packages and linkables
-          new_bindings <- linkSomeBCOs interp le2 [cbc]
+          new_bindings <- linkSomeBCOs interp (pkgs_loaded pls) le2 [cbc]
           nms_fhvs <- makeForeignNamedHValueRefs interp new_bindings
           let ce2  = extendClosureEnv (closure_env le2) nms_fhvs
               !pls2 = pls { linker_env = le2 { closure_env = ce2 } }
@@ -832,8 +833,8 @@ dynLoadObjs interp hsc_env pls at LoaderState{..} objs = do
     changeTempFilesLifetime tmpfs TFL_GhcSession [soFile]
     m <- loadDLL interp soFile
     case m of
-        Nothing -> return $! pls { temp_sos = (libPath, libName) : temp_sos }
-        Just err -> linkFail msg err
+      Right _ -> return $! pls { temp_sos = (libPath, libName) : temp_sos }
+      Left err -> linkFail msg err
   where
     msg = "GHC.Linker.Loader.dynLoadObjs: Loading temp shared object failed"
 
@@ -873,7 +874,7 @@ dynLinkBCOs interp pls bcos = do
             ae2 = foldr plusNameEnv (addr_env le1) (map bc_strs cbcs)
             le2 = le1 { itbl_env = ie2, addr_env = ae2 }
 
-        names_and_refs <- linkSomeBCOs interp le2 cbcs
+        names_and_refs <- linkSomeBCOs interp (pkgs_loaded pls) le2 cbcs
 
         -- We only want to add the external ones to the ClosureEnv
         let (to_add, to_drop) = partition (isExternalName.fst) names_and_refs
@@ -888,6 +889,7 @@ dynLinkBCOs interp pls bcos = do
 
 -- Link a bunch of BCOs and return references to their values
 linkSomeBCOs :: Interp
+             -> PkgsLoaded
              -> LinkerEnv
              -> [CompiledByteCode]
              -> IO [(Name,HValueRef)]
@@ -895,7 +897,7 @@ linkSomeBCOs :: Interp
                         -- the incoming unlinked BCOs.  Each gives the
                         -- value of the corresponding unlinked BCO
 
-linkSomeBCOs interp le mods = foldr fun do_link mods []
+linkSomeBCOs interp pkgs_loaded le mods = foldr fun do_link mods []
  where
   fun CompiledByteCode{..} inner accum =
     case bc_breaks of
@@ -908,7 +910,7 @@ linkSomeBCOs interp le mods = foldr fun do_link mods []
     let flat = [ (breakarray, bco) | (breakarray, bcos) <- mods, bco <- bcos ]
         names = map (unlinkedBCOName . snd) flat
         bco_ix = mkNameEnv (zip names [0..])
-    resolved <- sequence [ linkBCO interp le bco_ix breakarray bco
+    resolved <- sequence [ linkBCO interp pkgs_loaded le bco_ix breakarray bco
                          | (breakarray, bco) <- flat ]
     hvrefs <- createBCOs interp resolved
     return (zip names hvrefs)
@@ -1071,18 +1073,18 @@ loadPackages' interp hsc_env new_pks pls = do
                -- Link dependents first
              ; pkgs' <- link pkgs deps
                 -- Now link the package itself
-             ; (hs_cls, extra_cls) <- loadPackage interp hsc_env pkg_cfg
+             ; (hs_cls, extra_cls, loaded_dlls) <- loadPackage interp hsc_env pkg_cfg
              ; let trans_deps = unionManyUniqDSets [ addOneToUniqDSet (loaded_pkg_trans_deps loaded_pkg_info) dep_pkg
                                                    | dep_pkg <- deps
                                                    , Just loaded_pkg_info <- pure (lookupUDFM pkgs' dep_pkg)
                                                    ]
-             ; return (addToUDFM pkgs' new_pkg (LoadedPkgInfo new_pkg hs_cls extra_cls trans_deps)) }
+             ; return (addToUDFM pkgs' new_pkg (LoadedPkgInfo new_pkg hs_cls extra_cls loaded_dlls trans_deps)) }
 
         | otherwise
         = throwGhcExceptionIO (CmdLineError ("unknown package: " ++ unpackFS (unitIdFS new_pkg)))
 
 
-loadPackage :: Interp -> HscEnv -> UnitInfo -> IO ([LibrarySpec], [LibrarySpec])
+loadPackage :: Interp -> HscEnv -> UnitInfo -> IO ([LibrarySpec], [LibrarySpec], [RemotePtr LoadedDLL])
 loadPackage interp hsc_env pkg
    = do
         let dflags    = hsc_dflags hsc_env
@@ -1124,7 +1126,9 @@ loadPackage interp hsc_env pkg
         let classifieds = hs_classifieds ++ extra_classifieds
 
         -- Complication: all the .so's must be loaded before any of the .o's.
-        let known_dlls = [ dll  | DLLPath dll    <- classifieds ]
+        let known_hs_dlls    = [ dll | DLLPath dll <- hs_classifieds ]
+            known_extra_dlls = [ dll | DLLPath dll <- extra_classifieds ]
+            known_dlls       = known_hs_dlls ++ known_extra_dlls
 #if defined(CAN_LOAD_DLL)
             dlls       = [ dll  | DLL dll        <- classifieds ]
 #endif
@@ -1145,10 +1149,13 @@ loadPackage interp hsc_env pkg
         loadFrameworks interp platform pkg
         -- See Note [Crash early load_dyn and locateLib]
         -- Crash early if can't load any of `known_dlls`
-        mapM_ (load_dyn interp hsc_env True) known_dlls
+        mapM_ (load_dyn interp hsc_env True) known_extra_dlls
+        loaded_dlls <- mapMaybeM (load_dyn interp hsc_env True) known_hs_dlls
         -- For remaining `dlls` crash early only when there is surely
         -- no package's DLL around ... (not is_dyn)
         mapM_ (load_dyn interp hsc_env (not is_dyn) . platformSOName platform) dlls
+#else
+        let loaded_dlls = []
 #endif
         -- After loading all the DLLs, we can load the static objects.
         -- Ordering isn't important here, because we do one final link
@@ -1168,7 +1175,7 @@ loadPackage interp hsc_env pkg
         if succeeded ok
            then do
              maybePutStrLn logger "done."
-             return (hs_classifieds, extra_classifieds)
+             return (hs_classifieds, extra_classifieds, loaded_dlls)
            else let errmsg = text "unable to load unit `"
                              <> pprUnitInfoForUser pkg <> text "'"
                  in throwGhcExceptionIO (InstallationError (showSDoc dflags errmsg))
@@ -1221,19 +1228,20 @@ restriction very easily.
 -- can be passed directly to loadDLL.  They are either fully-qualified
 -- ("/usr/lib/libfoo.so"), or unqualified ("libfoo.so").  In the latter case,
 -- loadDLL is going to search the system paths to find the library.
-load_dyn :: Interp -> HscEnv -> Bool -> FilePath -> IO ()
+load_dyn :: Interp -> HscEnv -> Bool -> FilePath -> IO (Maybe (RemotePtr LoadedDLL))
 load_dyn interp hsc_env crash_early dll = do
   r <- loadDLL interp dll
   case r of
-    Nothing  -> return ()
-    Just err ->
+    Right loaded_dll -> pure (Just loaded_dll)
+    Left err ->
       if crash_early
         then cmdLineErrorIO err
-        else
+        else do
           when (diag_wopt Opt_WarnMissedExtraSharedLib diag_opts)
             $ logMsg logger
                 (mkMCDiagnostic diag_opts (WarningWithFlag Opt_WarnMissedExtraSharedLib) Nothing)
                   noSrcSpan $ withPprStyle defaultUserStyle (note err)
+          pure Nothing
   where
     diag_opts = initDiagOpts (hsc_dflags hsc_env)
     logger = hsc_logger hsc_env


=====================================
compiler/GHC/Linker/MacOS.hs
=====================================
@@ -172,6 +172,6 @@ loadFramework interp extraPaths rootname
      findLoadDLL (p:ps) errs =
        do { dll <- loadDLL interp (p </> fwk_file)
           ; case dll of
-              Nothing  -> return Nothing
-              Just err -> findLoadDLL ps ((p ++ ": " ++ err):errs)
+              Right _  -> return Nothing
+              Left err -> findLoadDLL ps ((p ++ ": " ++ err):errs)
           }


=====================================
compiler/GHC/Linker/Types.hs
=====================================
@@ -40,7 +40,8 @@ import GHC.Prelude
 import GHC.Unit                ( UnitId, Module )
 import GHC.ByteCode.Types      ( ItblEnv, AddrEnv, CompiledByteCode )
 import GHC.Fingerprint.Type    ( Fingerprint )
-import GHCi.RemoteTypes        ( ForeignHValue )
+import GHCi.RemoteTypes        ( ForeignHValue, RemotePtr )
+import GHCi.Message            ( LoadedDLL )
 
 import GHC.Types.Var           ( Id )
 import GHC.Types.Name.Env      ( NameEnv, emptyNameEnv, extendNameEnvList, filterNameEnv )
@@ -75,6 +76,53 @@ initialised.
 
 The LinkerEnv maps Names to actual closures (for interpreted code only), for
 use during linking.
+
+Note [Looking up symbols in the relevant objects]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In #23415, we determined that a lot of time (>10s, or even up to >35s!) was
+being spent on dynamically loading symbols before actually interpreting code
+when `:main` was run in GHCi. The root cause was that for each symbol we wanted
+to lookup, we would traverse the list of loaded objects and try find the symbol
+in each of them with dlsym (i.e. looking up a symbol was, worst case, linear in
+the amount of loaded objects).
+
+To drastically improve load time (from +-38 seconds down to +-2s), we now:
+
+1. For every of the native objects loaded for a given unit, store the handles returned by `dlopen`.
+  - In `pkgs_loaded` of the `LoaderState`, which maps `UnitId`s to
+    `LoadedPkgInfo`s, where the handles live in its field `loaded_pkg_hs_dlls`.
+
+2. When looking up a Name (e.g. `lookupHsSymbol`), find that name's `UnitId` in
+    the `pkgs_loaded` mapping,
+
+3. And only look for the symbol (with `dlsym`) on the /handles relevant to that
+    unit/, rather than in every loaded object.
+
+Note [Symbols may not be found in pkgs_loaded]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Currently the `pkgs_loaded` mapping only contains the dynamic objects
+associated with loaded units. Symbols defined in a static object (e.g. from a
+statically-linked Haskell library) are found via the generic `lookupSymbol`
+function call by `lookupHsSymbol` when the symbol is not found in any of the
+dynamic objects of `pkgs_loaded`.
+
+The rationale here is two-fold:
+
+ * we have only observed major link-time issues in dynamic linking; lookups in
+ the RTS linker's static symbol table seem to be fast enough
+
+ * allowing symbol lookups restricted to a single ObjectCode would require the
+ maintenance of a symbol table per `ObjectCode`, which would introduce time and
+ space overhead
+
+This fallback is further needed because we don't look in the haskell objects
+loaded for the home units (see the call to `loadModuleLinkables` in
+`loadDependencies`, as opposed to the call to `loadPackages'` in the same
+function which updates `pkgs_loaded`). We should ultimately keep track of the
+objects loaded (probably in `objs_loaded`, for which `LinkableSet` is a bit
+unsatisfactory, see a suggestion in 51c5c4eb1f2a33e4dc88e6a37b7b7c135234ce9b)
+and be able to lookup symbols specifically in them too (similarly to
+`lookupSymbolInDLL`).
 -}
 
 newtype Loader = Loader { loader_state :: MVar (Maybe LoaderState) }
@@ -146,11 +194,13 @@ data LoadedPkgInfo
   { loaded_pkg_uid         :: !UnitId
   , loaded_pkg_hs_objs     :: ![LibrarySpec]
   , loaded_pkg_non_hs_objs :: ![LibrarySpec]
+  , loaded_pkg_hs_dlls     :: ![RemotePtr LoadedDLL]
+    -- ^ See Note [Looking up symbols in the relevant objects]
   , loaded_pkg_trans_deps  :: UniqDSet UnitId
   }
 
 instance Outputable LoadedPkgInfo where
-  ppr (LoadedPkgInfo uid hs_objs non_hs_objs trans_deps) =
+  ppr (LoadedPkgInfo uid hs_objs non_hs_objs _ trans_deps) =
     vcat [ppr uid
          , ppr hs_objs
          , ppr non_hs_objs
@@ -159,10 +209,10 @@ instance Outputable LoadedPkgInfo where
 
 -- | Information we can use to dynamically link modules into the compiler
 data Linkable = LM {
-  linkableTime     :: !UTCTime,          -- ^ Time at which this linkable was built
+  linkableTime     :: !UTCTime,         -- ^ Time at which this linkable was built
                                         -- (i.e. when the bytecodes were produced,
                                         --       or the mod date on the files)
-  linkableModule   :: !Module,           -- ^ The linkable module itself
+  linkableModule   :: !Module,          -- ^ The linkable module itself
   linkableUnlinked :: [Unlinked]
     -- ^ Those files and chunks of code we have yet to link.
     --


=====================================
compiler/GHC/Runtime/Interpreter.hs
=====================================
@@ -37,6 +37,7 @@ module GHC.Runtime.Interpreter
   -- * The object-code linker
   , initObjLinker
   , lookupSymbol
+  , lookupSymbolInDLL
   , lookupClosure
   , loadDLL
   , loadArchive
@@ -478,6 +479,13 @@ lookupSymbol interp str = case interpInstance interp of
 
     ExtJS {} -> pprPanic "lookupSymbol not supported by the JS interpreter" (ppr str)
 
+lookupSymbolInDLL :: Interp -> RemotePtr LoadedDLL -> FastString -> IO (Maybe (Ptr ()))
+lookupSymbolInDLL interp _dll _str = case interpInstance interp of
+#if defined(HAVE_INTERNAL_INTERPRETER)
+  InternalInterp -> fmap fromRemotePtr <$> run (LookupSymbolInDLL _dll (unpackFS _str))
+#endif
+  ExternalInterp _ -> panic "lookupSymbolInDLL: not implemented for external interpreter" -- FIXME
+
 lookupClosure :: Interp -> String -> IO (Maybe HValueRef)
 lookupClosure interp str =
   interpCmd interp (LookupClosure str)
@@ -496,12 +504,7 @@ purgeLookupSymbolCache interp = case interpInstance interp of
 -- an absolute pathname to the file, or a relative filename
 -- (e.g. "libfoo.so" or "foo.dll").  In the latter case, loadDLL
 -- searches the standard locations for the appropriate library.
---
--- Returns:
---
--- Nothing      => success
--- Just err_msg => failure
-loadDLL :: Interp -> String -> IO (Maybe String)
+loadDLL :: Interp -> String -> IO (Either String (RemotePtr LoadedDLL))
 loadDLL interp str = interpCmd interp (LoadDLL str)
 
 loadArchive :: Interp -> String -> IO ()


=====================================
hadrian/bootstrap/generate_bootstrap_plans
=====================================
@@ -23,6 +23,11 @@ run_all() {
     run "9_4_4"
     run "9_6_1"
     run "9_6_2"
+    run "9_6_3"
+    run "9_6_4"
+    run "9_6_5"
+    run "9_8_1"
+    run "9_8_2"
 }
 
 if (( $# == 0 )); then


=====================================
libraries/ghci/GHCi/Message.hs
=====================================
@@ -21,6 +21,7 @@ module GHCi.Message
   , QState(..)
   , getMessage, putMessage, getTHMessage, putTHMessage
   , Pipe(..), remoteCall, remoteTHCall, readPipe, writePipe
+  , LoadedDLL
   ) where
 
 import Prelude -- See note [Why do we import Prelude here?]
@@ -69,8 +70,9 @@ data Message a where
   -- These all invoke the corresponding functions in the RTS Linker API.
   InitLinker :: Message ()
   LookupSymbol :: String -> Message (Maybe (RemotePtr ()))
+  LookupSymbolInDLL :: RemotePtr LoadedDLL -> String -> Message (Maybe (RemotePtr ()))
   LookupClosure :: String -> Message (Maybe HValueRef)
-  LoadDLL :: String -> Message (Maybe String)
+  LoadDLL :: String -> Message (Either String (RemotePtr LoadedDLL))
   LoadArchive :: String -> Message () -- error?
   LoadObj :: String -> Message () -- error?
   UnloadObj :: String -> Message () -- error?
@@ -394,6 +396,9 @@ data EvalResult a
 
 instance Binary a => Binary (EvalResult a)
 
+-- | A dummy type that tags pointers returned by 'LoadDLL'.
+data LoadedDLL
+
 -- SomeException can't be serialized because it contains dynamic
 -- types.  However, we do very limited things with the exceptions that
 -- are thrown by interpreted computations:
@@ -521,6 +526,7 @@ getMessage = do
       36 -> Msg <$> (Seq <$> get)
       37 -> Msg <$> return RtsRevertCAFs
       38 -> Msg <$> (ResumeSeq <$> get)
+      40 -> Msg <$> (LookupSymbolInDLL <$> get <*> get)
       _  -> error $ "Unknown Message code " ++ (show b)
 
 putMessage :: Message a -> Put
@@ -564,6 +570,7 @@ putMessage m = case m of
   Seq a                       -> putWord8 36 >> put a
   RtsRevertCAFs               -> putWord8 37
   ResumeSeq a                 -> putWord8 38 >> put a
+  LookupSymbolInDLL dll str   -> putWord8 40 >> put dll >> put str
 
 -- -----------------------------------------------------------------------------
 -- Reading/writing messages


=====================================
libraries/ghci/GHCi/ObjLink.hs
=====================================
@@ -18,6 +18,7 @@ module GHCi.ObjLink
   , unloadObj
   , purgeObj
   , lookupSymbol
+  , lookupSymbolInDLL
   , lookupClosure
   , resolveObjs
   , addLibrarySearchPath
@@ -27,18 +28,17 @@ module GHCi.ObjLink
 
 import Prelude -- See note [Why do we import Prelude here?]
 import GHCi.RemoteTypes
+import GHCi.Message (LoadedDLL)
 import Control.Exception (throwIO, ErrorCall(..))
 import Control.Monad    ( when )
 import Foreign.C
-import Foreign.Marshal.Alloc ( free )
-import Foreign          ( nullPtr )
+import Foreign.Marshal.Alloc ( alloca, free )
+import Foreign          ( nullPtr, peek )
 import GHC.Exts
 import System.Posix.Internals ( CFilePath, withFilePath, peekFilePath )
 import System.FilePath  ( dropExtension, normalise )
 
 
-
-
 -- ---------------------------------------------------------------------------
 -- RTS Linker Interface
 -- ---------------------------------------------------------------------------
@@ -70,6 +70,15 @@ lookupSymbol str_in = do
         then return Nothing
         else return (Just addr)
 
+lookupSymbolInDLL :: Ptr LoadedDLL -> String -> IO (Maybe (Ptr a))
+lookupSymbolInDLL dll str_in = do
+   let str = prefixUnderscore str_in
+   withCAString str $ \c_str -> do
+     addr <- c_lookupSymbolInDLL dll c_str
+     if addr == nullPtr
+       then return Nothing
+       else return (Just addr)
+
 lookupClosure :: String -> IO (Maybe HValueRef)
 lookupClosure str = do
   m <- lookupSymbol str
@@ -89,7 +98,7 @@ prefixUnderscore
 -- (e.g. "libfoo.so" or "foo.dll").  In the latter case, loadDLL
 -- searches the standard locations for the appropriate library.
 --
-loadDLL :: String -> IO (Maybe String)
+loadDLL :: String -> IO (Either String (Ptr LoadedDLL))
 -- Nothing      => success
 -- Just err_msg => failure
 loadDLL str0 = do
@@ -101,12 +110,16 @@ loadDLL str0 = do
      str | isWindowsHost = dropExtension str0
          | otherwise     = str0
   --
-  maybe_errmsg <- withFilePath (normalise str) $ \dll -> c_addDLL dll
-  if maybe_errmsg == nullPtr
-        then return Nothing
-        else do str <- peekCString maybe_errmsg
-                free maybe_errmsg
-                return (Just str)
+  (maybe_handle, maybe_errmsg) <- withFilePath (normalise str) $ \dll ->
+    alloca $ \errmsg_ptr -> (,)
+      <$> c_addDLL dll errmsg_ptr
+      <*> peek errmsg_ptr
+
+  if maybe_handle == nullPtr
+    then do str <- peekCString maybe_errmsg
+            free maybe_errmsg
+            return (Left str)
+    else return (Right maybe_handle)
 
 loadArchive :: String -> IO ()
 loadArchive str = do
@@ -163,7 +176,8 @@ resolveObjs = do
 -- Foreign declarations to RTS entry points which does the real work;
 -- ---------------------------------------------------------------------------
 
-foreign import ccall unsafe "addDLL"                  c_addDLL                  :: CFilePath -> IO CString
+foreign import ccall unsafe "addDLL"                  c_addDLL                  :: CFilePath -> Ptr CString -> IO (Ptr LoadedDLL)
+foreign import ccall unsafe "lookupSymbolInDLL"       c_lookupSymbolInDLL       :: Ptr LoadedDLL -> CString -> IO (Ptr a)
 foreign import ccall unsafe "initLinker_"             c_initLinker_             :: CInt -> IO ()
 foreign import ccall unsafe "lookupSymbol"            c_lookupSymbol            :: CString -> IO (Ptr a)
 foreign import ccall unsafe "loadArchive"             c_loadArchive             :: CFilePath -> IO Int


=====================================
libraries/ghci/GHCi/Run.hs
=====================================
@@ -68,7 +68,7 @@ run m = case m of
   LookupClosure str           -> lookupJSClosure str
 #else
   InitLinker -> initObjLinker RetainCAFs
-  LoadDLL str -> loadDLL str
+  LoadDLL str -> fmap toRemotePtr <$> loadDLL str
   LoadArchive str -> loadArchive str
   LoadObj str -> loadObj str
   UnloadObj str -> unloadObj str
@@ -83,6 +83,8 @@ run m = case m of
 #endif
   RtsRevertCAFs -> rts_revertCAFs
   LookupSymbol str -> fmap toRemotePtr <$> lookupSymbol str
+  LookupSymbolInDLL dll str ->
+    fmap toRemotePtr <$> lookupSymbolInDLL (fromRemotePtr dll) str
   FreeHValueRefs rs -> mapM_ freeRemoteRef rs
   AddSptEntry fpr r -> localRef r >>= sptAddEntry fpr
   EvalStmt opts r -> evalStmt opts r


=====================================
rts/Linker.c
=====================================
@@ -578,13 +578,11 @@ typedef
 /* A list thereof. */
 static OpenedSO* openedSOs = NULL;
 
-static const char *
-internal_dlopen(const char *dll_name)
+static void *
+internal_dlopen(const char *dll_name, const char **errmsg_ptr)
 {
    OpenedSO* o_so;
    void *hdl;
-   const char *errmsg;
-   char *errmsg_copy;
 
    // omitted: RTLD_NOW
    // see http://www.haskell.org/pipermail/cvs-ghc/2007-September/038570.html
@@ -619,14 +617,13 @@ internal_dlopen(const char *dll_name)
    RELEASE_LOCK(&ccs_mutex);
 #endif
 
-   errmsg = NULL;
    if (hdl == NULL) {
       /* dlopen failed; return a ptr to the error msg. */
-      errmsg = dlerror();
+      char *errmsg = dlerror();
       if (errmsg == NULL) errmsg = "addDLL: unknown error";
-      errmsg_copy = stgMallocBytes(strlen(errmsg)+1, "addDLL");
+      char *errmsg_copy = stgMallocBytes(strlen(errmsg)+1, "addDLL");
       strcpy(errmsg_copy, errmsg);
-      errmsg = errmsg_copy;
+      *errmsg_ptr = errmsg_copy;
    } else {
       o_so = stgMallocBytes(sizeof(OpenedSO), "addDLL");
       o_so->handle = hdl;
@@ -637,7 +634,7 @@ internal_dlopen(const char *dll_name)
    RELEASE_LOCK(&dl_mutex);
    //--------------- End critical section -------------------
 
-   return errmsg;
+   return hdl;
 }
 
 /*
@@ -725,16 +722,29 @@ internal_dlsym(const char *symbol) {
     // we failed to find the symbol
     return NULL;
 }
+
+void *lookupSymbolInDLL(void *handle, const char *symbol_name)
+{
+#if defined(OBJFORMAT_MACHO)
+    CHECK(symbol_name[0] == '_');
+    symbol_name = symbol_name+1;
+#endif
+
+    ACQUIRE_LOCK(&dl_mutex); // dlsym alters dlerror
+    void *result = dlsym(handle, symbol_name);
+    RELEASE_LOCK(&dl_mutex);
+    return result;
+}
 #  endif
 
-const char *
-addDLL( pathchar *dll_name )
+void *addDLL(pathchar* dll_name, const char **errmsg_ptr)
 {
 #  if defined(OBJFORMAT_ELF) || defined(OBJFORMAT_MACHO)
    /* ------------------- ELF DLL loader ------------------- */
 
 #define NMATCH 5
    regmatch_t match[NMATCH];
+   void *handle;
    const char *errmsg;
    FILE* fp;
    size_t match_length;
@@ -743,10 +753,10 @@ addDLL( pathchar *dll_name )
    int result;
 
    IF_DEBUG(linker, debugBelch("addDLL: dll_name = '%s'\n", dll_name));
-   errmsg = internal_dlopen(dll_name);
+   handle = internal_dlopen(dll_name, &errmsg);
 
-   if (errmsg == NULL) {
-      return NULL;
+   if (handle != NULL) {
+      return handle;
    }
 
    // GHC #2615
@@ -775,7 +785,8 @@ addDLL( pathchar *dll_name )
       line[match_length] = '\0'; // make sure string is null-terminated
       IF_DEBUG(linker, debugBelch("file name = '%s'\n", line));
       if ((fp = __rts_fopen(line, "r")) == NULL) {
-         return errmsg; // return original error if open fails
+         *errmsg_ptr = errmsg; // return original error if open fails
+         return NULL;
       }
       // try to find a GROUP or INPUT ( ... ) command
       while (fgets(line, MAXLINE, fp) != NULL) {
@@ -785,7 +796,7 @@ addDLL( pathchar *dll_name )
             IF_DEBUG(linker, debugBelch("match%s\n",""));
             line[match[2].rm_eo] = '\0';
             stgFree((void*)errmsg); // Free old message before creating new one
-            errmsg = internal_dlopen(line+match[2].rm_so);
+            handle = internal_dlopen(line+match[2].rm_so, errmsg_ptr);
             break;
          }
          // if control reaches here, no GROUP or INPUT ( ... ) directive
@@ -794,9 +805,10 @@ addDLL( pathchar *dll_name )
       }
       fclose(fp);
    }
-   return errmsg;
+   return handle;
 
 #  elif defined(OBJFORMAT_PEi386)
+   // FIXME
    return addDLL_PEi386(dll_name, NULL);
 
 #  else


=====================================
rts/RtsSymbols.c
=====================================
@@ -618,6 +618,7 @@ extern char **environ;
       SymI_HasProto(purgeObj)                                           \
       SymI_HasProto(insertSymbol)                                       \
       SymI_HasProto(lookupSymbol)                                       \
+      SymI_HasProto(lookupSymbolInDLL)                                  \
       SymI_HasDataProto(stg_makeStablePtrzh)                                \
       SymI_HasDataProto(stg_mkApUpd0zh)                                     \
       SymI_HasDataProto(stg_labelThreadzh)                                  \


=====================================
rts/include/rts/Linker.h
=====================================
@@ -91,7 +91,9 @@ void *loadNativeObj( pathchar *path, char **errmsg );
 HsInt unloadNativeObj( void *handle );
 
 /* load a dynamic library */
-const char *addDLL( pathchar* dll_name );
+void *addDLL(pathchar* dll_name, const char **errmsg);
+
+void *lookupSymbolInDLL(void *handle, const char *symbol_name);
 
 /* add a path to the library search path */
 HsPtr addLibrarySearchPath(pathchar* dll_path);


=====================================
testsuite/tests/rts/linker/T2615.hs
=====================================
@@ -6,5 +6,5 @@ main = do
   initObjLinker RetainCAFs
   result <- loadDLL library_name
   case result of
-    Nothing -> putStrLn (library_name ++ " loaded successfully")
-    Just x  -> putStrLn ("error: " ++ x)
+    Right _ -> putStrLn (library_name ++ " loaded successfully")
+    Left x  -> putStrLn ("error: " ++ x)



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8362bd5cb0bf523312df3d6eefb223232a34d6bd...0e14a98d47808b845890966349cc50d8b92a1a5e

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8362bd5cb0bf523312df3d6eefb223232a34d6bd...0e14a98d47808b845890966349cc50d8b92a1a5e
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20241010/aa1f8c99/attachment-0001.html>


More information about the ghc-commits mailing list