[Git][ghc/ghc][wip/marge_bot_batch_merge_job] 2 commits: Escape multiple arguments in the settings file

Marge Bot (@marge-bot) gitlab at gitlab.haskell.org
Thu Mar 14 23:24:09 UTC 2024



Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC


Commits:
5bea24cf by Fendor at 2024-03-14T19:24:01-04:00
Escape multiple arguments in the settings file

Uses responseFile syntax.

The issue arises when GHC is installed on windows into a location that
has a space, for example the user name is 'Fake User'.
The $topdir will also contain a space, consequentially.
When we resolve the top dir in the string `-I$topdir/mingw/include`,
then `words` will turn this single argument into `-I/C/Users/Fake` and
`User/.../mingw/include` which trips up our flags.
We avoid this by escaping the $topdir before replacing in GHC.

Add regression test case to count the number of options after variable
expansion took place. Additionally, check escaping works.

- - - - -
9848d580 by Fendor at 2024-03-14T19:24:03-04:00
Fix sharing of 'IfaceTyConInfo' during core to iface type translation

During heap analysis, we noticed that during generation of
'mi_extra_decls' we have lots of duplicates for the instances:

* `IfaceTyConInfo NotPromoted IfaceNormalTyCon`
* `IfaceTyConInfo IsPromoted IfaceNormalTyCon`

which should be shared instead of duplicated. This duplication increased
the number of live bytes by around 200MB while loading the agda codebase
into GHCi.

These instances are created during `CoreToIface` translation, in
particular `toIfaceTyCon`.

The generated core looks like:

    toIfaceTyCon
      = \ tc_sjJw ->
          case $wtoIfaceTyCon tc_sjJw of
          { (# ww_sjJz, ww1_sjNL, ww2_sjNM #) ->
          IfaceTyCon ww_sjJz (IfaceTyConInfo ww1_sjNL ww2_sjNM)
          }

whichs removes causes the sharing to work propery.

Adding explicit sharing, with NOINLINE annotations, changes the core to:

    toIfaceTyCon
      = \ tc_sjJq ->
          case $wtoIfaceTyCon tc_sjJq of { (# ww_sjNB, ww1_sjNC #) ->
          IfaceTyCon ww_sjNB ww1_sjNC
          }

which looks much more like sharing is happening.
We confirmed via ghc-debug that all duplications were eliminated and the
number of live bytes are noticeably reduced.

- - - - -


9 changed files:

- compiler/GHC/Iface/Type.hs
- compiler/GHC/Settings/IO.hs
- hadrian/src/Rules/Generate.hs
- + test.hs
- + testsuite/tests/ghc-api/settings-escape/T11938.hs
- + testsuite/tests/ghc-api/settings-escape/T11938.stderr
- + testsuite/tests/ghc-api/settings-escape/all.T
- + testsuite/tests/ghc-api/settings-escape/ghc-install-folder/lib/settings
- + testsuite/tests/ghc-api/settings-escape/ghc-install-folder/mingw/.gitkeep


Changes:

=====================================
compiler/GHC/Iface/Type.hs
=====================================
@@ -361,12 +361,51 @@ data IfaceTyConInfo   -- Used only to guide pretty-printing
                    , ifaceTyConSort       :: IfaceTyConSort }
     deriving (Eq)
 
--- This smart constructor allows sharing of the two most common
--- cases. See #19194
+-- | This smart constructor allows sharing of the two most common
+-- cases. See Note [Sharing IfaceTyConInfo]
 mkIfaceTyConInfo :: PromotionFlag -> IfaceTyConSort -> IfaceTyConInfo
-mkIfaceTyConInfo IsPromoted  IfaceNormalTyCon = IfaceTyConInfo IsPromoted  IfaceNormalTyCon
-mkIfaceTyConInfo NotPromoted IfaceNormalTyCon = IfaceTyConInfo NotPromoted IfaceNormalTyCon
-mkIfaceTyConInfo prom        sort             = IfaceTyConInfo prom        sort
+mkIfaceTyConInfo IsPromoted  IfaceNormalTyCon = promotedNormalTyConInfo
+mkIfaceTyConInfo NotPromoted IfaceNormalTyCon = notPromotedNormalTyConInfo
+mkIfaceTyConInfo prom        sort             = IfaceTyConInfo prom sort
+
+{-# NOINLINE promotedNormalTyConInfo #-}
+-- | See Note [Sharing IfaceTyConInfo]
+promotedNormalTyConInfo :: IfaceTyConInfo
+promotedNormalTyConInfo = IfaceTyConInfo IsPromoted IfaceNormalTyCon
+
+{-# NOINLINE notPromotedNormalTyConInfo #-}
+-- | See Note [Sharing IfaceTyConInfo]
+notPromotedNormalTyConInfo :: IfaceTyConInfo
+notPromotedNormalTyConInfo = IfaceTyConInfo NotPromoted IfaceNormalTyCon
+
+{-
+Note [Sharing IfaceTyConInfo]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+'IfaceTyConInfo' occurs an awful lot in 'ModIface', see #19194 for an example.
+But almost all of them are
+
+   IfaceTyConInfo IsPromoted IfaceNormalTyCon
+   IfaceTyConInfo NotPromoted IfaceNormalTyCon.
+
+The smart constructor `mkIfaceTyConInfo` arranges to share these instances,
+thus:
+
+  promotedNormalTyConInfo    = IfaceTyConInfo IsPromoted  IfaceNormalTyCon
+  notPromotedNormalTyConInfo = IfaceTyConInfo NotPromoted IfaceNormalTyCon
+
+  mkIfaceTyConInfo IsPromoted  IfaceNormalTyCon = promotedNormalTyConInfo
+  mkIfaceTyConInfo NotPromoted IfaceNormalTyCon = notPromotedNormalTyConInfo
+  mkIfaceTyConInfo prom        sort             = IfaceTyConInfo prom sort
+
+But ALAS, the (nested) CPR transform can lose this sharing, completely
+negating the effect of `mkIfaceTyConInfo`: see #24530 and #19326.
+
+Sticking-plaster solution: add a NOINLINE pragma to those top-level constants.
+When we fix the CPR bug we can remove the NOINLINE pragmas.
+
+This one change leads to an 15% reduction in residency for GHC when embedding
+'mi_extra_decls': see !12222.
+-}
 
 data IfaceMCoercion
   = IfaceMRefl


=====================================
compiler/GHC/Settings/IO.hs
=====================================
@@ -16,9 +16,11 @@ import GHC.Utils.CliOption
 import GHC.Utils.Fingerprint
 import GHC.Platform
 import GHC.Utils.Panic
+import GHC.ResponseFile
 import GHC.Settings
 import GHC.SysTools.BaseDir
 
+import Data.Char
 import Control.Monad.Trans.Except
 import Control.Monad.IO.Class
 import qualified Data.Map as Map
@@ -72,9 +74,13 @@ initSettings top_dir = do
   -- just partially applying those functions and throwing 'Left's; they're
   -- written in a very portable style to keep ghc-boot light.
   let getSetting key = either pgmError pure $
-        getRawFilePathSetting top_dir settingsFile mySettings key
+        -- Escape the 'top_dir', to make sure we don't accidentally introduce an
+        -- unescaped space
+        getRawFilePathSetting (escapeArg top_dir) settingsFile mySettings key
       getToolSetting :: String -> ExceptT SettingsError m String
-      getToolSetting key = expandToolDir useInplaceMinGW mtool_dir <$> getSetting key
+        -- Escape the 'mtool_dir', to make sure we don't accidentally introduce
+        -- an unescaped space
+      getToolSetting key = expandToolDir useInplaceMinGW (fmap escapeArg mtool_dir) <$> getSetting key
   targetPlatformString <- getSetting "target platform string"
   cc_prog <- getToolSetting "C compiler command"
   cxx_prog <- getToolSetting "C++ compiler command"
@@ -91,10 +97,10 @@ initSettings top_dir = do
   let unreg_cc_args = if platformUnregisterised platform
                       then ["-DNO_REGS", "-DUSE_MINIINTERPRETER"]
                       else []
-      cpp_args    = map Option (words cpp_args_str)
-      hs_cpp_args = map Option (words hs_cpp_args_str)
-      cc_args  = words cc_args_str ++ unreg_cc_args
-      cxx_args = words cxx_args_str
+      cpp_args    = map Option (unescapeArgs cpp_args_str)
+      hs_cpp_args = map Option (unescapeArgs hs_cpp_args_str)
+      cc_args  = unescapeArgs cc_args_str ++ unreg_cc_args
+      cxx_args = unescapeArgs cxx_args_str
 
       -- The extra flags we need to pass gcc when we invoke it to compile .hc code.
       --
@@ -135,12 +141,12 @@ initSettings top_dir = do
   let   as_prog  = cc_prog
         as_args  = map Option cc_args
         ld_prog  = cc_prog
-        ld_args  = map Option (cc_args ++ words cc_link_args_str)
+        ld_args  = map Option (cc_args ++ unescapeArgs cc_link_args_str)
   ld_r_prog <- getToolSetting "Merge objects command"
   ld_r_args <- getToolSetting "Merge objects flags"
   let ld_r
         | null ld_r_prog = Nothing
-        | otherwise      = Just (ld_r_prog, map Option $ words ld_r_args)
+        | otherwise      = Just (ld_r_prog, map Option $ unescapeArgs ld_r_args)
 
   llvmTarget <- getSetting "LLVM target"
 
@@ -261,3 +267,19 @@ getTargetPlatform settingsFile settings = do
     , platformHasLibm = targetHasLibm
     , platform_constants = Nothing -- will be filled later when loading (or building) the RTS unit
     }
+
+-- ----------------------------------------------------------------------------
+-- Escape Args helpers
+-- ----------------------------------------------------------------------------
+
+-- | Just like 'GHC.ResponseFile.escapeArg', but it is not exposed from base.
+escapeArg :: String -> String
+escapeArg = reverse . foldl' escape []
+
+escape :: String -> Char -> String
+escape cs c
+  |    isSpace c
+    || '\\' == c
+    || '\'' == c
+    || '"'  == c = c:'\\':cs -- n.b., our caller must reverse the result
+  | otherwise    = c:cs


=====================================
hadrian/src/Rules/Generate.hs
=====================================
@@ -6,6 +6,7 @@ module Rules.Generate (
     ) where
 
 import Development.Shake.FilePath
+import Data.Char (isSpace)
 import qualified Data.Set as Set
 import Base
 import qualified Context
@@ -416,7 +417,7 @@ generateSettings = do
 
         , ("Use interpreter", expr $ yesNo <$> ghcWithInterpreter)
         , ("Support SMP", expr $ yesNo <$> targetSupportsSMP)
-        , ("RTS ways", unwords . map show . Set.toList <$> getRtsWays)
+        , ("RTS ways", escapeArgs . map show . Set.toList <$> getRtsWays)
         , ("Tables next to code", queryTarget (yesNo . tgtTablesNextToCode))
         , ("Leading underscore",  queryTarget (yesNo . tgtSymbolsHaveLeadingUnderscore))
         , ("Use LibFFI", expr $ yesNo <$> useLibffiForAdjustors)
@@ -431,23 +432,23 @@ generateSettings = do
             ++ ["]"]
   where
     ccPath  = prgPath . ccProgram . tgtCCompiler
-    ccFlags = unwords . prgFlags . ccProgram . tgtCCompiler
+    ccFlags = escapeArgs . prgFlags . ccProgram . tgtCCompiler
     cxxPath  = prgPath . cxxProgram . tgtCxxCompiler
-    cxxFlags = unwords . prgFlags . cxxProgram . tgtCxxCompiler
-    clinkFlags = unwords . prgFlags . ccLinkProgram . tgtCCompilerLink
+    cxxFlags = escapeArgs . prgFlags . cxxProgram . tgtCxxCompiler
+    clinkFlags = escapeArgs . prgFlags . ccLinkProgram . tgtCCompilerLink
     linkSupportsNoPie = yesNo . ccLinkSupportsNoPie . tgtCCompilerLink
     cppPath  = prgPath . cppProgram . tgtCPreprocessor
-    cppFlags = unwords . prgFlags . cppProgram . tgtCPreprocessor
+    cppFlags = escapeArgs . prgFlags . cppProgram . tgtCPreprocessor
     hsCppPath  = prgPath . hsCppProgram . tgtHsCPreprocessor
-    hsCppFlags = unwords . prgFlags . hsCppProgram . tgtHsCPreprocessor
+    hsCppFlags = escapeArgs . prgFlags . hsCppProgram . tgtHsCPreprocessor
     mergeObjsPath  = maybe "" (prgPath . mergeObjsProgram) . tgtMergeObjs
-    mergeObjsFlags = maybe "" (unwords . prgFlags . mergeObjsProgram) . tgtMergeObjs
+    mergeObjsFlags = maybe "" (escapeArgs . prgFlags . mergeObjsProgram) . tgtMergeObjs
     linkSupportsSingleModule    = yesNo . ccLinkSupportsSingleModule . tgtCCompilerLink
     linkSupportsFilelist        = yesNo . ccLinkSupportsFilelist . tgtCCompilerLink
     linkSupportsCompactUnwind   = yesNo . ccLinkSupportsCompactUnwind . tgtCCompilerLink
     linkIsGnu                   = yesNo . ccLinkIsGnu . tgtCCompilerLink
     arPath  = prgPath . arMkArchive . tgtAr
-    arFlags = unwords . prgFlags . arMkArchive . tgtAr
+    arFlags = escapeArgs . prgFlags . arMkArchive . tgtAr
     arSupportsAtFile' = yesNo . arSupportsAtFile . tgtAr
     arSupportsDashL' = yesNo . arSupportsDashL . tgtAr
     ranlibPath  = maybe "" (prgPath . ranlibProgram) . tgtRanlib
@@ -571,3 +572,19 @@ generatePlatformHostHs = do
         , "hostPlatformArchOS :: ArchOS"
         , "hostPlatformArchOS = ArchOS hostPlatformArch hostPlatformOS"
         ]
+
+-- | Just like 'GHC.ResponseFile.escapeArgs', but use spaces instead of newlines
+-- for splitting elements.
+escapeArgs :: [String] -> String
+escapeArgs = unwords . map escapeArg
+
+escapeArg :: String -> String
+escapeArg = reverse . foldl' escape []
+
+escape :: String -> Char -> String
+escape cs c
+  |    isSpace c
+    || '\\' == c
+    || '\'' == c
+    || '"'  == c = c:'\\':cs -- n.b., our caller must reverse the result
+  | otherwise    = c:cs


=====================================
test.hs
=====================================
@@ -0,0 +1,14 @@
+import Data.Char
+import Data.Foldable
+-- | Just like 'GHC.ResponseFile.escapeArg', but it is not exposed from base.
+escapeArg :: String -> String
+escapeArg = reverse . foldl' escape []
+
+escape :: String -> Char -> String
+escape cs c
+  |    isSpace c
+    || '\\' == c
+    || '\'' == c
+    || '"'  == c = c:'\\':cs -- n.b., our caller must reverse the result
+  | otherwise    = c:cs
+


=====================================
testsuite/tests/ghc-api/settings-escape/T11938.hs
=====================================
@@ -0,0 +1,73 @@
+
+import GHC.Settings
+import GHC.Settings.IO
+import GHC.Utils.CliOption (Option, showOpt)
+
+import Control.Monad.Trans.Except (runExceptT)
+import Data.Maybe (fromJust)
+import System.Directory (makeAbsolute)
+import System.IO (hPutStrLn, stderr)
+import System.Exit (exitWith, ExitCode(ExitFailure))
+
+-- Precondition: this test case must be executed in a directory with a space.
+main :: IO ()
+main = do
+  topDir <- makeAbsolute "./ghc-install-folder/lib"
+  settingsm <- runExceptT $ initSettings topDir
+
+  case settingsm of
+    Left (SettingsError_MissingData msg) -> do
+      hPutStrLn stderr $ "WARNING: " ++ show msg
+      hPutStrLn stderr $ "dont know target platform"
+      exitWith $ ExitFailure 1
+    Left (SettingsError_BadData msg) -> do
+      hPutStrLn stderr msg
+      exitWith $ ExitFailure 1
+    Right settings -> do
+      let
+        recordSetting :: String -> (Settings -> [String]) -> IO ()
+        recordSetting label selector = do
+          let opts = selector settings
+              -- At least one of the options must contain a space
+              containsSpaces = any (' ' `elem`) opts
+          hPutStrLn stderr $ "=== Number of '" <> label <> "' options: " ++ show (length opts)
+          hPutStrLn stderr $ "    Contains spaces: " ++ show containsSpaces
+
+        recordFpSetting :: String -> (Settings -> String) -> IO ()
+        recordFpSetting label selector = do
+          let fp = selector settings
+              containsOnlyEscapedSpaces ('\\':' ':xs) = containsOnlyEscapedSpaces xs
+              containsOnlyEscapedSpaces (' ':_) = False
+              containsOnlyEscapedSpaces [] = True
+              containsOnlyEscapedSpaces (_:xs) = containsOnlyEscapedSpaces xs
+
+              -- Filepath may only contain escaped spaces
+              containsSpaces = containsOnlyEscapedSpaces fp
+          hPutStrLn stderr $ "=== FilePath '" <> label <> "' contains only escaped spaces: " ++ show containsSpaces
+
+      -- Assertions
+      -- Assumption: this test case is executed in a directory with a space.
+
+      -- Setting 'Haskell CPP flags' contains '$topdir' and '$tooldir' references.
+      -- Resolving those while containing spaces, should not introduce more options.
+      -- '$tooldir' will only be expanded in windows, while '$topdir' is always expanded.
+      recordSetting "Haskell CPP flags" (map showOpt . snd . toolSettings_pgm_P . sToolSettings)
+      -- Setting 'C compiler flags' contains strings with spaces.
+      -- GHC should not split these by word.
+      recordSetting "C compiler flags" (toolSettings_opt_c . sToolSettings)
+      -- Setting 'C compiler link flags' contains strings with spaces.
+      -- GHC should not split these by word.
+      recordSetting "C compiler link flags" (map showOpt . snd . toolSettings_pgm_l . sToolSettings)
+      -- Setting 'C++ compiler flags' contains strings with spaces.
+      -- GHC should not split these by word.
+      recordSetting "C++ compiler flags" (toolSettings_opt_cxx . sToolSettings)
+      -- Setting 'CPP Flags' contains strings with spaces.
+      -- GHC should not split these by word.
+      recordSetting "CPP Flags" (map showOpt . snd . toolSettings_pgm_cpp . sToolSettings)
+      -- Setting 'Merge objects flags' contains strings with spaces.
+      -- GHC should not split these by word.
+      -- We know in this case, that 'toolSettings_pgm_lm' is 'Just'
+      recordSetting "Merge objects flags" (map showOpt . snd . fromJust . toolSettings_pgm_lm . sToolSettings)
+      -- Setting 'unlit command' contains '$topdir' reference.
+      -- Resolving those while containing spaces, should be escaped correctly.
+      recordFpSetting "unlit command" (toolSettings_pgm_L . sToolSettings)


=====================================
testsuite/tests/ghc-api/settings-escape/T11938.stderr
=====================================
@@ -0,0 +1,13 @@
+=== Number of 'Haskell CPP flags' options: 5
+    Contains spaces: True
+=== Number of 'C compiler flags' options: 3
+    Contains spaces: True
+=== Number of 'C compiler link flags' options: 7
+    Contains spaces: True
+=== Number of 'C++ compiler flags' options: 2
+    Contains spaces: True
+=== Number of 'CPP Flags' options: 3
+    Contains spaces: True
+=== Number of 'Merge objects flags' options: 3
+    Contains spaces: True
+=== FilePath 'unlit command' contains only escaped spaces: True


=====================================
testsuite/tests/ghc-api/settings-escape/all.T
=====================================
@@ -0,0 +1 @@
+test('T11938', [normal, extra_files(['ghc-install-folder/'])], compile_and_run, ['-package ghc -package directory -package transformers'])


=====================================
testsuite/tests/ghc-api/settings-escape/ghc-install-folder/lib/settings
=====================================
@@ -0,0 +1,51 @@
+[("C compiler command", "/nix/store/lcf37pgp3rgww67v9x2990hbfwx96c1w-gcc-wrapper-12.2.0/bin/cc")
+,("C compiler flags", "-O2 \"-some option\" -some\\ other")
+,("C++ compiler command", "/nix/store/lcf37pgp3rgww67v9x2990hbfwx96c1w-gcc-wrapper-12.2.0/bin/g++")
+,("C++ compiler flags", "\"-some option\" -some\\ other")
+,("C compiler link flags", "-fuse-ld=gold -Wl,--no-as-needed \"-some option\" -some\\ other")
+,("C compiler supports -no-pie", "YES")
+,("CPP command", "/nix/store/lcf37pgp3rgww67v9x2990hbfwx96c1w-gcc-wrapper-12.2.0/bin/cc")
+,("CPP flags", "-E \"-some option\" -some\\ other")
+,("Haskell CPP command", "/nix/store/lcf37pgp3rgww67v9x2990hbfwx96c1w-gcc-wrapper-12.2.0/bin/cc")
+,("Haskell CPP flags", "-E -undef -traditional -I$topdir/ -I$tooldir/")
+,("ld supports compact unwind", "NO")
+,("ld supports filelist", "NO")
+,("ld supports single module", "NO")
+,("ld is GNU ld", "YES")
+,("Merge objects command", "/nix/store/lcf37pgp3rgww67v9x2990hbfwx96c1w-gcc-wrapper-12.2.0/bin/ld.gold")
+,("Merge objects flags", "-r \"-some option\" -some\\ other")
+,("Merge objects supports response files", "YES")
+,("ar command", "/nix/store/lcf37pgp3rgww67v9x2990hbfwx96c1w-gcc-wrapper-12.2.0/bin/ar")
+,("ar flags", "q")
+,("ar supports at file", "YES")
+,("ar supports -L", "NO")
+,("ranlib command", "/nix/store/lcf37pgp3rgww67v9x2990hbfwx96c1w-gcc-wrapper-12.2.0/bin/ranlib")
+,("otool command", "otool")
+,("install_name_tool command", "install_name_tool")
+,("touch command", "touch")
+,("windres command", "/bin/false")
+,("unlit command", "$topdir/../bin/unlit")
+,("cross compiling", "NO")
+,("target platform string", "x86_64-unknown-linux")
+,("target os", "OSLinux")
+,("target arch", "ArchX86_64")
+,("target word size", "8")
+,("target word big endian", "NO")
+,("target has GNU nonexec stack", "YES")
+,("target has .ident directive", "YES")
+,("target has subsections via symbols", "NO")
+,("target has libm", "YES")
+,("Unregisterised", "NO")
+,("LLVM target", "x86_64-unknown-linux")
+,("LLVM llc command", "llc")
+,("LLVM opt command", "opt")
+,("LLVM llvm-as command", "clang")
+,("Use inplace MinGW toolchain", "NO")
+,("Use interpreter", "YES")
+,("Support SMP", "YES")
+,("RTS ways", "v thr thr_debug thr_debug_dyn thr_dyn debug debug_dyn dyn")
+,("Tables next to code", "YES")
+,("Leading underscore", "NO")
+,("Use LibFFI", "NO")
+,("RTS expects libdw", "YES")
+]


=====================================
testsuite/tests/ghc-api/settings-escape/ghc-install-folder/mingw/.gitkeep
=====================================



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/e0a1d5202af25a49398f266cc011ac281ff8f914...9848d58006f7361b847e4991d899fdc5a06d75ca

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/e0a1d5202af25a49398f266cc011ac281ff8f914...9848d58006f7361b847e4991d899fdc5a06d75ca
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240314/9be5c5cb/attachment-0001.html>


More information about the ghc-commits mailing list