[Git][ghc/ghc][wip/andreask/opt_dumps] Optimize dumping of consecutive whitespace.

Andreas Klebinger gitlab at gitlab.haskell.org
Tue Nov 24 19:58:51 UTC 2020



Andreas Klebinger pushed to branch wip/andreask/opt_dumps at Glasgow Haskell Compiler / GHC


Commits:
327c725a by Andreas Klebinger at 2020-11-24T20:57:43+01:00
Optimize dumping of consecutive whitespace.

The naive way of putting out n characters of indent would be something
like `hPutStr hdl (replicate n ' ')`. However this is quite inefficient
as we allocate an absurd number of strings consisting of simply spaces
as we don't cache them.

To improve on this we now track if we can simply write ascii spaces via
hPutBuf instead. This is the case when running with -ddump-to-file where
we force the encoding to be UTF8.

This avoids both the cost of going through encoding as well as avoiding
allocation churn from all the white space. Instead we simply use hPutBuf
on a preallocated unlifted string.

When dumping stg like this:

> nofib/spectral/simple/Main.hs -fforce-recomp -ddump-stg-final -ddump-to-file -c +RTS -s

Allocations went from 1,778 MB to 1,702MB. About a 4% reduction of
allocation! I did not measure the difference in runtime but expect it
to be similar.

Bumps the haddock submodule since the interface of GHC's Pretty
slightly changed.

- - - - -


6 changed files:

- compiler/GHC/Driver/Ppr.hs
- compiler/GHC/Driver/Session.hs
- compiler/GHC/Utils/Error.hs
- compiler/GHC/Utils/Outputable.hs
- compiler/GHC/Utils/Ppr.hs
- utils/haddock


Changes:

=====================================
compiler/GHC/Driver/Ppr.hs
=====================================
@@ -66,7 +66,7 @@ showSDocDebug dflags d = renderWithContext ctx d
 
 printForUser :: DynFlags -> Handle -> PrintUnqualified -> Depth -> SDoc -> IO ()
 printForUser dflags handle unqual depth doc
-  = printSDocLn ctx PageMode handle doc
+  = printSDocLn ctx (PageMode False) handle doc
     where ctx = initSDocContext dflags (mkUserStyle unqual depth)
 
 -- | Like 'printSDocLn' but specialized with 'LeftMode' and


=====================================
compiler/GHC/Driver/Session.hs
=====================================
@@ -1377,7 +1377,7 @@ defaultFatalMessager = hPutStrLn stderr
 jsonLogAction :: LogAction
 jsonLogAction dflags reason severity srcSpan msg
   =
-    defaultLogActionHPutStrDoc dflags stdout
+    defaultLogActionHPutStrDoc dflags True stdout
       (withPprStyle (PprCode CStyle) (doc $$ text ""))
     where
       str = renderWithContext (initSDocContext dflags defaultUserStyle) msg
@@ -1400,9 +1400,9 @@ defaultLogAction dflags reason severity srcSpan msg
       SevWarning     -> printWarns
       SevError       -> printWarns
     where
-      printOut   = defaultLogActionHPrintDoc  dflags stdout
-      printErrs  = defaultLogActionHPrintDoc  dflags stderr
-      putStrSDoc = defaultLogActionHPutStrDoc dflags stdout
+      printOut   = defaultLogActionHPrintDoc  dflags False stdout
+      printErrs  = defaultLogActionHPrintDoc  dflags False stderr
+      putStrSDoc = defaultLogActionHPutStrDoc dflags False stdout
       -- Pretty print the warning flag, if any (#10752)
       message = mkLocMessageAnn flagMsg severity srcSpan msg
 
@@ -1442,16 +1442,19 @@ defaultLogAction dflags reason severity srcSpan msg
           | otherwise = ""
 
 -- | Like 'defaultLogActionHPutStrDoc' but appends an extra newline.
-defaultLogActionHPrintDoc :: DynFlags -> Handle -> SDoc -> IO ()
-defaultLogActionHPrintDoc dflags h d
- = defaultLogActionHPutStrDoc dflags h (d $$ text "")
-
-defaultLogActionHPutStrDoc :: DynFlags -> Handle -> SDoc -> IO ()
-defaultLogActionHPutStrDoc dflags h d
+defaultLogActionHPrintDoc :: DynFlags -> Bool -> Handle -> SDoc -> IO ()
+defaultLogActionHPrintDoc dflags asciiSpace h d
+ = defaultLogActionHPutStrDoc dflags asciiSpace h (d $$ text "")
+
+-- | The boolean arguments let's the pretty printer know if it can optimize indent
+-- by writing ascii ' ' characters without going through decoding.
+defaultLogActionHPutStrDoc :: DynFlags -> Bool -> Handle -> SDoc -> IO ()
+defaultLogActionHPutStrDoc dflags asciiSpace h d
   -- Don't add a newline at the end, so that successive
   -- calls to this log-action can output all on the same line
-  = printSDoc ctx Pretty.PageMode h d
-    where ctx = initSDocContext dflags defaultUserStyle
+  = printSDoc ctx (Pretty.PageMode asciiSpace) h d
+    where
+      ctx = initSDocContext dflags defaultUserStyle
 
 newtype FlushOut = FlushOut (IO ())
 


=====================================
compiler/GHC/Utils/Error.hs
=====================================
@@ -327,7 +327,8 @@ dumpSDocWithStyle sty dflags dumpOpt hdr doc =
                                 $$ blankLine
                                 $$ doc
                         return $ mkDumpDoc hdr d
-        defaultLogActionHPrintDoc dflags handle (withPprStyle sty doc')
+        -- When we dump to files we use UTF8. Which allows ascii spaces.
+        defaultLogActionHPrintDoc dflags True handle (withPprStyle sty doc')
 
     -- write the dump to stdout
     writeDump Nothing = do


=====================================
compiler/GHC/Utils/Outputable.hs
=====================================
@@ -563,7 +563,7 @@ pprCode cs d = withPprStyle (PprCode cs) d
 
 renderWithContext :: SDocContext -> SDoc -> String
 renderWithContext ctx sdoc
-  = let s = Pretty.style{ Pretty.mode       = PageMode,
+  = let s = Pretty.style{ Pretty.mode       = PageMode False,
                           Pretty.lineLength = sdocLineLength ctx }
     in Pretty.renderStyle s $ runSDoc sdoc ctx
 


=====================================
compiler/GHC/Utils/Ppr.hs
=====================================
@@ -917,16 +917,26 @@ data Style
           , ribbonsPerLine :: Float -- ^ Ratio of line length to ribbon length
           }
 
--- | The default style (@mode=PageMode, lineLength=100, ribbonsPerLine=1.5@).
+-- | The default style (@mode=PageMode False, lineLength=100, ribbonsPerLine=1.5@).
 style :: Style
-style = Style { lineLength = 100, ribbonsPerLine = 1.5, mode = PageMode }
+style = Style { lineLength = 100, ribbonsPerLine = 1.5, mode = PageMode False }
 
 -- | Rendering mode.
-data Mode = PageMode     -- ^ Normal
+data Mode = PageMode { asciiSpace :: Bool }    -- ^ Normal
           | ZigZagMode   -- ^ With zig-zag cuts
           | LeftMode     -- ^ No indentation, infinitely long lines
           | OneLineMode  -- ^ All on one line
 
+-- | Can we output an ascii space character for spaces?
+--   Mostly true, but not for e.g. UTF16
+--   See Note [putSpaces optimizations] for why we bother
+--   to track this.
+hasAsciiSpace :: Mode -> Bool
+hasAsciiSpace mode =
+  case mode of
+    PageMode asciiSpace -> asciiSpace
+    _ -> False
+
 -- | Render the @Doc@ to a String using the given @Style at .
 renderStyle :: Style -> Doc -> String
 renderStyle s = fullRender (mode s) (lineLength s) (ribbonsPerLine s)
@@ -1034,6 +1044,20 @@ printDoc :: Mode -> Int -> Handle -> Doc -> IO ()
 -- printDoc adds a newline to the end
 printDoc mode cols hdl doc = printDoc_ mode cols hdl (doc $$ text "")
 
+{- Note [putSpaces optimizations]
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using dump flags a lot of what we are dumping ends up being whitespace.
+This is especially true for Core/Stg dumps. Enough so that it's worth optimizing.
+
+Especially in the common case of writing to an UTF8 or similarly encoded file
+where space is equal to ascii space we use hPutBuf to write a preallocated
+buffer to the file. This avoids a fair bit of allocation.
+
+For other cases we fall back to the old and slow path for simplicity.
+
+-}
+
 printDoc_ :: Mode -> Int -> Handle -> Doc -> IO ()
 -- printDoc_ does not add a newline at the end, so that
 -- successive calls can output stuff on the same line
@@ -1051,9 +1075,27 @@ printDoc_ mode pprCols hdl doc
                           -- the I/O library's encoding layer. (#3398)
     put (ZStr s)   next = hPutFZS  hdl s >> next
     put (LStr s)   next = hPutPtrString hdl s >> next
-    put (RStr n c) next = hPutStr hdl (replicate n c) >> next
+    put (RStr n c) next
+      | c == ' '
+      = putSpaces n >> next
+      | otherwise
+      = hPutStr hdl (replicate n c) >> next
+    putSpaces n
+      -- If we use ascii spaces we are allowed to use hPutBuf
+      -- See Note [putSpaces optimizations]
+      | hasAsciiSpace mode
+      , n <= 100
+      = hPutBuf hdl (Ptr spaces') n
+      | hasAsciiSpace mode
+      , n > 100
+      = hPutBuf hdl (Ptr spaces') 100 >> putSpaces (n-100)
+
+      | otherwise = hPutStr hdl (replicate n ' ')
 
     done = return () -- hPutChar hdl '\n'
+    -- 100 spaces, so we avoid the allocation of replicate n ' '
+    spaces' = "                                                                                                    "#
+
 
   -- some versions of hPutBuf will barf if the length is zero
 hPutPtrString :: Handle -> PtrString -> IO ()


=====================================
utils/haddock
=====================================
@@ -1 +1 @@
-Subproject commit 4d0498d503bd51b7d7626497580232685a2691a1
+Subproject commit b2f1aef40cf50cf08eb28f34a9af2b1f9155c2df



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/327c725acb2530e75f9e86d725f7ac3b130a001a

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/327c725acb2530e75f9e86d725f7ac3b130a001a
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20201124/b34005e6/attachment-0001.html>


More information about the ghc-commits mailing list