[Git][ghc/ghc][wip/marge_bot_batch_merge_job] 4 commits: Avoid serializing BCOs with the internal interpreter

Wed Sep 13 09:36:04 UTC 2023

Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC


Commits:
dfc4f426 by Krzysztof Gogolewski at 2023-09-12T20:31:35-04:00
Avoid serializing BCOs with the internal interpreter

Refs #23919

- - - - -
d12dd069 by Finley McIlwaine at 2023-09-13T05:35:58-04:00
Fix numa auto configure

- - - - -
4e966157 by Simon Peyton Jones at 2023-09-13T05:35:58-04:00
Add -fno-cse to T15426 and T18964

This -fno-cse change stops these performance tests from depending on
fluky CSE behaviour.  Each contains several independent tests, and we don't
want them to interact.

See #23925.

By disabling CSE we expect a 400% increase in T15426 and a 100% increase in T18964.

Metric Increase:
    T15426
    T18964

- - - - -
9ad5ead0 by Simon Peyton Jones at 2023-09-13T05:35:58-04:00
Tiny refactor

canEtaReduceToArity was only called internally, and always with both
arity arguments equal to zero.  This patch specialises the function to
that case, negates it, and renames it to cantEtaReduceFun.

No change in behaviour.

- - - - -
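The refactor in 9ad5ead0 can be checked in miniature. The sketch below models the old and new predicates over a hypothetical `FunInfo` record (the record and its fields are illustrative stand-ins, not GHC's `Id` API) and compares `canEtaReduceToArity f 0 0` with `not (cantEtaReduceFun f)`. In this toy model the two agree whenever any join arity present is positive and any CBV-mark list present is non-empty; the sketch says nothing about which Ids GHC actually passes in.

```haskell
import Data.Maybe (isJust)

-- Hypothetical summary of what the predicates ask about a function Id.
data FunInfo = FunInfo
  { noBinding :: Bool          -- stands in for hasNoBinding
  , joinArity :: Maybe Int     -- Just n: a join point with join arity n
  , cbvMarks  :: Maybe [Bool]  -- Just marks: a StrictWorkerId with CBV marks
  }

-- Old shape: general destination arities, but only ever called with 0 and 0.
canEtaReduceToArity :: FunInfo -> Int -> Int -> Bool
canEtaReduceToArity fun destJoinArity destArity =
  not $  noBinding fun                                   -- (B)
      || maybe False (destJoinArity <) (joinArity fun)   -- (J)
      || destArity < maybe 0 length (cbvMarks fun)       -- (W)

-- New shape: specialised to arity zero and negated.
cantEtaReduceFun :: FunInfo -> Bool
cantEtaReduceFun fun =
     noBinding fun              -- (B)
  || isJust (joinArity fun)     -- (J)
  || isJust (cbvMarks fun)      -- (W)
```

Loaded into GHCi, `canEtaReduceToArity (FunInfo False (Just 2) Nothing) 0 0` is `False`, matching `cantEtaReduceFun` returning `True` for the same record.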


9 changed files:

- compiler/GHC/Core/Opt/Arity.hs
- compiler/GHC/Runtime/Interpreter.hs
- compiler/GHC/Utils/Misc.hs
- libraries/ghci/GHCi/Message.hs
- libraries/ghci/GHCi/Run.hs
- libraries/ghci/GHCi/TH.hs
- m4/fp_find_libnuma.m4
- testsuite/tests/perf/should_run/T15426.hs
- testsuite/tests/perf/should_run/T18964.hs


Changes:

=====================================
compiler/GHC/Core/Opt/Arity.hs
=====================================
@@ -87,6 +87,8 @@ import GHC.Utils.Outputable
 import GHC.Utils.Panic
 import GHC.Utils.Misc
 
+import Data.Maybe( isJust )
+
 {-
 ************************************************************************
 *                                                                      *
@@ -2376,7 +2378,7 @@ perform eta reduction on an expression with n leading lambdas `\xs. e xs`
 (checked in 'is_eta_reduction_sound' in 'tryEtaReduce', which focuses on the
 case where `e` is trivial):
 
- A. It is sound to eta-reduce n arguments as long as n does not exceed the
+(A) It is sound to eta-reduce n arguments as long as n does not exceed the
     `exprArity` of `e`. (Needs Arity analysis.)
     This criterion exploits information about how `e` is *defined*.
 
@@ -2385,7 +2387,7 @@ case where `e` is trivial):
     By contrast, it would be *unsound* to eta-reduce 2 args, `\x y. e x y` to `e`:
     `e 42` diverges when `(\x y. e x y) 42` does not.
 
- S. It is sound to eta-reduce n arguments in an evaluation context in which all
+(S) It is sound to eta-reduce n arguments in an evaluation context in which all
     calls happen with at least n arguments. (Needs Strictness analysis.)
     NB: This treats evaluations like a call with 0 args.
     NB: This criterion exploits information about how `e` is *used*.
@@ -2412,13 +2414,13 @@ case where `e` is trivial):
     See Note [Eta reduction based on evaluation context] for the implementation
     details. This criterion is tested extensively in T21261.
 
- R. Note [Eta reduction in recursive RHSs] tells us that we should not
+(R) Note [Eta reduction in recursive RHSs] tells us that we should not
     eta-reduce `f` in its own RHS and describes our fix.
     There we have `f = \x. f x` and we should not eta-reduce to `f=f`. Which
     might change a terminating program (think @f `seq` e@) to a non-terminating
     one.
 
- E. (See fun_arity in tryEtaReduce.) As a perhaps special case on the
+(E) (See fun_arity in tryEtaReduce.) As a perhaps special case on the
     boundary of (A) and (S), when we know that a fun binder `f` is in
     WHNF, we simply assume it has arity 1 and apply (A).  Example:
        g f = f `seq` \x. f x
@@ -2428,7 +2430,7 @@ case where `e` is trivial):
 And here are a few more technical criteria for when it is *not* sound to
 eta-reduce that are specific to Core and GHC:
 
- L. With linear types, eta-reduction can break type-checking:
+(L) With linear types, eta-reduction can break type-checking:
       f :: A ⊸ B
       g :: A -> B
       g = \x. f x
@@ -2436,13 +2438,13 @@ eta-reduce that are specific to Core and GHC:
     complain that g and f don't have the same type. NB: Not unsound in the
     dynamic semantics, but unsound according to the static semantics of Core.
 
- J. We may not undersaturate join points.
+(J) We may not undersaturate join points.
     See Note [Invariants on join points] in GHC.Core, and #20599.
 
- B. We may not undersaturate functions with no binding.
+(B) We may not undersaturate functions with no binding.
     See Note [Eta expanding primops].
 
- W. We may not undersaturate StrictWorkerIds.
+(W) We may not undersaturate StrictWorkerIds.
     See Note [CBV Function Ids] in GHC.Types.Id.Info.
 
 Here is a list of historic accidents surrounding unsound eta-reduction:
@@ -2699,7 +2701,7 @@ tryEtaReduce rec_ids bndrs body eval_sd
            || all_calls_with_arity incoming_arity)   -- criterion (S)
       -- ... and that the function can be eta reduced to arity 0
       -- without violating invariants of Core and GHC
-      && canEtaReduceToArity fun 0 0              -- criteria (L), (J), (W), (B)
+      && not (cantEtaReduceFun fun)                  -- criteria (L), (J), (W), (B)
     all_calls_with_arity n = isStrict (fst $ peelManyCalls n eval_sd)
        -- See Note [Eta reduction based on evaluation context]
 
@@ -2754,19 +2756,18 @@ tryEtaReduce rec_ids bndrs body eval_sd
 
     ok_arg _ _ _ _ = Nothing

--- | Can we eta-reduce the given function to the specified arity?
+-- | Can we eta-reduce the given function
 -- See Note [Eta reduction soundness], criteria (B), (J), (W) and (L).
-canEtaReduceToArity :: Id -> JoinArity -> Arity -> Bool
-canEtaReduceToArity fun dest_join_arity dest_arity =
-  not $
-        hasNoBinding fun -- (B)
+cantEtaReduceFun :: Id -> Bool
+cantEtaReduceFun fun
+  =    hasNoBinding fun -- (B)
        -- Don't undersaturate functions with no binding.
 
-    ||  ( isJoinId fun && dest_join_arity < idJoinArity fun ) -- (J)
+    ||  isJoinId fun    -- (J)
        -- Don't undersaturate join points.
        -- See Note [Invariants on join points] in GHC.Core, and #20599
 
-    || ( dest_arity < idCbvMarkArity fun ) -- (W)
+    || (isJust (idCbvMarks_maybe fun)) -- (W)
        -- Don't undersaturate StrictWorkerIds.
        -- See Note [CBV Function Ids] in GHC.Types.Id.Info.
 

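Criterion (A) in Note [Eta reduction soundness] can be made concrete in ordinary Haskell. In this minimal sketch, `e` and `etaExpanded` are hypothetical names: `e` has arity 1, because it inspects its first argument before returning a function, so `\x y -> e x y` must not be reduced to `e`. Forcing `e 42` to weak head normal form throws, while the two-lambda form applied to 42 is already a value. The `-fpedantic-bottoms` pragma is an assumption added so that GHC's own arity analysis does not eta-expand `e` through its bottoming branch and blur the distinction.

```haskell
{-# OPTIONS_GHC -fpedantic-bottoms #-}
-- ^ Keep GHC from eta-expanding 'e' through the bottoming branch below.
import Control.Exception (SomeException, evaluate, try)

-- Arity 1: the comparison on x happens before a function is returned.
e :: Int -> Int -> Int
e x = if x == 42 then error "e diverges at 42" else (+ x)
{-# NOINLINE e #-}

-- The form with two leading lambdas, as in `\x y. e x y`.
etaExpanded :: Int -> Int -> Int
etaExpanded = \x y -> e x y
{-# NOINLINE etaExpanded #-}

-- True if forcing the value to weak head normal form throws.
divergesToWhnf :: a -> IO Bool
divergesToWhnf v = fmap threw (try (evaluate v))
  where
    threw :: Either SomeException b -> Bool
    threw = either (const True) (const False)
```

In GHCi, `divergesToWhnf (e 42)` gives `True` and `divergesToWhnf (etaExpanded 42)` gives `False`, which is exactly the difference a context such as ``f `seq` ...`` can observe; full applications like `etaExpanded 1 2` still agree with `e 1 2`.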

=====================================
compiler/GHC/Runtime/Interpreter.hs
=====================================
@@ -93,7 +93,6 @@ import GHC.Utils.Panic
 import GHC.Utils.Exception as Ex
 import GHC.Utils.Outputable(brackets, ppr, showSDocUnsafe)
 import GHC.Utils.Fingerprint
-import GHC.Utils.Misc
 
 import GHC.Unit.Module
 import GHC.Unit.Module.ModIface
@@ -110,9 +109,7 @@ import Control.Monad
 import Control.Monad.IO.Class
 import Control.Monad.Catch as MC (mask)
 import Data.Binary
-import Data.Binary.Put
 import Data.ByteString (ByteString)
-import qualified Data.ByteString.Lazy as LB
 import Data.Array ((!))
 import Data.IORef
 import Foreign hiding (void)
@@ -120,7 +117,6 @@ import qualified GHC.Exts.Heap as Heap
 import GHC.Stack.CCS (CostCentre,CostCentreStack)
 import System.Directory
 import System.Process
-import GHC.Conc (pseq, par)
 
 {- Note [Remote GHCi]
    ~~~~~~~~~~~~~~~~~~
@@ -353,19 +349,7 @@ mkCostCentres interp mod ccs =
 -- | Create a set of BCOs that may be mutually recursive.
 createBCOs :: Interp -> [ResolvedBCO] -> IO [HValueRef]
 createBCOs interp rbcos = do
-  -- Serializing ResolvedBCO is expensive, so we do it in parallel
-  interpCmd interp (CreateBCOs puts)
- where
-  puts = parMap doChunk (chunkList 100 rbcos)
-
-  -- make sure we force the whole lazy ByteString
-  doChunk c = pseq (LB.length bs) bs
-    where bs = runPut (put c)
-
-  -- We don't have the parallel package, so roll our own simple parMap
-  parMap _ [] = []
-  parMap f (x:xs) = fx `par` (fxs `pseq` (fx : fxs))
-    where fx = f x; fxs = parMap f xs
+  interpCmd interp (CreateBCOs rbcos)
 
 addSptEntry :: Interp -> Fingerprint -> ForeignHValue -> IO ()
 addSptEntry interp fpr ref =

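The point of dfc4f426 (#23919) is that the `Message` value now carries the `ResolvedBCO`s themselves, so serialization happens only when a message is actually written out for an external interpreter. A toy model of that split follows; `Interp`, `FakeBCO`, `runCreateBCOs` and `sendCreateBCOs` are hypothetical and not GHC's API, and a BCO is reduced to a list of `Int`s.

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Lazy as LB

-- Hypothetical stand-ins for ResolvedBCO and the interpreter handle.
type FakeBCO = [Int]

data Interp = InternalInterp | ExternalInterp

-- What the "server" does with the BCOs; here it just sums each one.
runCreateBCOs :: [FakeBCO] -> [Int]
runCreateBCOs = map sum

-- Send a CreateBCOs-like command. Returns the result together with the
-- number of bytes that had to be serialized on the way.
sendCreateBCOs :: Interp -> [FakeBCO] -> ([Int], Int)
sendCreateBCOs InternalInterp bcos =
  (runCreateBCOs bcos, 0)                 -- same process: no encoding needed
sendCreateBCOs ExternalInterp bcos =
  let wire     = encode bcos              -- bytes sent over the pipe
      received = decode wire :: [FakeBCO] -- the other side decodes them
  in (runCreateBCOs received, fromIntegral (LB.length wire))
```

Both paths compute the same result; only the external one pays for `encode`/`decode`, which mirrors the motivation in the Note.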

=====================================
compiler/GHC/Utils/Misc.hs
=====================================
@@ -37,8 +37,6 @@ module GHC.Utils.Misc (
         isSingleton, only, expectOnly, GHC.Utils.Misc.singleton,
         notNull, expectNonEmpty, snocView,
 
-        chunkList,
-
         holes,
 
         changeLast,
@@ -494,11 +492,6 @@ expectOnly _   (a:_) = a
 #endif
 expectOnly msg _     = panic ("expectOnly: " ++ msg)
 
--- | Split a list into chunks of /n/ elements
-chunkList :: Int -> [a] -> [[a]]
-chunkList _ [] = []
-chunkList n xs = as : chunkList n bs where (as,bs) = splitAt n xs
-
 -- | Compute all the ways of removing a single element from a list.
 --
 --  > holes [1,2,3] = [(1, [2,3]), (2, [1,3]), (3, [1,2])]


=====================================
libraries/ghci/GHCi/Message.hs
=====================================
@@ -30,11 +30,13 @@ import GHCi.RemoteTypes
 import GHCi.FFI
 import GHCi.TH.Binary () -- For Binary instances
 import GHCi.BreakArray
+import GHCi.ResolvedBCO
 
 import GHC.LanguageExtensions
 import qualified GHC.Exts.Heap as Heap
 import GHC.ForeignSrcLang
 import GHC.Fingerprint
+import GHC.Conc (pseq, par)
 import Control.Concurrent
 import Control.Exception
 import Data.Binary
@@ -84,10 +86,10 @@ data Message a where
   -- Interpreter -------------------------------------------
 
   -- | Create a set of BCO objects, and return HValueRefs to them
-  -- Note: Each ByteString contains a Binary-encoded [ResolvedBCO], not
-  -- a ResolvedBCO. The list is to allow us to serialise the ResolvedBCOs
-  -- in parallel. See @createBCOs@ in compiler/GHC/Runtime/Interpreter.hs.
-  CreateBCOs :: [LB.ByteString] -> Message [HValueRef]
+  -- See @createBCOs@ in compiler/GHC/Runtime/Interpreter.hs.
+  -- NB: this has a custom Binary behavior,
+  -- see Note [Parallelize CreateBCOs serialization]
+  CreateBCOs :: [ResolvedBCO] -> Message [HValueRef]
 
   -- | Release 'HValueRef's
   FreeHValueRefs :: [HValueRef] -> Message ()
@@ -513,7 +515,8 @@ getMessage = do
       9  -> Msg <$> RemoveLibrarySearchPath <$> get
       10 -> Msg <$> return ResolveObjs
       11 -> Msg <$> FindSystemLibrary <$> get
-      12 -> Msg <$> CreateBCOs <$> get
+      12 -> Msg <$> (CreateBCOs . concatMap (runGet get)) <$> (get :: Get [LB.ByteString])
+                    -- See Note [Parallelize CreateBCOs serialization]
       13 -> Msg <$> FreeHValueRefs <$> get
       14 -> Msg <$> MallocData <$> get
       15 -> Msg <$> MallocStrings <$> get
@@ -557,7 +560,8 @@ putMessage m = case m of
   RemoveLibrarySearchPath ptr -> putWord8 9  >> put ptr
   ResolveObjs                 -> putWord8 10
   FindSystemLibrary str       -> putWord8 11 >> put str
-  CreateBCOs bco              -> putWord8 12 >> put bco
+  CreateBCOs bco              -> putWord8 12 >> put (serializeBCOs bco)
+                              -- See Note [Parallelize CreateBCOs serialization]
   FreeHValueRefs val          -> putWord8 13 >> put val
   MallocData bs               -> putWord8 14 >> put bs
   MallocStrings bss           -> putWord8 15 >> put bss
@@ -586,6 +590,34 @@ putMessage m = case m of
   ResumeSeq a                 -> putWord8 38 >> put a
   NewBreakModule name          -> putWord8 39 >> put name
 
+{-
+Note [Parallelize CreateBCOs serialization]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Serializing ResolvedBCO is expensive, so we do it in parallel.
+We split the list [ResolvedBCO] into chunks of length <= 100,
+and serialize every chunk in parallel, getting a [LB.ByteString]
+where every bytestring corresponds to a single chunk (multiple ResolvedBCOs).
+
+Previously, we stored [LB.ByteString] in the Message object, but that
+incurs unnecessary serialization with the internal interpreter (#23919).
+-}
+
+serializeBCOs :: [ResolvedBCO] -> [LB.ByteString]
+serializeBCOs rbcos = parMap doChunk (chunkList 100 rbcos)
+ where
+  -- make sure we force the whole lazy ByteString
+  doChunk c = pseq (LB.length bs) bs
+    where bs = runPut (put c)
+
+  -- We don't have the parallel package, so roll our own simple parMap
+  parMap _ [] = []
+  parMap f (x:xs) = fx `par` (fxs `pseq` (fx : fxs))
+    where fx = f x; fxs = parMap f xs
+
+  chunkList :: Int -> [a] -> [[a]]
+  chunkList _ [] = []
+  chunkList n xs = as : chunkList n bs where (as,bs) = splitAt n xs
+
 -- -----------------------------------------------------------------------------
 -- Reading/writing messages
 

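Note [Parallelize CreateBCOs serialization] describes splitting the list into chunks of at most 100 and encoding each chunk in parallel. Below is a standalone sketch of the same pattern, with an ordinary `Binary` payload standing in for `ResolvedBCO`. `serializeChunks` and `deserializeChunks` are hypothetical names; `chunkList` and `parMap` mirror the definitions in the diff, and `encode c` is equivalent to `runPut (put c)`.

```haskell
import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as LB
import GHC.Conc (par, pseq)

-- Split a list into chunks of at most n elements (n > 0).
chunkList :: Int -> [a] -> [[a]]
chunkList _ [] = []
chunkList n xs = as : chunkList n bs
  where (as, bs) = splitAt n xs

-- Hand-rolled parMap, as in the diff: spark each result, force the rest.
parMap :: (a -> b) -> [a] -> [b]
parMap _ [] = []
parMap f (x:xs) = fx `par` (fxs `pseq` (fx : fxs))
  where fx  = f x
        fxs = parMap f xs

-- Encode each chunk, forcing the whole lazy ByteString so that the
-- encoding work happens inside the spark rather than later.
serializeChunks :: Binary a => Int -> [a] -> [LB.ByteString]
serializeChunks n = parMap doChunk . chunkList n
  where
    doChunk c = let bs = encode c in LB.length bs `pseq` bs

-- Receiving side: decode every chunk and concatenate the results.
deserializeChunks :: Binary a => [LB.ByteString] -> [a]
deserializeChunks = concatMap decode
```

Sparks only run in parallel on the threaded RTS (for example, built with `-threaded` and run with `+RTS -N`); without it the output is identical but computed sequentially. For 250 elements and a chunk size of 100, `chunkList` yields chunks of 100, 100 and 50.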

=====================================
libraries/ghci/GHCi/Run.hs
=====================================
@@ -17,8 +17,6 @@ import Prelude -- See note [Why do we import Prelude here?]
 #if !defined(javascript_HOST_ARCH)
 import GHCi.CreateBCO
 import GHCi.InfoTable
-import Data.Binary
-import Data.Binary.Get
 #endif
 
 import GHCi.FFI
@@ -78,7 +76,7 @@ run m = case m of
     toRemotePtr <$> mkConInfoTable tc ptrs nptrs tag ptrtag desc
   ResolveObjs -> resolveObjs
   FindSystemLibrary str -> findSystemLibrary str
-  CreateBCOs bcos -> createBCOs (concatMap (runGet get) bcos)
+  CreateBCOs bcos -> createBCOs bcos
   LookupClosure str -> lookupClosure str
 #endif
   RtsRevertCAFs -> rts_revertCAFs

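With this change the decoding that `GHCi.Run` used to do (`concatMap (runGet get)`) moves into `getMessage`, and `putMessage` does the chunked encoding, so the in-memory message only ever holds decoded values. A sketch of that custom `Binary` behaviour with a hypothetical message type follows (`Msg`, `CreateItems`, `chunksOf`, `putMsg` and `getMsg` are illustrative; tag 12 mirrors the diff and tag 1 for `Ping` is arbitrary). It leaves out the parallel evaluation described in Note [Parallelize CreateBCOs serialization].

```haskell
import Data.Binary (encode, get, put)
import Data.Binary.Get (Get, getWord8, runGet)
import Data.Binary.Put (Put, putWord8, runPut)
import qualified Data.ByteString.Lazy as LB

-- Hypothetical message type; the payload is held already decoded,
-- like CreateBCOs [ResolvedBCO] after this commit.
data Msg
  = CreateItems [Int]
  | Ping
  deriving (Eq, Show)

-- Split into chunks of at most n elements (n > 0).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (a, b) = splitAt n xs in a : chunksOf n b

-- Wire format for CreateItems: a list of per-chunk ByteStrings,
-- not a flat encoding of [Int].
putMsg :: Msg -> Put
putMsg (CreateItems xs) = putWord8 12 >> put (map encode (chunksOf 100 xs))
putMsg Ping             = putWord8 1

-- Decode each chunk and concatenate, as getMessage now does.
getMsg :: Get Msg
getMsg = do
  tag <- getWord8
  case tag of
    12 -> CreateItems . concatMap (runGet get) <$> (get :: Get [LB.ByteString])
    1  -> pure Ping
    _  -> fail ("unknown tag " ++ show tag)
```

A round trip such as `runGet getMsg (runPut (putMsg (CreateItems [1 .. 230])))` returns the original message, so code that handles the message never sees the chunked representation.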

=====================================
libraries/ghci/GHCi/TH.hs
=====================================
@@ -38,7 +38,7 @@ For each splice
 1. GHC compiles a splice to byte code, and sends it to the server: in
    a CreateBCOs message:
 
-   CreateBCOs :: [LB.ByteString] -> Message [HValueRef]
+   CreateBCOs :: [ResolvedBCO] -> Message [HValueRef]
 
 2. The server creates the real byte-code objects in its heap, and
    returns HValueRefs to GHC.  HValueRef is the same as RemoteRef


=====================================
m4/fp_find_libnuma.m4
=====================================
@@ -30,7 +30,7 @@ AC_DEFUN([FP_FIND_LIBNUMA],
           [Enable NUMA memory policy and thread affinity support in the
            runtime system via numactl's libnuma [default=auto]])])
 
-  if test "$enable_numa" = "yes" ; then
+  if test "$enable_numa" != "no" ; then
     CFLAGS2="$CFLAGS"
     CFLAGS="$LIBNUMA_CFLAGS $CFLAGS"
     LDFLAGS2="$LDFLAGS"
@@ -41,7 +41,7 @@ AC_DEFUN([FP_FIND_LIBNUMA],
     if test "$ac_cv_header_numa_h$ac_cv_header_numaif_h" = "yesyes" ; then
       AC_CHECK_LIB(numa, numa_available,HaveLibNuma=1)
     fi
-    if test "$HaveLibNuma" = "0" ; then
+    if test "$enable_numa:$HaveLibNuma" = "yes:0" ; then
         AC_MSG_ERROR([Cannot find system libnuma (required by --enable-numa)])
     fi
 

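The numa fix in d12dd069 changes which `--enable-numa` values run the libnuma probe and which ones turn a failed probe into an error. The decision table is easier to see as a small pure function. This is a hypothetical model, not generated from the m4. It assumes `HaveLibNuma` is 0 when the probe is skipped (its initialisation lies outside the hunk shown), and it treats "probe succeeded" as a single `Bool`.

```haskell
-- Hypothetical model of the FP_FIND_LIBNUMA decision.
data EnableNuma = NumaYes | NumaNo | NumaAuto
  deriving (Eq, Show)

-- Left = configure error; Right True = libnuma detected.
type Outcome = Either String Bool

missing :: String
missing = "Cannot find system libnuma (required by --enable-numa)"

-- Before: only "yes" ran the probe, so "auto" never detected libnuma.
oldDecision :: EnableNuma -> Bool -> Outcome
oldDecision NumaYes found
  | found     = Right True
  | otherwise = Left missing
oldDecision _ _ = Right False

-- After: probe unless "no"; only an explicit "yes" makes a miss fatal.
newDecision :: EnableNuma -> Bool -> Outcome
newDecision NumaNo _      = Right False
newDecision NumaYes False = Left missing
newDecision _ found       = Right found
```

The only row that changes is `auto` with libnuma present: `oldDecision NumaAuto True` is `Right False`, while `newDecision NumaAuto True` is `Right True`.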

=====================================
testsuite/tests/perf/should_run/T15426.hs
=====================================
@@ -1,3 +1,8 @@
+{-# OPTIONS_GHC -fno-cse #-}
+    -- Avoid depending on flukey CSE; there are really 5 independent
+    -- tests in this module, and we don't want them to interact.
+    -- See #23925
+
 import Control.Exception (evaluate)
 import qualified Data.List as L
 
@@ -28,4 +33,4 @@ As a result these lists are now floated out and shared.
 
 Just leaving breadcrumbs, in case we later see big perf changes on
 this (slightly fragile) benchmark.
--}
\ No newline at end of file
+-}


=====================================
testsuite/tests/perf/should_run/T18964.hs
=====================================
@@ -1,3 +1,8 @@
+{-# OPTIONS_GHC -fno-cse #-}
+    -- Avoid depending on flukey CSE; there are really 4 independent
+    -- tests in this module, and we don't want them to interact.
+    -- See #23925
+
 import GHC.Exts
 import Data.Int
 



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/6f969e06823befd50e7cb7c06123a180dc0e4a73...9ad5ead064fbe99e60e65e07170785e1e4ee5e14
