[Git][ghc/ghc][wip/marge_bot_batch_merge_job] 4 commits: Avoid serializing BCOs with the internal interpreter
Marge Bot (@marge-bot)
gitlab at gitlab.haskell.org
Wed Sep 13 09:36:04 UTC 2023
Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC
Commits:
dfc4f426 by Krzysztof Gogolewski at 2023-09-12T20:31:35-04:00
Avoid serializing BCOs with the internal interpreter
Refs #23919
- - - - -
d12dd069 by Finley McIlwaine at 2023-09-13T05:35:58-04:00
Fix numa auto configure
- - - - -
4e966157 by Simon Peyton Jones at 2023-09-13T05:35:58-04:00
Add -fno-cse to T15426 and T18964
This -fno-cse change is to avoid these performance tests depending on
fluky CSE stuff. Each contains several independent tests, and we don't
want them to interact.
See #23925.
By killing CSE we expect a 400% increase in T15426, and 100% in T18964.
Metric Increase:
T15426
T18964
- - - - -
9ad5ead0 by Simon Peyton Jones at 2023-09-13T05:35:58-04:00
Tiny refactor
canEtaReduceToArity was only called internally, and always with
two arguments equal to zero. This patch just specialises the
function, and renames it to cantEtaReduceFun.
No change in behaviour.
- - - - -
9 changed files:
- compiler/GHC/Core/Opt/Arity.hs
- compiler/GHC/Runtime/Interpreter.hs
- compiler/GHC/Utils/Misc.hs
- libraries/ghci/GHCi/Message.hs
- libraries/ghci/GHCi/Run.hs
- libraries/ghci/GHCi/TH.hs
- m4/fp_find_libnuma.m4
- testsuite/tests/perf/should_run/T15426.hs
- testsuite/tests/perf/should_run/T18964.hs
Changes:
=====================================
compiler/GHC/Core/Opt/Arity.hs
=====================================
@@ -87,6 +87,8 @@ import GHC.Utils.Outputable
import GHC.Utils.Panic
import GHC.Utils.Misc
+import Data.Maybe( isJust )
+
{-
************************************************************************
* *
@@ -2376,7 +2378,7 @@ perform eta reduction on an expression with n leading lambdas `\xs. e xs`
(checked in 'is_eta_reduction_sound' in 'tryEtaReduce', which focuses on the
case where `e` is trivial):
- A. It is sound to eta-reduce n arguments as long as n does not exceed the
+(A) It is sound to eta-reduce n arguments as long as n does not exceed the
`exprArity` of `e`. (Needs Arity analysis.)
This criterion exploits information about how `e` is *defined*.
@@ -2385,7 +2387,7 @@ case where `e` is trivial):
By contrast, it would be *unsound* to eta-reduce 2 args, `\x y. e x y` to `e`:
`e 42` diverges when `(\x y. e x y) 42` does not.
- S. It is sound to eta-reduce n arguments in an evaluation context in which all
+(S) It is sound to eta-reduce n arguments in an evaluation context in which all
calls happen with at least n arguments. (Needs Strictness analysis.)
NB: This treats evaluations like a call with 0 args.
NB: This criterion exploits information about how `e` is *used*.
@@ -2412,13 +2414,13 @@ case where `e` is trivial):
See Note [Eta reduction based on evaluation context] for the implementation
details. This criterion is tested extensively in T21261.
- R. Note [Eta reduction in recursive RHSs] tells us that we should not
+(R) Note [Eta reduction in recursive RHSs] tells us that we should not
eta-reduce `f` in its own RHS and describes our fix.
There we have `f = \x. f x` and we should not eta-reduce to `f=f`. Which
might change a terminating program (think @f `seq` e@) to a non-terminating
one.
- E. (See fun_arity in tryEtaReduce.) As a perhaps special case on the
+(E) (See fun_arity in tryEtaReduce.) As a perhaps special case on the
boundary of (A) and (S), when we know that a fun binder `f` is in
WHNF, we simply assume it has arity 1 and apply (A). Example:
g f = f `seq` \x. f x
@@ -2428,7 +2430,7 @@ case where `e` is trivial):
And here are a few more technical criteria for when it is *not* sound to
eta-reduce that are specific to Core and GHC:
- L. With linear types, eta-reduction can break type-checking:
+(L) With linear types, eta-reduction can break type-checking:
f :: A ⊸ B
g :: A -> B
g = \x. f x
@@ -2436,13 +2438,13 @@ eta-reduce that are specific to Core and GHC:
complain that g and f don't have the same type. NB: Not unsound in the
dynamic semantics, but unsound according to the static semantics of Core.
- J. We may not undersaturate join points.
+(J) We may not undersaturate join points.
See Note [Invariants on join points] in GHC.Core, and #20599.
- B. We may not undersaturate functions with no binding.
+(B) We may not undersaturate functions with no binding.
See Note [Eta expanding primops].
- W. We may not undersaturate StrictWorkerIds.
+(W) We may not undersaturate StrictWorkerIds.
See Note [CBV Function Ids] in GHC.Types.Id.Info.
Here is a list of historic accidents surrounding unsound eta-reduction:
@@ -2699,7 +2701,7 @@ tryEtaReduce rec_ids bndrs body eval_sd
|| all_calls_with_arity incoming_arity) -- criterion (S)
-- ... and that the function can be eta reduced to arity 0
-- without violating invariants of Core and GHC
- && canEtaReduceToArity fun 0 0 -- criteria (L), (J), (W), (B)
+ && not (cantEtaReduceFun fun) -- criteria (L), (J), (W), (B)
all_calls_with_arity n = isStrict (fst $ peelManyCalls n eval_sd)
-- See Note [Eta reduction based on evaluation context]
@@ -2754,19 +2756,18 @@ tryEtaReduce rec_ids bndrs body eval_sd
ok_arg _ _ _ _ = Nothing
--- | Can we eta-reduce the given function to the specified arity?
+-- | Can we *not* eta-reduce the given function?
-- See Note [Eta reduction soundness], criteria (B), (J), (W) and (L).
-canEtaReduceToArity :: Id -> JoinArity -> Arity -> Bool
-canEtaReduceToArity fun dest_join_arity dest_arity =
- not $
- hasNoBinding fun -- (B)
+cantEtaReduceFun :: Id -> Bool
+cantEtaReduceFun fun
+ = hasNoBinding fun -- (B)
-- Don't undersaturate functions with no binding.
- || ( isJoinId fun && dest_join_arity < idJoinArity fun ) -- (J)
+ || isJoinId fun -- (J)
-- Don't undersaturate join points.
-- See Note [Invariants on join points] in GHC.Core, and #20599
- || ( dest_arity < idCbvMarkArity fun ) -- (W)
+ || (isJust (idCbvMarks_maybe fun)) -- (W)
-- Don't undersaturate StrictWorkerIds.
-- See Note [CBV Function Ids] in GHC.Types.Id.Info.
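The "Tiny refactor" commit replaces `canEtaReduceToArity fun 0 0` with `not (cantEtaReduceFun fun)`. The sketch below is a toy model, not GHC code: `Fun` and its fields are hypothetical stand-ins for `hasNoBinding`, `idJoinArity`, and `idCbvMarks_maybe`, and it assumes that `idCbvMarkArity` is the length of the marks list. It checks that the old predicate with both arities set to 0 agrees with the new one for positive join arities and non-empty mark lists, which we assume are the only cases that reach `tryEtaReduce`.

```haskell
import Data.Maybe (isJust)

-- Hypothetical stand-in for an Id: only the facts the predicates inspect.
data Fun = Fun
  { noBinding :: Bool          -- models hasNoBinding
  , joinArity :: Maybe Int     -- Just n models a join point of join arity n
  , cbvMarks  :: Maybe [Bool]  -- models idCbvMarks_maybe
  }

-- Model of the old predicate; its one caller always passed 0 and 0.
canEtaReduceToArity :: Fun -> Int -> Int -> Bool
canEtaReduceToArity f destJoinArity destArity =
  not $ noBinding f                                  -- (B)
     || maybe False (destJoinArity <) (joinArity f)  -- (J)
     || destArity < maybe 0 length (cbvMarks f)      -- (W), arity assumed = length of marks

-- Model of the specialised predicate introduced by the refactor.
cantEtaReduceFun :: Fun -> Bool
cantEtaReduceFun f =
  noBinding f                                        -- (B)
    || isJust (joinArity f)                          -- (J)
    || isJust (cbvMarks f)                           -- (W)

-- Every combination with positive join arity and non-empty CBV marks.
samples :: [Fun]
samples =
  [ Fun nb ja cm
  | nb <- [False, True]
  , ja <- [Nothing, Just 1, Just 2]
  , cm <- [Nothing, Just [True], Just [False, True]]
  ]

main :: IO ()
main = print (all agree samples)
  where agree f = canEtaReduceToArity f 0 0 == not (cantEtaReduceFun f)
```

Specialising to 0 turns each comparison `0 < n` into a plain "is this a join point / worker" test, which is why the arity arguments could be dropped.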
=====================================
compiler/GHC/Runtime/Interpreter.hs
=====================================
@@ -93,7 +93,6 @@ import GHC.Utils.Panic
import GHC.Utils.Exception as Ex
import GHC.Utils.Outputable(brackets, ppr, showSDocUnsafe)
import GHC.Utils.Fingerprint
-import GHC.Utils.Misc
import GHC.Unit.Module
import GHC.Unit.Module.ModIface
@@ -110,9 +109,7 @@ import Control.Monad
import Control.Monad.IO.Class
import Control.Monad.Catch as MC (mask)
import Data.Binary
-import Data.Binary.Put
import Data.ByteString (ByteString)
-import qualified Data.ByteString.Lazy as LB
import Data.Array ((!))
import Data.IORef
import Foreign hiding (void)
@@ -120,7 +117,6 @@ import qualified GHC.Exts.Heap as Heap
import GHC.Stack.CCS (CostCentre,CostCentreStack)
import System.Directory
import System.Process
-import GHC.Conc (pseq, par)
{- Note [Remote GHCi]
~~~~~~~~~~~~~~~~~~
@@ -353,19 +349,7 @@ mkCostCentres interp mod ccs =
-- | Create a set of BCOs that may be mutually recursive.
createBCOs :: Interp -> [ResolvedBCO] -> IO [HValueRef]
createBCOs interp rbcos = do
- -- Serializing ResolvedBCO is expensive, so we do it in parallel
- interpCmd interp (CreateBCOs puts)
- where
- puts = parMap doChunk (chunkList 100 rbcos)
-
- -- make sure we force the whole lazy ByteString
- doChunk c = pseq (LB.length bs) bs
- where bs = runPut (put c)
-
- -- We don't have the parallel package, so roll our own simple parMap
- parMap _ [] = []
- parMap f (x:xs) = fx `par` (fxs `pseq` (fx : fxs))
- where fx = f x; fxs = parMap f xs
+ interpCmd interp (CreateBCOs rbcos)
addSptEntry :: Interp -> Fingerprint -> ForeignHValue -> IO ()
addSptEntry interp fpr ref =
=====================================
compiler/GHC/Utils/Misc.hs
=====================================
@@ -37,8 +37,6 @@ module GHC.Utils.Misc (
isSingleton, only, expectOnly, GHC.Utils.Misc.singleton,
notNull, expectNonEmpty, snocView,
- chunkList,
-
holes,
changeLast,
@@ -494,11 +492,6 @@ expectOnly _ (a:_) = a
#endif
expectOnly msg _ = panic ("expectOnly: " ++ msg)
--- | Split a list into chunks of /n/ elements
-chunkList :: Int -> [a] -> [[a]]
-chunkList _ [] = []
-chunkList n xs = as : chunkList n bs where (as,bs) = splitAt n xs
-
-- | Compute all the ways of removing a single element from a list.
--
-- > holes [1,2,3] = [(1, [2,3]), (2, [1,3]), (3, [1,2])]
=====================================
libraries/ghci/GHCi/Message.hs
=====================================
@@ -30,11 +30,13 @@ import GHCi.RemoteTypes
import GHCi.FFI
import GHCi.TH.Binary () -- For Binary instances
import GHCi.BreakArray
+import GHCi.ResolvedBCO
import GHC.LanguageExtensions
import qualified GHC.Exts.Heap as Heap
import GHC.ForeignSrcLang
import GHC.Fingerprint
+import GHC.Conc (pseq, par)
import Control.Concurrent
import Control.Exception
import Data.Binary
@@ -84,10 +86,10 @@ data Message a where
-- Interpreter -------------------------------------------
-- | Create a set of BCO objects, and return HValueRefs to them
- -- Note: Each ByteString contains a Binary-encoded [ResolvedBCO], not
- -- a ResolvedBCO. The list is to allow us to serialise the ResolvedBCOs
- -- in parallel. See @createBCOs@ in compiler/GHC/Runtime/Interpreter.hs.
- CreateBCOs :: [LB.ByteString] -> Message [HValueRef]
+ -- See @createBCOs@ in compiler/GHC/Runtime/Interpreter.hs.
+ -- NB: this has a custom Binary behavior,
+ -- see Note [Parallelize CreateBCOs serialization]
+ CreateBCOs :: [ResolvedBCO] -> Message [HValueRef]
-- | Release 'HValueRef's
FreeHValueRefs :: [HValueRef] -> Message ()
@@ -513,7 +515,8 @@ getMessage = do
9 -> Msg <$> RemoveLibrarySearchPath <$> get
10 -> Msg <$> return ResolveObjs
11 -> Msg <$> FindSystemLibrary <$> get
- 12 -> Msg <$> CreateBCOs <$> get
+ 12 -> Msg <$> (CreateBCOs . concatMap (runGet get)) <$> (get :: Get [LB.ByteString])
+ -- See Note [Parallelize CreateBCOs serialization]
13 -> Msg <$> FreeHValueRefs <$> get
14 -> Msg <$> MallocData <$> get
15 -> Msg <$> MallocStrings <$> get
@@ -557,7 +560,8 @@ putMessage m = case m of
RemoveLibrarySearchPath ptr -> putWord8 9 >> put ptr
ResolveObjs -> putWord8 10
FindSystemLibrary str -> putWord8 11 >> put str
- CreateBCOs bco -> putWord8 12 >> put bco
+ CreateBCOs bco -> putWord8 12 >> put (serializeBCOs bco)
+ -- See Note [Parallelize CreateBCOs serialization]
FreeHValueRefs val -> putWord8 13 >> put val
MallocData bs -> putWord8 14 >> put bs
MallocStrings bss -> putWord8 15 >> put bss
@@ -586,6 +590,34 @@ putMessage m = case m of
ResumeSeq a -> putWord8 38 >> put a
NewBreakModule name -> putWord8 39 >> put name
+{-
+Note [Parallelize CreateBCOs serialization]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Serializing ResolvedBCO is expensive, so we do it in parallel.
+We split the list [ResolvedBCO] into chunks of length <= 100,
+and serialize every chunk in parallel, getting a [LB.ByteString]
+where every bytestring corresponds to a single chunk (multiple ResolvedBCOs).
+
+Previously, we stored [LB.ByteString] in the Message object, but that
+incurs unnecessary serialization with the internal interpreter (#23919).
+-}
+
+serializeBCOs :: [ResolvedBCO] -> [LB.ByteString]
+serializeBCOs rbcos = parMap doChunk (chunkList 100 rbcos)
+ where
+ -- make sure we force the whole lazy ByteString
+ doChunk c = pseq (LB.length bs) bs
+ where bs = runPut (put c)
+
+ -- We don't have the parallel package, so roll our own simple parMap
+ parMap _ [] = []
+ parMap f (x:xs) = fx `par` (fxs `pseq` (fx : fxs))
+ where fx = f x; fxs = parMap f xs
+
+ chunkList :: Int -> [a] -> [[a]]
+ chunkList _ [] = []
+ chunkList n xs = as : chunkList n bs where (as,bs) = splitAt n xs
+
-- -----------------------------------------------------------------------------
-- Reading/writing messages
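The Note [Parallelize CreateBCOs serialization] scheme can be exercised outside GHC. The sketch below is a standalone approximation, assuming `Int` in place of `ResolvedBCO` (any `Binary` instance works); `serializeChunks` and `deserializeChunks` are illustrative names, not GHC functions. The sender chunks the list and encodes each chunk in a spark, and the receiver decodes each chunk with `concatMap (runGet get)`, mirroring the new `getMessage` case for tag 12.

```haskell
import Data.Binary (Binary, get, put)
import Data.Binary.Get (runGet)
import Data.Binary.Put (runPut)
import qualified Data.ByteString.Lazy as LB
import GHC.Conc (par, pseq)

-- Split a list into chunks of at most n elements.
chunkList :: Int -> [a] -> [[a]]
chunkList _ [] = []
chunkList n xs = as : chunkList n bs
  where (as, bs) = splitAt n xs

-- Spark each result while the rest of the list is evaluated.
parMap :: (a -> b) -> [a] -> [b]
parMap _ [] = []
parMap f (x:xs) = fx `par` (fxs `pseq` (fx : fxs))
  where fx  = f x
        fxs = parMap f xs

-- Sender side: encode each chunk as its own lazy ByteString.
-- Forcing LB.length inside the spark makes it do the whole encoding.
serializeChunks :: Binary a => Int -> [a] -> [LB.ByteString]
serializeChunks n = parMap doChunk . chunkList n
  where
    doChunk c = let bs = runPut (put c) in LB.length bs `pseq` bs

-- Receiver side: decode every chunk and concatenate the results.
deserializeChunks :: Binary a => [LB.ByteString] -> [a]
deserializeChunks = concatMap (runGet get)

main :: IO ()
main = do
  let xs     = [1 .. 250] :: [Int]
      chunks = serializeChunks 100 xs
  print (length chunks)                   -- three chunks: 100 + 100 + 50
  print (deserializeChunks chunks == xs)  -- the round trip is lossless
```

Sparks only run in parallel in a program built with `-threaded` and run with `+RTS -N`; otherwise the result is identical but sequential. Because the chunking now happens in `putMessage`, the internal interpreter, which never calls `putMessage`, skips the encoding entirely.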
=====================================
libraries/ghci/GHCi/Run.hs
=====================================
@@ -17,8 +17,6 @@ import Prelude -- See note [Why do we import Prelude here?]
#if !defined(javascript_HOST_ARCH)
import GHCi.CreateBCO
import GHCi.InfoTable
-import Data.Binary
-import Data.Binary.Get
#endif
import GHCi.FFI
@@ -78,7 +76,7 @@ run m = case m of
toRemotePtr <$> mkConInfoTable tc ptrs nptrs tag ptrtag desc
ResolveObjs -> resolveObjs
FindSystemLibrary str -> findSystemLibrary str
- CreateBCOs bcos -> createBCOs (concatMap (runGet get) bcos)
+ CreateBCOs bcos -> createBCOs bcos
LookupClosure str -> lookupClosure str
#endif
RtsRevertCAFs -> rts_revertCAFs
=====================================
libraries/ghci/GHCi/TH.hs
=====================================
@@ -38,7 +38,7 @@ For each splice
1. GHC compiles a splice to byte code, and sends it to the server: in
a CreateBCOs message:
- CreateBCOs :: [LB.ByteString] -> Message [HValueRef]
+ CreateBCOs :: [ResolvedBCO] -> Message [HValueRef]
2. The server creates the real byte-code objects in its heap, and
returns HValueRefs to GHC. HValueRef is the same as RemoteRef
=====================================
m4/fp_find_libnuma.m4
=====================================
@@ -30,7 +30,7 @@ AC_DEFUN([FP_FIND_LIBNUMA],
[Enable NUMA memory policy and thread affinity support in the
runtime system via numactl's libnuma [default=auto]])])
- if test "$enable_numa" = "yes" ; then
+ if test "$enable_numa" != "no" ; then
CFLAGS2="$CFLAGS"
CFLAGS="$LIBNUMA_CFLAGS $CFLAGS"
LDFLAGS2="$LDFLAGS"
@@ -41,7 +41,7 @@ AC_DEFUN([FP_FIND_LIBNUMA],
if test "$ac_cv_header_numa_h$ac_cv_header_numaif_h" = "yesyes" ; then
AC_CHECK_LIB(numa, numa_available,HaveLibNuma=1)
fi
- if test "$HaveLibNuma" = "0" ; then
+ if test "$enable_numa:$HaveLibNuma" = "yes:0" ; then
AC_MSG_ERROR([Cannot find system libnuma (required by --enable-numa)])
fi
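The "Fix numa auto configure" change is easiest to see as a decision table. The sketch below is a hypothetical Haskell model, not the m4 itself. It assumes that the error check is nested inside the `if` block (the new `yes:0` guard suggests this), that `auto` is the default per the help text, and that a failed probe under `auto` simply leaves NUMA support off (`NotFound`). Under the old condition, `auto` never entered the block, so libnuma was never detected unless `--enable-numa` was passed.

```haskell
-- What configure does with the NUMA probe (hypothetical model).
data Outcome = Skipped | Found | NotFound | ConfigError
  deriving (Eq, Show)

-- Before: only an explicit --enable-numa ran the probe, so "auto" never did.
oldLogic :: String -> Bool -> Outcome
oldLogic enable haveLib
  | enable /= "yes" = Skipped
  | haveLib         = Found
  | otherwise       = ConfigError

-- After: probe unless explicitly disabled; a missing libnuma is fatal only
-- for "yes", mirroring the combined "$enable_numa:$HaveLibNuma" = "yes:0" test.
newLogic :: String -> Bool -> Outcome
newLogic enable haveLib
  | enable == "no"                   = Skipped
  | haveLib                          = Found
  | enable ++ ":" ++ flag == "yes:0" = ConfigError
  | otherwise                        = NotFound
  where
    flag = if haveLib then "1" else "0"

main :: IO ()
main = mapM_ report [ (e, h) | e <- ["yes", "auto", "no"], h <- [True, False] ]
  where
    report (e, h) = putStrLn (e ++ " " ++ show h ++ ": "
                              ++ show (oldLogic e h) ++ " -> " ++ show (newLogic e h))
```

Joining both variables with `:` lets a single `test` compare two values at once, which keeps the shell condition short.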
=====================================
testsuite/tests/perf/should_run/T15426.hs
=====================================
@@ -1,3 +1,8 @@
+{-# OPTIONS_GHC -fno-cse #-}
+ -- Avoid depending on fluky CSE; there are really 5 independent
+ -- tests in this module, and we don't want them to interact.
+ -- See #23925
+
import Control.Exception (evaluate)
import qualified Data.List as L
@@ -28,4 +33,4 @@ As a result these lists are now floated out and shared.
Just leaving breadcrumbs, in case we later see big perf changes on
this (slightly fragile) benchmark.
--}
\ No newline at end of file
+-}
=====================================
testsuite/tests/perf/should_run/T18964.hs
=====================================
@@ -1,3 +1,8 @@
+{-# OPTIONS_GHC -fno-cse #-}
+ -- Avoid depending on fluky CSE; there are really 4 independent
+ -- tests in this module, and we don't want them to interact.
+ -- See #23925
+
import GHC.Exts
import Data.Int
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/6f969e06823befd50e7cb7c06123a180dc0e4a73...9ad5ead064fbe99e60e65e07170785e1e4ee5e14
More information about the ghc-commits mailing list