[Git][ghc/ghc][wip/specialise-assembler] 10 commits: compiler: Always load GHC.Data.FastString optimised into GHCi

Matthew Pickering (@mpickering) gitlab at gitlab.haskell.org
Wed Feb 5 13:33:19 UTC 2025



Matthew Pickering pushed to branch wip/specialise-assembler at Glasgow Haskell Compiler / GHC


Commits:
662645f0 by Matthew Pickering at 2025-02-03T11:44:41-05:00
compiler: Always load GHC.Data.FastString optimised into GHCi

The FastString table is shared between the boot compiler and interpreted
compiler. Therefore it's very important the representation of
`FastString` matches in both cases. Otherwise, the interpreter will read
a FastString from the shared variable but place the fields in the wrong
place which leads to segfaults.

Ideally this state would not be shared, but for now we can always
compile both with `-O2` and this leads to a working interpreter.

- - - - -
05e5785a by Peter Trommler at 2025-02-03T11:45:17-05:00
RTS: Fix compile on powerpc64 ELF v1

Cabal does not know about the different ABIs for powerpc64 and compiles
StgCRunAsm.S unconditionally. The old make-based build system excluded
this file from the build and it was OK to signal an error when it was
compiled accidentally.

With this patch we compile StgCRunAsm.S to an empty file, which fixes
the build.

Fixes #25700

- - - - -
cbbb64fb by Matthew Pickering at 2025-02-03T23:40:33-05:00
interpreter: Always print unit and module name in BCO_NAME instruction

Currently the BCO_Name instruction is a bit difficult to use since the
names are not qualified by the module they come from. When you have a
very generic name such as "wildX4", it becomes impossible to work out
which module the identifier comes from.

Fixes #25694

- - - - -
764a43ac by Ben Gamari at 2025-02-03T23:41:10-05:00
upload-ghc-libs: Drop more references to ghc-internal from ghc-boot-th


(cherry picked from commit afec4b75c2d0e9f5c462a86d9f3697acf30355c7)

Co-authored-by: Ben Gamari <bgamari.foss at gmail.com>
- - - - -
9a59b026 by Ben Gamari at 2025-02-04T10:00:18-05:00
gitlab-ci: Don't use .full-ci to run test-primops

test-primops depends upon the existence of validate jobs, yet these do
not exist in the context of nightly jobs, which .full-ci includes.

- - - - -
7cc08550 by Ben Gamari at 2025-02-04T18:34:49-05:00
CorePrep: Name `sat` binders more descriptively

- - - - -
fb40981d by Ben Gamari at 2025-02-04T18:35:26-05:00
ghc-toolchain: Parse i686 triples

This is a moniker used for later 32-bit x86 implementations
(Pentium Pro and later).

Fixes #25691.

- - - - -
02794411 by Cheng Shao at 2025-02-04T18:36:03-05:00
compiler: remove unused assembleOneBCO function

This patch removes the unused assembleOneBCO function from the
bytecode assembler.

- - - - -
b4d9bcd8 by Matthew Pickering at 2025-02-05T13:27:19+00:00
perf: Replace uses of genericLength with strictGenericLength

genericLength is a recursive function and marked NOINLINE. It is not
going to specialise. In profiles, it can be seen that 3% of total compilation
time when computing bytecode is spend calling this non-specialised
function.

In addition, we can simplify `addListToSS` to avoid traversing the input
list twice and also allocating an intermediate list (after the call to
reverse).

Overall these changes reduce the time spend in 'assembleBCOs' from 5.61s
to 3.88s. Allocations drop from 8GB to 5.3G.

Fixes #25706

- - - - -
890bec99 by Matthew Pickering at 2025-02-05T13:32:48+00:00
perf: Speed up the bytecode assembler

This commit contains a number of optimisations to the bytecode
assembler. In programs which generate a large amount of bytecode, the
assembler is called a lot of times on many instructions.

1. Specialise the assembleI function for the two intepreters to avoid
having to materialise the intermediate free-monad like structure.
2. Directly compute the UArray and SmallArray needed rather than going
   via the intermediate SizedSeq
3. Use optimised monads
4. Define unrolled "any" and "mapM6" functions which can be inlined
   and avoid calling recursive functions.

The resulting generated code is much more direct.

- - - - -


23 changed files:

- .gitlab-ci.yml
- .gitlab/rel_eng/upload_ghc_libs.py
- compiler/GHC/ByteCode/Asm.hs
- compiler/GHC/ByteCode/Instr.hs
- compiler/GHC/ByteCode/Types.hs
- compiler/GHC/CoreToStg/Prep.hs
- compiler/GHC/Data/FastString.hs
- compiler/GHC/Data/FlatBag.hs
- compiler/GHC/Data/SmallArray.hs
- compiler/GHC/Prelude/Basic.hs
- compiler/GHC/StgToByteCode.hs
- libraries/ghc-boot/GHC/Data/SizedSeq.hs
- rts/StgCRunAsm.S
- testsuite/tests/core-to-stg/T14895.stderr
- testsuite/tests/core-to-stg/T24124.stderr
- testsuite/tests/ghci/should_run/T21052.stdout
- + testsuite/tests/perf/should_run/ByteCodeAsm.hs
- testsuite/tests/perf/should_run/all.T
- testsuite/tests/simplCore/should_compile/T20040.stderr
- testsuite/tests/simplCore/should_compile/T23083.stderr
- testsuite/tests/simplStg/should_compile/T15226b.stderr
- testsuite/tests/simplStg/should_compile/T19717.stderr
- utils/ghc-toolchain/src/GHC/Toolchain/ParseTriple.hs


Changes:

=====================================
.gitlab-ci.yml
=====================================
@@ -909,7 +909,10 @@ test-primops-label:
   extends: .test-primops-validate-template
   rules:
     - if: '$CI_MERGE_REQUEST_LABELS =~ /.*test-primops.*/'
-    - *full-ci
+      # We do not use *.full-ci here since that would imply running in nightly
+      # where we do not have the normal validate jobs. We have the -nightly job
+      # below to handle this case.
+    - if: '$CI_MERGE_REQUEST_LABELS =~ /.*full-ci.*/'
 
 test-primops-nightly:
   extends: .test-primops


=====================================
.gitlab/rel_eng/upload_ghc_libs.py
=====================================
@@ -94,9 +94,23 @@ def prep_ghc():
     build_copy_file(PACKAGES['ghc'], 'GHC/Settings/Config.hs')
 
 def prep_ghc_boot_th():
-    # Drop ghc-internal from `hs-source-dirs` as Hackage rejects this
+    # Drop references to `ghc-internal` from `hs-source-dirs` as Hackage rejects
+    # out-of-sdist references and this packages is only uploaded for documentation
+    # purposes.
     modify_file(PACKAGES['ghc-boot-th'], 'ghc-boot-th.cabal',
-                lambda s: s.replace('../ghc-internal/src', ''))
+                lambda s: s.replace('../ghc-internal/src', '')
+                           .replace('GHC.Internal.TH.Lib.Map', '')
+                           .replace('GHC.Internal.TH.PprLib', '')
+                           .replace('GHC.Internal.TH.Ppr', '')
+                           .replace('GHC.Internal.TH.Lib,', '')
+                           .replace('GHC.Internal.TH.Lib', '')
+                           .replace('GHC.Internal.TH.Lift,', '')
+                           .replace('GHC.Internal.TH.Quote,', '')
+                           .replace('GHC.Internal.TH.Syntax', '')
+                           .replace('GHC.Internal.ForeignSrcLang', '')
+                           .replace('GHC.Internal.LanguageExtensions', '')
+                           .replace('GHC.Internal.Lexeme', '')
+                )
 
 PACKAGES = {
     pkg.name: pkg


=====================================
compiler/GHC/ByteCode/Asm.hs
=====================================
@@ -1,21 +1,29 @@
 {-# LANGUAGE CPP             #-}
 {-# LANGUAGE DeriveFunctor   #-}
 {-# LANGUAGE RecordWildCards #-}
+{-# LANGUAGE MagicHash       #-}
+{-# LANGUAGE UnboxedTuples   #-}
+{-# LANGUAGE PatternSynonyms   #-}
 {-# OPTIONS_GHC -optc-DNON_POSIX_SOURCE #-}
 --
+--
 --  (c) The University of Glasgow 2002-2006
 --
 
 -- | Bytecode assembler and linker
 module GHC.ByteCode.Asm (
-        assembleBCOs, assembleOneBCO,
+        assembleBCOs,
         bcoFreeNames,
         SizedSeq, sizeSS, ssElts,
         iNTERP_STACK_CHECK_THRESH,
-        mkNativeCallInfoLit
+        mkNativeCallInfoLit,
+
+        -- * For testing
+        assembleBCO
   ) where
 
-import GHC.Prelude
+import GHC.Prelude hiding ( any )
+
 
 import GHC.ByteCode.Instr
 import GHC.ByteCode.InfoTable
@@ -29,13 +37,14 @@ import GHC.Types.Name.Set
 import GHC.Types.Literal
 import GHC.Types.Unique.DSet
 import GHC.Types.SptEntry
+import GHC.Types.Unique.FM
 
 import GHC.Utils.Outputable
 import GHC.Utils.Panic
 
 import GHC.Core.TyCon
-import GHC.Data.FlatBag
 import GHC.Data.SizedSeq
+import GHC.Data.SmallArray
 
 import GHC.StgToCmm.Layout     ( ArgRep(..) )
 import GHC.Cmm.Expr
@@ -45,20 +54,21 @@ import GHC.Platform
 import GHC.Platform.Profile
 
 import Control.Monad
-import Control.Monad.Trans.Class
-import Control.Monad.Trans.State.Strict
+import qualified Control.Monad.Trans.State.Strict as MTL
 
 import qualified Data.Array.Unboxed as Array
-import Data.Array.Base  ( UArray(..) )
+import qualified Data.Array.IO as Array
+import Data.Array.Base  ( UArray(..), unsafeWrite, numElements, unsafeFreeze )
 
 import Foreign hiding (shiftL, shiftR)
-import Data.Char        ( ord )
-import Data.List        ( genericLength )
-import Data.Map.Strict (Map)
+import Data.Char  (ord)
 import Data.Maybe (fromMaybe)
-import qualified Data.Map.Strict as Map
 import GHC.Float (castFloatToWord32, castDoubleToWord64)
 
+import qualified Data.List as List ( any )
+import GHC.Exts
+
+
 -- -----------------------------------------------------------------------------
 -- Unlinked BCOs
 
@@ -135,9 +145,9 @@ assembleBCOs interp profile proto_bcos tycons top_strs modbreaks spt_entries = d
 --
 mallocStrings ::  Interp -> FlatBag UnlinkedBCO -> IO (FlatBag UnlinkedBCO)
 mallocStrings interp ulbcos = do
-  let bytestrings = reverse (execState (mapM_ collect ulbcos) [])
+  let bytestrings = reverse (MTL.execState (mapM_ collect ulbcos) [])
   ptrs <- interpCmd interp (MallocStrings bytestrings)
-  return (evalState (mapM splice ulbcos) ptrs)
+  return (MTL.evalState (mapM splice ulbcos) ptrs)
  where
   splice bco at UnlinkedBCO{..} = do
     lits <- mapM spliceLit unlinkedBCOLits
@@ -145,10 +155,10 @@ mallocStrings interp ulbcos = do
     return bco { unlinkedBCOLits = lits, unlinkedBCOPtrs = ptrs }
 
   spliceLit (BCONPtrStr _) = do
-    rptrs <- get
+    rptrs <- MTL.get
     case rptrs of
       (RemotePtr p : rest) -> do
-        put rest
+        MTL.put rest
         return (BCONPtrWord (fromIntegral p))
       _ -> panic "mallocStrings:spliceLit"
   spliceLit other = return other
@@ -161,21 +171,60 @@ mallocStrings interp ulbcos = do
     mapM_ collectPtr unlinkedBCOPtrs
 
   collectLit (BCONPtrStr bs) = do
-    strs <- get
-    put (bs:strs)
+    strs <- MTL.get
+    MTL.put (bs:strs)
   collectLit _ = return ()
 
   collectPtr (BCOPtrBCO bco) = collect bco
   collectPtr _ = return ()
 
+data RunAsmReader = RunAsmReader { isn_array :: {-# UNPACK #-} !(Array.IOUArray Int Word16)
+                                  , ptr_array :: {-# UNPACK #-} !(SmallMutableArrayIO BCOPtr)
+                                  , lit_array :: {-# UNPACK #-} !(SmallMutableArrayIO BCONPtr )
+                                  }
+
+data RunAsmResult = RunAsmResult { final_isn_array :: !(Array.UArray Int Word16)
+                                 , final_ptr_array :: !(SmallArray BCOPtr)
+                                 , final_lit_array :: !(SmallArray BCONPtr) }
+
+-- How many words we have written so far.
+data AsmState = AsmState { nisn :: !Int, nptr :: !Int, nlit :: !Int }
+
+
+{-# NOINLINE inspectInstrs #-}
+-- | Perform analysis of the bytecode to determine
+--  1. How many instructions we will produce
+--  2. If we are going to need long jumps.
+--  3. The offsets that labels refer to
+inspectInstrs :: Platform -> Bool -> Word -> [BCInstr] -> InspectState
+inspectInstrs platform long_jump e instrs =
+  inspectAsm long_jump e (mapM_ (assembleInspectAsm platform) instrs)
+
+{-# NOINLINE runInstrs #-}
+-- | Assemble the bytecode from the instructions.
+runInstrs ::  Platform -> Bool -> InspectState -> [BCInstr] -> IO RunAsmResult
+runInstrs platform long_jumps is_state instrs = do
+  -- Produce arrays of exactly the right size, corresponding to the result of inspectInstrs.
+  isn_array <- Array.newArray_ (0, (fromIntegral $ instrCount is_state) - 1)
+  ptr_array <- newSmallArrayIO (fromIntegral $ ptrCount is_state) undefined
+  lit_array <- newSmallArrayIO (fromIntegral $ litCount is_state) undefined
+  let env :: LocalLabel -> Word
+      env lbl = fromMaybe
+        (pprPanic "assembleBCO.findLabel" (ppr lbl))
+        (lookupUFM (lblEnv is_state) lbl)
+  let initial_state  = AsmState 0 0 0
+  let initial_reader = RunAsmReader{..}
+  runAsm long_jumps env initial_reader initial_state (mapM_ (\i -> assembleRunAsm platform i) instrs)
+  final_isn_array <- unsafeFreeze isn_array
+  final_ptr_array <- unsafeFreezeSmallArrayIO ptr_array
+  final_lit_array <- unsafeFreezeSmallArrayIO lit_array
+  return $ RunAsmResult {..}
+
+assembleRunAsm :: Platform -> BCInstr -> RunAsm ()
+assembleRunAsm p i = assembleI @RunAsm p i
 
-assembleOneBCO :: Interp -> Profile -> ProtoBCO Name -> IO UnlinkedBCO
-assembleOneBCO interp profile pbco = do
-  -- TODO: the profile should be bundled with the interpreter: the rts ways are
-  -- fixed for an interpreter
-  ubco <- assembleBCO (profilePlatform profile) pbco
-  UnitFlatBag ubco' <- mallocStrings interp (UnitFlatBag ubco)
-  return ubco'
+assembleInspectAsm :: Platform -> BCInstr -> InspectAsm ()
+assembleInspectAsm p i = assembleI @InspectAsm p i
 
 assembleBCO :: Platform -> ProtoBCO Name -> IO UnlinkedBCO
 assembleBCO platform
@@ -185,9 +234,7 @@ assembleBCO platform
                       , protoBCOBitmapSize = bsize
                       , protoBCOArity      = arity }) = do
   -- pass 1: collect up the offsets of the local labels.
-  let asm = mapM_ (assembleI platform) instrs
-
-      initial_offset = 0
+  let initial_offset = 0
 
       -- Jump instructions are variable-sized, there are long and short variants
       -- depending on the magnitude of the offset.  However, we can't tell what
@@ -197,30 +244,25 @@ assembleBCO platform
       -- and if the final size is indeed small enough for short jumps, we are
       -- done.  Otherwise, we repeat the calculation, and we force all jumps in
       -- this BCO to be long.
-      (n_insns0, lbl_map0) = inspectAsm platform False initial_offset asm
-      ((n_insns, lbl_map), long_jumps)
-        | isLargeW (fromIntegral $ Map.size lbl_map0)
-          || isLargeW n_insns0
-                    = (inspectAsm platform True initial_offset asm, True)
-        | otherwise = ((n_insns0, lbl_map0), False)
-
-      env :: LocalLabel -> Word
-      env lbl = fromMaybe
-        (pprPanic "assembleBCO.findLabel" (ppr lbl))
-        (Map.lookup lbl lbl_map)
+      is0 = inspectInstrs platform False initial_offset instrs
+      (is1, long_jumps)
+        | isLargeInspectState is0
+                    = (inspectInstrs platform True initial_offset instrs, True)
+        | otherwise = (is0, False)
+
 
   -- pass 2: run assembler and generate instructions, literals and pointers
-  let initial_state = (emptySS, emptySS, emptySS)
-  (final_insns, final_lits, final_ptrs) <- flip execStateT initial_state $ runAsm platform long_jumps env asm
+  RunAsmResult{..} <- runInstrs platform long_jumps is1 instrs
 
   -- precomputed size should be equal to final size
-  massertPpr (n_insns == sizeSS final_insns)
+  massertPpr (fromIntegral (instrCount is1) == numElements final_isn_array
+              && fromIntegral (ptrCount is1) == sizeofSmallArray final_ptr_array
+              && fromIntegral (litCount is1) == sizeofSmallArray final_lit_array)
              (text "bytecode instruction count mismatch")
 
-  let asm_insns = ssElts final_insns
-      !insns_arr =  mkBCOByteArray $ Array.listArray (0 :: Int, fromIntegral n_insns - 1) asm_insns
+  let !insns_arr =  mkBCOByteArray $ final_isn_array
       !bitmap_arr = mkBCOByteArray $ mkBitmapArray bsize bitmap
-      ul_bco = UnlinkedBCO nm arity insns_arr bitmap_arr (fromSizedSeq final_lits) (fromSizedSeq final_ptrs)
+      ul_bco = UnlinkedBCO nm arity insns_arr bitmap_arr (fromSmallArray final_lit_array) (fromSmallArray final_ptr_array)
 
   -- 8 Aug 01: Finalisers aren't safe when attached to non-primitive
   -- objects, since they might get run too early.  Disable this until
@@ -237,10 +279,6 @@ mkBitmapArray bsize bitmap
   = Array.listArray (0, length bitmap) $
       fromIntegral bsize : map (fromInteger . fromStgWord) bitmap
 
--- instrs nonptrs ptrs
-type AsmState = (SizedSeq Word16,
-                 SizedSeq BCONPtr,
-                 SizedSeq BCOPtr)
 
 data Operand
   = Op Word
@@ -260,39 +298,9 @@ truncHalfWord platform w = case platformWordSize platform of
   PW8 | w <= 4294967295 -> Op (fromIntegral w)
   _ -> pprPanic "GHC.ByteCode.Asm.truncHalfWord" (ppr w)
 
-data Assembler a
-  = AllocPtr (IO BCOPtr) (Word -> Assembler a)
-  | AllocLit [BCONPtr] (Word -> Assembler a)
-  | AllocLabel LocalLabel (Assembler a)
-  | Emit Word16 [Operand] (Assembler a)
-  | NullAsm a
-  deriving (Functor)
-
-instance Applicative Assembler where
-    pure = NullAsm
-    (<*>) = ap
-
-instance Monad Assembler where
-  NullAsm x >>= f = f x
-  AllocPtr p k >>= f = AllocPtr p (k >=> f)
-  AllocLit l k >>= f = AllocLit l (k >=> f)
-  AllocLabel lbl k >>= f = AllocLabel lbl (k >>= f)
-  Emit w ops k >>= f = Emit w ops (k >>= f)
-
-ioptr :: IO BCOPtr -> Assembler Word
-ioptr p = AllocPtr p return
-
-ptr :: BCOPtr -> Assembler Word
-ptr = ioptr . return
-
-lit :: [BCONPtr] -> Assembler Word
-lit l = AllocLit l return
 
-label :: LocalLabel -> Assembler ()
-label w = AllocLabel w (return ())
-
-emit :: Word16 -> [Operand] -> Assembler ()
-emit w ops = Emit w ops (return ())
+ptr :: MonadAssembler m => BCOPtr -> m Word
+ptr = ioptr . return
 
 type LabelEnv = LocalLabel -> Word
 
@@ -303,38 +311,142 @@ largeOp long_jumps op = case op of
    IOp i     -> isLargeI i
    LabelOp _ -> long_jumps
 
-runAsm :: Platform -> Bool -> LabelEnv -> Assembler a -> StateT AsmState IO a
-runAsm platform long_jumps e = go
+newtype RunAsm a = RunAsm' { runRunAsm :: Bool
+                                       -> LabelEnv
+                                       -> RunAsmReader
+                                       -> AsmState
+                                       -> IO (AsmState, a) }
+
+pattern RunAsm :: (Bool -> LabelEnv -> RunAsmReader -> AsmState -> IO (AsmState, a))
+                  -> RunAsm a
+pattern RunAsm m <- RunAsm' m
   where
-    go (NullAsm x) = return x
-    go (AllocPtr p_io k) = do
-      p <- lift p_io
-      w <- state $ \(st_i0,st_l0,st_p0) ->
-        let st_p1 = addToSS st_p0 p
-        in (sizeSS st_p0, (st_i0,st_l0,st_p1))
-      go $ k w
-    go (AllocLit lits k) = do
-      w <- state $ \(st_i0,st_l0,st_p0) ->
-        let st_l1 = addListToSS st_l0 lits
-        in (sizeSS st_l0, (st_i0,st_l1,st_p0))
-      go $ k w
-    go (AllocLabel _ k) = go k
-    go (Emit w ops k) = do
-      let largeArgs = any (largeOp long_jumps) ops
-          opcode
-            | largeArgs = largeArgInstr w
-            | otherwise = w
-          words = concatMap expand ops
-          expand (SmallOp w) = [w]
-          expand (LabelOp w) = expand (Op (e w))
-          expand (Op w) = if largeArgs then largeArg platform (fromIntegral w) else [fromIntegral w]
-          expand (IOp i) = if largeArgs then largeArg platform (fromIntegral i) else [fromIntegral i]
-      state $ \(st_i0,st_l0,st_p0) ->
-        let st_i1 = addListToSS st_i0 (opcode : words)
-        in ((), (st_i1,st_l0,st_p0))
-      go k
-
-type LabelEnvMap = Map LocalLabel Word
+    RunAsm m = RunAsm' (oneShot $ \a -> oneShot $ \b -> oneShot $ \c -> oneShot $ \d -> m a b c d)
+{-# COMPLETE RunAsm #-}
+
+instance Functor RunAsm where
+  fmap f (RunAsm x) = RunAsm (\a b c !s -> fmap (fmap f) (x a b c s))
+
+instance Applicative RunAsm where
+  pure x = RunAsm $ \_ _ _ !s -> pure (s, x)
+  (RunAsm f) <*> (RunAsm x) = RunAsm $ \a b c !s -> do
+                                  (!s', f') <- f a b c s
+                                  (!s'', x') <- x a b c s'
+                                  return (s'', f' x')
+  {-# INLINE (<*>) #-}
+
+
+instance Monad RunAsm where
+  return  = pure
+  (RunAsm m) >>= f = RunAsm $ \a b c !s -> m a b c s >>= \(s', r) -> runRunAsm (f r) a b c s'
+  {-# INLINE (>>=) #-}
+
+runAsm :: Bool -> LabelEnv -> RunAsmReader -> AsmState -> RunAsm a -> IO a
+runAsm long_jumps e r s (RunAsm'{runRunAsm}) = fmap snd $ runRunAsm long_jumps e r s
+
+expand :: PlatformWordSize -> Bool -> Operand -> RunAsm ()
+expand word_size largeArgs o = do
+  e <- askEnv
+  case o of
+    (SmallOp w) -> writeIsn w
+    (LabelOp w) -> let !r = e w in handleLargeArg r
+    (Op w) -> handleLargeArg w
+    (IOp i) -> handleLargeArg i
+
+  where
+    handleLargeArg :: Integral a => a -> RunAsm ()
+    handleLargeArg w  =
+      if largeArgs
+        then largeArg word_size (fromIntegral w)
+        else writeIsn (fromIntegral w)
+
+lift :: IO a -> RunAsm a
+lift io = RunAsm $ \_ _ _ s -> io >>= \a -> pure (s, a)
+
+askLongJumps :: RunAsm Bool
+askLongJumps = RunAsm $ \a _ _ s -> pure (s, a)
+
+askEnv :: RunAsm LabelEnv
+askEnv = RunAsm $ \_ b _ s -> pure (s, b)
+
+writePtr :: BCOPtr -> RunAsm Word
+writePtr w
+            = RunAsm $ \_ _ (RunAsmReader{..}) asm -> do
+              writeSmallArrayIO ptr_array (nptr asm) w
+              let !n' = nptr asm + 1
+              let !asm' = asm { nptr = n' }
+              return (asm', fromIntegral (nptr asm))
+
+writeLit :: BCONPtr -> RunAsm Word
+writeLit w = RunAsm $ \_ _ (RunAsmReader{..}) asm -> do
+              writeSmallArrayIO lit_array (nlit asm) w
+              let !n' = nlit asm + 1
+              let !asm' = asm { nlit = n' }
+              return (asm', fromIntegral (nlit asm))
+
+writeLits :: OneOrTwo BCONPtr -> RunAsm Word
+writeLits (OnlyOne l) = writeLit l
+writeLits (OnlyTwo l1 l2) = writeLit l1 <* writeLit l2
+
+writeIsn :: Word16 -> RunAsm ()
+writeIsn w = RunAsm $ \_ _ (RunAsmReader{..}) asm -> do
+#if defined(DEBUG)
+              Array.writeArray isn_array (nisn asm) w
+#else
+              unsafeWrite isn_array (nisn asm) w
+#endif
+              let !n' = nisn asm + 1
+              let !asm' = asm { nisn = n' }
+              return (asm', ())
+
+{-# INLINE any #-}
+-- Any is unrolled manually so that the call in `emit` can be eliminated without
+-- relying on SpecConstr (which does not work across modules).
+any :: (a -> Bool) -> [a] -> Bool
+any _ [] = False
+any f [x] = f x
+any f [x,y] = f x || f y
+any f [x,y,z] = f x || f y || f z
+any f [x1,x2,x3,x4] = f x1 || f x2 || f x3 || f x4
+any f [x1,x2,x3,x4, x5] = f x1 || f x2 || f x3 || f x4 || f x5
+any f [x1,x2,x3,x4,x5,x6] = f x1 || f x2 || f x3 || f x4 || f x5 || f x6
+any f xs = List.any f xs
+
+{-# INLINE mapM6_ #-}
+mapM6_ :: Monad m => (a -> m b) -> [a] -> m ()
+mapM6_ _ [] = return ()
+mapM6_ f [x] = () <$ f x
+mapM6_ f [x,y] = () <$ f x <* f y
+mapM6_ f [x,y,z] = () <$ f x <* f y <* f z
+mapM6_ f [a1,a2,a3,a4] = () <$ f a1 <* f a2 <* f a3 <* f a4
+mapM6_ f [a1,a2,a3,a4,a5] = () <$ f a1 <* f a2 <* f a3 <* f a4 <* f a5
+mapM6_ f [a1,a2,a3,a4,a5,a6] = () <$ f a1 <* f a2 <* f a3 <* f a4 <* f a5 <* f a6
+mapM6_ f xs = mapM_ f xs
+
+instance MonadAssembler RunAsm where
+  ioptr p_io = do
+    p <- lift p_io
+    writePtr p
+  lit lits = writeLits lits
+
+  label _ = return ()
+
+  emit pwordsize w ops = do
+    long_jumps <- askLongJumps
+    -- See the definition of `any` above
+    let largeArgs = any (largeOp long_jumps) ops
+    let opcode
+          | largeArgs = largeArgInstr w
+          | otherwise = w
+    writeIsn opcode
+    mapM6_ (expand pwordsize largeArgs) ops
+
+  {-# INLINE emit #-}
+  {-# INLINE label #-}
+  {-# INLINE lit #-}
+  {-# INLINE ioptr #-}
+
+type LabelEnvMap = UniqFM LocalLabel Word
 
 data InspectState = InspectState
   { instrCount :: !Word
@@ -343,26 +455,105 @@ data InspectState = InspectState
   , lblEnv :: LabelEnvMap
   }
 
-inspectAsm :: Platform -> Bool -> Word -> Assembler a -> (Word, LabelEnvMap)
-inspectAsm platform long_jumps initial_offset
-  = go (InspectState initial_offset 0 0 Map.empty)
+instance Outputable InspectState where
+  ppr (InspectState i p l m) = text "InspectState" <+> ppr [ppr i, ppr p, ppr l, ppr (sizeUFM m)]
+
+isLargeInspectState :: InspectState -> Bool
+isLargeInspectState InspectState{..} =
+  isLargeW (fromIntegral $ sizeUFM lblEnv)
+    || isLargeW instrCount
+
+newtype InspectEnv = InspectEnv { _inspectLongJumps :: Bool
+                                }
+
+newtype InspectAsm a = InspectAsm' { runInspectAsm :: InspectEnv -> InspectState -> (# InspectState,  a #) }
+
+pattern InspectAsm :: (InspectEnv -> InspectState -> (# InspectState, a #))
+                   -> InspectAsm a
+pattern InspectAsm m <- InspectAsm' m
   where
-    go s (NullAsm _) = (instrCount s, lblEnv s)
-    go s (AllocPtr _ k) = go (s { ptrCount = n + 1 }) (k n)
-      where n = ptrCount s
-    go s (AllocLit ls k) = go (s { litCount = n + genericLength ls }) (k n)
-      where n = litCount s
-    go s (AllocLabel lbl k) = go s' k
-      where s' = s { lblEnv = Map.insert lbl (instrCount s) (lblEnv s) }
-    go s (Emit _ ops k) = go s' k
-      where
-        s' = s { instrCount = instrCount s + size }
-        size = sum (map count ops) + 1
+    InspectAsm m = InspectAsm' (oneShot $ \a -> oneShot $ \b -> m a b)
+{-# COMPLETE InspectAsm #-}
+
+instance Functor InspectAsm where
+  fmap f (InspectAsm k) = InspectAsm $ \a b -> case k a b of
+                                                  (# b', c #) -> (# b', f c #)
+
+instance Applicative InspectAsm where
+  pure x = InspectAsm $ \_ s -> (# s, x #)
+  (InspectAsm f) <*> (InspectAsm x) = InspectAsm $ \a b -> case f a b of
+                                                              (# s', f' #) ->
+                                                                case x a s' of
+                                                                  (# s'', x' #) -> (# s'', f' x' #)
+
+instance Monad InspectAsm where
+  return = pure
+  (InspectAsm m) >>= f = InspectAsm $ \ a b -> case m a b of
+                                                (# s', a' #) -> runInspectAsm (f a') a s'
+
+get_ :: InspectAsm InspectState
+get_ = InspectAsm $ \_ b -> (# b, b #)
+
+put_ :: InspectState -> InspectAsm ()
+put_ !s = InspectAsm $ \_ _ -> (# s, () #)
+
+modify_ :: (InspectState -> InspectState) -> InspectAsm ()
+modify_ f = InspectAsm $ \_ s -> let !s' = f s in (# s', () #)
+
+ask_ :: InspectAsm InspectEnv
+ask_ = InspectAsm $ \a b -> (# b, a #)
+
+inspectAsm :: Bool -> Word -> InspectAsm () -> InspectState
+inspectAsm long_jumps initial_offset (InspectAsm s) =
+  case s (InspectEnv long_jumps) (InspectState initial_offset 0 0 emptyUFM) of
+    (# res, () #) -> res
+{-# INLINE inspectAsm #-}
+
+
+
+instance MonadAssembler InspectAsm where
+  ioptr _ = do
+    s <- get_
+    let n = ptrCount s
+    put_ (s { ptrCount = n + 1 })
+    return n
+
+  lit ls = do
+    s <- get_
+    let n = litCount s
+    put_ (s { litCount = n + oneTwoLength ls })
+    return n
+
+  label lbl = modify_ (\s -> let !count = instrCount s in let !env' = addToUFM (lblEnv s) lbl count in s { lblEnv = env' })
+
+  emit pwordsize _ ops = do
+    InspectEnv long_jumps <- ask_
+    -- Size is written in this way as `mapM6_` is also used by RunAsm, and guaranteed
+    -- to unroll for arguments up to size 6.
+    let size = (MTL.execState (mapM6_ (\x -> MTL.modify (count' x +)) ops) 0) + 1
         largeOps = any (largeOp long_jumps) ops
-        count (SmallOp _) = 1
-        count (LabelOp _) = count (Op 0)
-        count (Op _) = if largeOps then largeArg16s platform else 1
-        count (IOp _) = if largeOps then largeArg16s platform else 1
+        bigSize = largeArg16s pwordsize
+        count' = if largeOps then countLarge bigSize else countSmall bigSize
+
+    s <- get_
+    put_ (s { instrCount = instrCount s + size })
+
+  {-# INLINE emit #-}
+  {-# INLINE label #-}
+  {-# INLINE lit #-}
+  {-# INLINE ioptr #-}
+
+count :: Word -> Bool -> Operand -> Word
+count _ _ (SmallOp _)          = 1
+count big largeOps (LabelOp _) = if largeOps then big else 1
+count big largeOps (Op _)      = if largeOps then big else 1
+count big largeOps (IOp _)     = if largeOps then big else 1
+{-# INLINE count #-}
+
+countSmall, countLarge :: Word -> Operand -> Word
+countLarge big x = count big True x
+countSmall big x = count big False x
+
 
 -- Bring in all the bci_ bytecode constants.
 #include "Bytecodes.h"
@@ -370,47 +561,67 @@ inspectAsm platform long_jumps initial_offset
 largeArgInstr :: Word16 -> Word16
 largeArgInstr bci = bci_FLAG_LARGE_ARGS .|. bci
 
-largeArg :: Platform -> Word64 -> [Word16]
-largeArg platform w = case platformWordSize platform of
-   PW8 -> [fromIntegral (w `shiftR` 48),
-           fromIntegral (w `shiftR` 32),
-           fromIntegral (w `shiftR` 16),
-           fromIntegral w]
+{-# INLINE largeArg #-}
+largeArg :: PlatformWordSize -> Word64 -> RunAsm ()
+largeArg wsize w = case wsize of
+   PW8 ->  do writeIsn (fromIntegral (w `shiftR` 48))
+              writeIsn (fromIntegral (w `shiftR` 32))
+              writeIsn (fromIntegral (w `shiftR` 16))
+              writeIsn (fromIntegral w)
    PW4 -> assertPpr (w < fromIntegral (maxBound :: Word32))
-                    (text "largeArg too big:" <+> ppr w) $
-          [fromIntegral (w `shiftR` 16),
-           fromIntegral w]
+                    (text "largeArg too big:" <+> ppr w) $ do
+          writeIsn (fromIntegral (w `shiftR` 16))
+          writeIsn (fromIntegral w)
 
-largeArg16s :: Platform -> Word
-largeArg16s platform = case platformWordSize platform of
+largeArg16s :: PlatformWordSize -> Word
+largeArg16s pwordsize = case pwordsize of
    PW8 -> 4
    PW4 -> 2
 
-assembleI :: Platform
+data OneOrTwo a = OnlyOne a | OnlyTwo a a deriving (Functor)
+
+oneTwoLength :: OneOrTwo a -> Word
+oneTwoLength (OnlyOne {}) = 1
+oneTwoLength (OnlyTwo {}) = 2
+
+class Monad m => MonadAssembler m where
+  ioptr :: IO BCOPtr -> m Word
+  lit :: OneOrTwo BCONPtr -> m Word
+  label :: LocalLabel -> m ()
+  emit :: PlatformWordSize -> Word16 -> [Operand] -> m ()
+
+lit1 :: MonadAssembler m => BCONPtr -> m Word
+lit1 p = lit (OnlyOne p)
+
+{-# SPECIALISE assembleI :: Platform -> BCInstr -> InspectAsm () #-}
+{-# SPECIALISE assembleI :: Platform -> BCInstr -> RunAsm () #-}
+
+assembleI :: forall m . MonadAssembler m
+          => Platform
           -> BCInstr
-          -> Assembler ()
+          -> m ()
 assembleI platform i = case i of
-  STKCHECK n               -> emit bci_STKCHECK [Op n]
-  PUSH_L o1                -> emit bci_PUSH_L [wOp o1]
-  PUSH_LL o1 o2            -> emit bci_PUSH_LL [wOp o1, wOp o2]
-  PUSH_LLL o1 o2 o3        -> emit bci_PUSH_LLL [wOp o1, wOp o2, wOp o3]
-  PUSH8 o1                 -> emit bci_PUSH8 [bOp o1]
-  PUSH16 o1                -> emit bci_PUSH16 [bOp o1]
-  PUSH32 o1                -> emit bci_PUSH32 [bOp o1]
-  PUSH8_W o1               -> emit bci_PUSH8_W [bOp o1]
-  PUSH16_W o1              -> emit bci_PUSH16_W [bOp o1]
-  PUSH32_W o1              -> emit bci_PUSH32_W [bOp o1]
+  STKCHECK n               -> emit_ bci_STKCHECK [Op n]
+  PUSH_L o1                -> emit_ bci_PUSH_L [wOp o1]
+  PUSH_LL o1 o2            -> emit_ bci_PUSH_LL [wOp o1, wOp o2]
+  PUSH_LLL o1 o2 o3        -> emit_ bci_PUSH_LLL [wOp o1, wOp o2, wOp o3]
+  PUSH8 o1                 -> emit_ bci_PUSH8 [bOp o1]
+  PUSH16 o1                -> emit_ bci_PUSH16 [bOp o1]
+  PUSH32 o1                -> emit_ bci_PUSH32 [bOp o1]
+  PUSH8_W o1               -> emit_ bci_PUSH8_W [bOp o1]
+  PUSH16_W o1              -> emit_ bci_PUSH16_W [bOp o1]
+  PUSH32_W o1              -> emit_ bci_PUSH32_W [bOp o1]
   PUSH_G nm                -> do p <- ptr (BCOPtrName nm)
-                                 emit bci_PUSH_G [Op p]
+                                 emit_ bci_PUSH_G [Op p]
   PUSH_PRIMOP op           -> do p <- ptr (BCOPtrPrimOp op)
-                                 emit bci_PUSH_G [Op p]
+                                 emit_ bci_PUSH_G [Op p]
   PUSH_BCO proto           -> do let ul_bco = assembleBCO platform proto
                                  p <- ioptr (liftM BCOPtrBCO ul_bco)
-                                 emit bci_PUSH_G [Op p]
+                                 emit_ bci_PUSH_G [Op p]
   PUSH_ALTS proto pk
                            -> do let ul_bco = assembleBCO platform proto
                                  p <- ioptr (liftM BCOPtrBCO ul_bco)
-                                 emit (push_alts pk) [Op p]
+                                 emit_ (push_alts pk) [Op p]
   PUSH_ALTS_TUPLE proto call_info tuple_proto
                            -> do let ul_bco = assembleBCO platform proto
                                      ul_tuple_bco = assembleBCO platform
@@ -419,127 +630,131 @@ assembleI platform i = case i of
                                  p_tup <- ioptr (liftM BCOPtrBCO ul_tuple_bco)
                                  info <- word (fromIntegral $
                                               mkNativeCallInfoSig platform call_info)
-                                 emit bci_PUSH_ALTS_T
+                                 emit_ bci_PUSH_ALTS_T
                                       [Op p, Op info, Op p_tup]
-  PUSH_PAD8                -> emit bci_PUSH_PAD8 []
-  PUSH_PAD16               -> emit bci_PUSH_PAD16 []
-  PUSH_PAD32               -> emit bci_PUSH_PAD32 []
+  PUSH_PAD8                -> emit_ bci_PUSH_PAD8 []
+  PUSH_PAD16               -> emit_ bci_PUSH_PAD16 []
+  PUSH_PAD32               -> emit_ bci_PUSH_PAD32 []
   PUSH_UBX8 lit            -> do np <- literal lit
-                                 emit bci_PUSH_UBX8 [Op np]
+                                 emit_ bci_PUSH_UBX8 [Op np]
   PUSH_UBX16 lit           -> do np <- literal lit
-                                 emit bci_PUSH_UBX16 [Op np]
+                                 emit_ bci_PUSH_UBX16 [Op np]
   PUSH_UBX32 lit           -> do np <- literal lit
-                                 emit bci_PUSH_UBX32 [Op np]
+                                 emit_ bci_PUSH_UBX32 [Op np]
   PUSH_UBX lit nws         -> do np <- literal lit
-                                 emit bci_PUSH_UBX [Op np, wOp nws]
-
+                                 emit_ bci_PUSH_UBX [Op np, wOp nws]
   -- see Note [Generating code for top-level string literal bindings] in GHC.StgToByteCode
-  PUSH_ADDR nm             -> do np <- lit [BCONPtrAddr nm]
-                                 emit bci_PUSH_UBX [Op np, SmallOp 1]
-
-  PUSH_APPLY_N             -> emit bci_PUSH_APPLY_N []
-  PUSH_APPLY_V             -> emit bci_PUSH_APPLY_V []
-  PUSH_APPLY_F             -> emit bci_PUSH_APPLY_F []
-  PUSH_APPLY_D             -> emit bci_PUSH_APPLY_D []
-  PUSH_APPLY_L             -> emit bci_PUSH_APPLY_L []
-  PUSH_APPLY_P             -> emit bci_PUSH_APPLY_P []
-  PUSH_APPLY_PP            -> emit bci_PUSH_APPLY_PP []
-  PUSH_APPLY_PPP           -> emit bci_PUSH_APPLY_PPP []
-  PUSH_APPLY_PPPP          -> emit bci_PUSH_APPLY_PPPP []
-  PUSH_APPLY_PPPPP         -> emit bci_PUSH_APPLY_PPPPP []
-  PUSH_APPLY_PPPPPP        -> emit bci_PUSH_APPLY_PPPPPP []
-
-  SLIDE     n by           -> emit bci_SLIDE [wOp n, wOp by]
-  ALLOC_AP  n              -> emit bci_ALLOC_AP [truncHalfWord platform n]
-  ALLOC_AP_NOUPD n         -> emit bci_ALLOC_AP_NOUPD [truncHalfWord platform n]
-  ALLOC_PAP arity n        -> emit bci_ALLOC_PAP [truncHalfWord platform arity, truncHalfWord platform n]
-  MKAP      off sz         -> emit bci_MKAP [wOp off, truncHalfWord platform sz]
-  MKPAP     off sz         -> emit bci_MKPAP [wOp off, truncHalfWord platform sz]
-  UNPACK    n              -> emit bci_UNPACK [wOp n]
-  PACK      dcon sz        -> do itbl_no <- lit [BCONPtrItbl (getName dcon)]
-                                 emit bci_PACK [Op itbl_no, wOp sz]
+  PUSH_ADDR nm             -> do np <- lit1 (BCONPtrAddr nm)
+                                 emit_ bci_PUSH_UBX [Op np, SmallOp 1]
+
+  PUSH_APPLY_N             -> emit_ bci_PUSH_APPLY_N []
+  PUSH_APPLY_V             -> emit_ bci_PUSH_APPLY_V []
+  PUSH_APPLY_F             -> emit_ bci_PUSH_APPLY_F []
+  PUSH_APPLY_D             -> emit_ bci_PUSH_APPLY_D []
+  PUSH_APPLY_L             -> emit_ bci_PUSH_APPLY_L []
+  PUSH_APPLY_P             -> emit_ bci_PUSH_APPLY_P []
+  PUSH_APPLY_PP            -> emit_ bci_PUSH_APPLY_PP []
+  PUSH_APPLY_PPP           -> emit_ bci_PUSH_APPLY_PPP []
+  PUSH_APPLY_PPPP          -> emit_ bci_PUSH_APPLY_PPPP []
+  PUSH_APPLY_PPPPP         -> emit_ bci_PUSH_APPLY_PPPPP []
+  PUSH_APPLY_PPPPPP        -> emit_ bci_PUSH_APPLY_PPPPPP []
+
+  SLIDE     n by           -> emit_ bci_SLIDE [wOp n, wOp by]
+  ALLOC_AP  n              -> emit_ bci_ALLOC_AP [truncHalfWord platform n]
+  ALLOC_AP_NOUPD n         -> emit_ bci_ALLOC_AP_NOUPD [truncHalfWord platform n]
+  ALLOC_PAP arity n        -> emit_ bci_ALLOC_PAP [truncHalfWord platform arity, truncHalfWord platform n]
+  MKAP      off sz         -> emit_ bci_MKAP [wOp off, truncHalfWord platform sz]
+  MKPAP     off sz         -> emit_ bci_MKPAP [wOp off, truncHalfWord platform sz]
+  UNPACK    n              -> emit_ bci_UNPACK [wOp n]
+  PACK      dcon sz        -> do itbl_no <- lit1 (BCONPtrItbl (getName dcon))
+                                 emit_ bci_PACK [Op itbl_no, wOp sz]
   LABEL     lbl            -> label lbl
   TESTLT_I  i l            -> do np <- int i
-                                 emit bci_TESTLT_I [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_I [Op np, LabelOp l]
   TESTEQ_I  i l            -> do np <- int i
-                                 emit bci_TESTEQ_I [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_I [Op np, LabelOp l]
   TESTLT_W  w l            -> do np <- word w
-                                 emit bci_TESTLT_W [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_W [Op np, LabelOp l]
   TESTEQ_W  w l            -> do np <- word w
-                                 emit bci_TESTEQ_W [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_W [Op np, LabelOp l]
   TESTLT_I64  i l          -> do np <- word64 (fromIntegral i)
-                                 emit bci_TESTLT_I64 [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_I64 [Op np, LabelOp l]
   TESTEQ_I64  i l          -> do np <- word64 (fromIntegral i)
-                                 emit bci_TESTEQ_I64 [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_I64 [Op np, LabelOp l]
   TESTLT_I32  i l          -> do np <- word (fromIntegral i)
-                                 emit bci_TESTLT_I32 [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_I32 [Op np, LabelOp l]
   TESTEQ_I32 i l           -> do np <- word (fromIntegral i)
-                                 emit bci_TESTEQ_I32 [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_I32 [Op np, LabelOp l]
   TESTLT_I16  i l          -> do np <- word (fromIntegral i)
-                                 emit bci_TESTLT_I16 [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_I16 [Op np, LabelOp l]
   TESTEQ_I16 i l           -> do np <- word (fromIntegral i)
-                                 emit bci_TESTEQ_I16 [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_I16 [Op np, LabelOp l]
   TESTLT_I8  i l           -> do np <- word (fromIntegral i)
-                                 emit bci_TESTLT_I8 [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_I8 [Op np, LabelOp l]
   TESTEQ_I8 i l            -> do np <- word (fromIntegral i)
-                                 emit bci_TESTEQ_I8 [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_I8 [Op np, LabelOp l]
   TESTLT_W64  w l          -> do np <- word64 w
-                                 emit bci_TESTLT_W64 [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_W64 [Op np, LabelOp l]
   TESTEQ_W64  w l          -> do np <- word64 w
-                                 emit bci_TESTEQ_W64 [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_W64 [Op np, LabelOp l]
   TESTLT_W32  w l          -> do np <- word (fromIntegral w)
-                                 emit bci_TESTLT_W32 [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_W32 [Op np, LabelOp l]
   TESTEQ_W32  w l          -> do np <- word (fromIntegral w)
-                                 emit bci_TESTEQ_W32 [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_W32 [Op np, LabelOp l]
   TESTLT_W16  w l          -> do np <- word (fromIntegral w)
-                                 emit bci_TESTLT_W16 [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_W16 [Op np, LabelOp l]
   TESTEQ_W16  w l          -> do np <- word (fromIntegral w)
-                                 emit bci_TESTEQ_W16 [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_W16 [Op np, LabelOp l]
   TESTLT_W8  w l           -> do np <- word (fromIntegral w)
-                                 emit bci_TESTLT_W8 [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_W8 [Op np, LabelOp l]
   TESTEQ_W8  w l           -> do np <- word (fromIntegral w)
-                                 emit bci_TESTEQ_W8 [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_W8 [Op np, LabelOp l]
   TESTLT_F  f l            -> do np <- float f
-                                 emit bci_TESTLT_F [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_F [Op np, LabelOp l]
   TESTEQ_F  f l            -> do np <- float f
-                                 emit bci_TESTEQ_F [Op np, LabelOp l]
+                                 emit_ bci_TESTEQ_F [Op np, LabelOp l]
   TESTLT_D  d l            -> do np <- double d
-                                 emit bci_TESTLT_D [Op np, LabelOp l]
+                                 emit_ bci_TESTLT_D [Op np, LabelOp l]
   TESTEQ_D  d l            -> do np <- double d
-                                 emit bci_TESTEQ_D [Op np, LabelOp l]
-  TESTLT_P  i l            -> emit bci_TESTLT_P [SmallOp i, LabelOp l]
-  TESTEQ_P  i l            -> emit bci_TESTEQ_P [SmallOp i, LabelOp l]
-  CASEFAIL                 -> emit bci_CASEFAIL []
-  SWIZZLE   stkoff n       -> emit bci_SWIZZLE [wOp stkoff, IOp n]
-  JMP       l              -> emit bci_JMP [LabelOp l]
-  ENTER                    -> emit bci_ENTER []
-  RETURN rep               -> emit (return_non_tuple rep) []
-  RETURN_TUPLE             -> emit bci_RETURN_T []
+                                 emit_ bci_TESTEQ_D [Op np, LabelOp l]
+  TESTLT_P  i l            -> emit_ bci_TESTLT_P [SmallOp i, LabelOp l]
+  TESTEQ_P  i l            -> emit_ bci_TESTEQ_P [SmallOp i, LabelOp l]
+  CASEFAIL                 -> emit_ bci_CASEFAIL []
+  SWIZZLE   stkoff n       -> emit_ bci_SWIZZLE [wOp stkoff, IOp n]
+  JMP       l              -> emit_ bci_JMP [LabelOp l]
+  ENTER                    -> emit_ bci_ENTER []
+  RETURN rep               -> emit_ (return_non_tuple rep) []
+  RETURN_TUPLE             -> emit_ bci_RETURN_T []
   CCALL off m_addr i       -> do np <- addr m_addr
-                                 emit bci_CCALL [wOp off, Op np, SmallOp i]
-  PRIMCALL                 -> emit bci_PRIMCALL []
+                                 emit_ bci_CCALL [wOp off, Op np, SmallOp i]
+  PRIMCALL                 -> emit_ bci_PRIMCALL []
   BRK_FUN arr tick_mod tickx info_mod infox cc ->
                               do p1 <- ptr (BCOPtrBreakArray arr)
                                  tick_addr <- addr tick_mod
                                  info_addr <- addr info_mod
                                  np <- addr cc
-                                 emit bci_BRK_FUN [ Op p1
+                                 emit_ bci_BRK_FUN [ Op p1
                                                   , Op tick_addr, Op info_addr
                                                   , SmallOp tickx, SmallOp infox
                                                   , Op np
                                                   ]
 #if MIN_VERSION_rts(1,0,3)
-  BCO_NAME name            -> do np <- lit [BCONPtrStr name]
-                                 emit bci_BCO_NAME [Op np]
+  BCO_NAME name            -> do np <- lit1 (BCONPtrStr name)
+                                 emit_ bci_BCO_NAME [Op np]
 #endif
 
+
+
   where
+    emit_ = emit word_size
+
+    literal :: Literal -> m Word
     literal (LitLabel fs _)   = litlabel fs
     literal LitNullAddr       = word 0
     literal (LitFloat r)      = float (fromRational r)
     literal (LitDouble r)     = double (fromRational r)
     literal (LitChar c)       = int (ord c)
-    literal (LitString bs)    = lit [BCONPtrStr bs]
+    literal (LitString bs)    = lit1 (BCONPtrStr bs)
        -- LitString requires a zero-terminator when emitted
     literal (LitNumber nt i) = case nt of
       LitNumInt     -> word (fromIntegral i)
@@ -559,10 +774,11 @@ assembleI platform i = case i of
     -- analysis messed up.
     literal (LitRubbish {}) = word 0
 
-    litlabel fs = lit [BCONPtrLbl fs]
-    addr (RemotePtr a) = words [fromIntegral a]
-    words ws = lit (map BCONPtrWord ws)
-    word w = words [w]
+    litlabel fs = lit1 (BCONPtrLbl fs)
+    addr (RemotePtr a) = word (fromIntegral a)
+    words ws = lit (fmap BCONPtrWord ws)
+    word w = words (OnlyOne w)
+    word2 w1 w2 = words (OnlyTwo w1 w2)
     word_size  = platformWordSize platform
     word_size_bits = platformWordSizeInBits platform
 
@@ -573,36 +789,36 @@ assembleI platform i = case i of
     -- Note that we only support host endianness == target endianness for now,
     -- even with the external interpreter. This would need to be fixed to
     -- support host endianness /= target endianness
-    int :: Int -> Assembler Word
+    int :: Int -> m Word
     int  i = word (fromIntegral i)
 
-    float :: Float -> Assembler Word
+    float :: Float -> m Word
     float f = word32 (castFloatToWord32 f)
 
-    double :: Double -> Assembler Word
+    double :: Double -> m Word
     double d = word64 (castDoubleToWord64 d)
 
-    word64 :: Word64 -> Assembler Word
+    word64 :: Word64 -> m Word
     word64 ww = case word_size of
        PW4 ->
         let !wl = fromIntegral ww
             !wh = fromIntegral (ww `unsafeShiftR` 32)
         in case platformByteOrder platform of
-            LittleEndian -> words [wl,wh]
-            BigEndian    -> words [wh,wl]
+            LittleEndian -> word2 wl wh
+            BigEndian    -> word2 wh wl
        PW8 -> word (fromIntegral ww)
 
-    word8 :: Word8 -> Assembler Word
+    word8 :: Word8 -> m Word
     word8  x = case platformByteOrder platform of
       LittleEndian -> word (fromIntegral x)
       BigEndian    -> word (fromIntegral x `unsafeShiftL` (word_size_bits - 8))
 
-    word16 :: Word16 -> Assembler Word
+    word16 :: Word16 -> m Word
     word16 x = case platformByteOrder platform of
       LittleEndian -> word (fromIntegral x)
       BigEndian    -> word (fromIntegral x `unsafeShiftL` (word_size_bits - 16))
 
-    word32 :: Word32 -> Assembler Word
+    word32 :: Word32 -> m Word
     word32 x = case platformByteOrder platform of
       LittleEndian -> word (fromIntegral x)
       BigEndian    -> case word_size of


=====================================
compiler/GHC/ByteCode/Instr.hs
=====================================
@@ -36,6 +36,7 @@ import GHC.Stack.CCS (CostCentre)
 import GHC.Stg.Syntax
 import GHCi.BreakArray (BreakArray)
 import Language.Haskell.Syntax.Module.Name (ModuleName)
+import GHC.Types.Unique
 
 -- ----------------------------------------------------------------------------
 -- Bytecode instructions
@@ -58,6 +59,10 @@ data ProtoBCO a
 newtype LocalLabel = LocalLabel { getLocalLabel :: Word32 }
   deriving (Eq, Ord)
 
+-- Just so we can easily juse UniqFM.
+instance Uniquable LocalLabel where
+  getUnique (LocalLabel w) = mkUniqueGrimily $ fromIntegral w
+
 instance Outputable LocalLabel where
   ppr (LocalLabel lbl) = text "lbl:" <> ppr lbl
 


=====================================
compiler/GHC/ByteCode/Types.hs
=====================================
@@ -21,7 +21,7 @@ module GHC.ByteCode.Types
   , CgBreakInfo(..)
   , ModBreaks (..), BreakIndex, emptyModBreaks
   , CCostCentre
-  , FlatBag, sizeFlatBag, fromSizedSeq, elemsFlatBag
+  , FlatBag, sizeFlatBag, fromSmallArray, elemsFlatBag
   ) where
 
 import GHC.Prelude


=====================================
compiler/GHC/CoreToStg/Prep.hs
=====================================
@@ -59,7 +59,8 @@ import GHC.Types.Id
 import GHC.Types.Id.Info
 import GHC.Types.Id.Make ( realWorldPrimId )
 import GHC.Types.Basic
-import GHC.Types.Name   ( NamedThing(..), nameSrcSpan, isInternalName )
+import GHC.Types.Name   ( NamedThing(..), nameSrcSpan, isInternalName, OccName )
+import GHC.Types.Name.Occurrence (occNameString)
 import GHC.Types.SrcLoc ( SrcSpan(..), realSrcLocSpan, mkRealSrcLoc )
 import GHC.Types.Literal
 import GHC.Types.Tickish
@@ -70,6 +71,7 @@ import qualified Data.ByteString.Builder as BB
 import Data.ByteString.Builder.Prim
 
 import Control.Monad
+import Data.List (intercalate)
 
 {-
 Note [CorePrep Overview]
@@ -247,11 +249,11 @@ corePrepPgm logger cp_cfg pgm_cfg
     withTiming logger
                (text "CorePrep"<+>brackets (ppr this_mod))
                (\a -> a `seqList` ()) $ do
-    us <- mkSplitUniqSupply 's'
     let initialCorePrepEnv = mkInitialCorePrepEnv cp_cfg
 
-    let
-        implicit_binds = mkDataConWorkers
+    us <- mkSplitUniqSupply 's'
+
+    let implicit_binds = mkDataConWorkers
           (cpPgm_generateDebugInfo pgm_cfg)
           mod_loc data_tycons
             -- NB: we must feed mkImplicitBinds through corePrep too
@@ -711,13 +713,13 @@ cpePair :: TopLevelFlag -> RecFlag -> Demand -> Levity
         -> UniqSM (Floats, CpeRhs)
 -- Used for all bindings
 -- The binder is already cloned, hence an OutId
-cpePair top_lvl is_rec dmd lev env bndr rhs
+cpePair top_lvl is_rec dmd lev env0 bndr rhs
   = assert (not (isJoinId bndr)) $ -- those should use cpeJoinPair
     do { (floats1, rhs1) <- cpeRhsE env rhs
 
        -- See if we are allowed to float this stuff out of the RHS
        ; let dec = want_float_from_rhs floats1 rhs1
-       ; (floats2, rhs2) <- executeFloatDecision dec floats1 rhs1
+       ; (floats2, rhs2) <- executeFloatDecision env dec floats1 rhs1
 
        -- Make the arity match up
        ; (floats3, rhs3)
@@ -725,7 +727,7 @@ cpePair top_lvl is_rec dmd lev env bndr rhs
                then return (floats2, cpeEtaExpand arity rhs2)
                else warnPprTrace True "CorePrep: silly extra arguments:" (ppr bndr) $
                                -- Note [Silly extra arguments]
-                    (do { v <- newVar (idType bndr)
+                    (do { v <- newVar env (idType bndr)
                         ; let (float, v') = mkNonRecFloat env Lifted v rhs2
                         ; return ( snocFloat floats2 float
                                  , cpeEtaExpand arity (Var v')) })
@@ -735,6 +737,8 @@ cpePair top_lvl is_rec dmd lev env bndr rhs
 
        ; return (floats4, rhs4) }
   where
+    env = pushBinderContext bndr env0
+
     arity = idArity bndr        -- We must match this arity
 
     want_float_from_rhs floats rhs
@@ -967,36 +971,36 @@ cpeBodyNF env expr
 cpeBody :: CorePrepEnv -> CoreExpr -> UniqSM (Floats, CpeBody)
 cpeBody env expr
   = do { (floats1, rhs) <- cpeRhsE env expr
-       ; (floats2, body) <- rhsToBody rhs
+       ; (floats2, body) <- rhsToBody env rhs
        ; return (floats1 `appFloats` floats2, body) }
 
 --------
-rhsToBody :: CpeRhs -> UniqSM (Floats, CpeBody)
+rhsToBody :: CorePrepEnv -> CpeRhs -> UniqSM (Floats, CpeBody)
 -- Remove top level lambdas by let-binding
 
-rhsToBody (Tick t expr)
+rhsToBody env (Tick t expr)
   | tickishScoped t == NoScope  -- only float out of non-scoped annotations
-  = do { (floats, expr') <- rhsToBody expr
+  = do { (floats, expr') <- rhsToBody env expr
        ; return (floats, mkTick t expr') }
 
-rhsToBody (Cast e co)
+rhsToBody env (Cast e co)
         -- You can get things like
         --      case e of { p -> coerce t (\s -> ...) }
-  = do { (floats, e') <- rhsToBody e
+  = do { (floats, e') <- rhsToBody env e
        ; return (floats, Cast e' co) }
 
-rhsToBody expr@(Lam {})   -- See Note [No eta reduction needed in rhsToBody]
+rhsToBody env expr@(Lam {})   -- See Note [No eta reduction needed in rhsToBody]
   | all isTyVar bndrs           -- Type lambdas are ok
   = return (emptyFloats, expr)
   | otherwise                   -- Some value lambdas
   = do { let rhs = cpeEtaExpand (exprArity expr) expr
-       ; fn <- newVar (exprType rhs)
+       ; fn <- newVar env (exprType rhs)
        ; let float = Float (NonRec fn rhs) LetBound TopLvlFloatable
        ; return (unitFloat float, Var fn) }
   where
     (bndrs,_) = collectBinders expr
 
-rhsToBody expr = return (emptyFloats, expr)
+rhsToBody _env expr = return (emptyFloats, expr)
 
 
 {- Note [No eta reduction needed in rhsToBody]
@@ -1168,7 +1172,7 @@ cpeApp top_env expr
         -- allocating CaseBound Floats for token and thing as needed
         = do { (floats1, token) <- cpeArg env topDmd token
              ; (floats2, thing) <- cpeBody env thing
-             ; case_bndr <- (`setIdUnfolding` evaldUnfolding) <$> newVar ty
+             ; case_bndr <- (`setIdUnfolding` evaldUnfolding) <$> newVar env ty
              ; let tup = mkCoreUnboxedTuple [token, Var case_bndr]
              ; let float = mkCaseFloat case_bndr thing
              ; return (floats1 `appFloats` floats2 `snocFloat` float, tup) }
@@ -1577,7 +1581,7 @@ cpeArg env dmd arg
        ; let arg_ty = exprType arg1
              lev    = typeLevity arg_ty
              dec    = wantFloatLocal NonRecursive dmd lev floats1 arg1
-       ; (floats2, arg2) <- executeFloatDecision dec floats1 arg1
+       ; (floats2, arg2) <- executeFloatDecision env dec floats1 arg1
                 -- Else case: arg1 might have lambdas, and we can't
                 --            put them inside a wrapBinds
 
@@ -1586,7 +1590,7 @@ cpeArg env dmd arg
        -- see Note [ANF-ising literal string arguments]
        ; if exprIsTrivial arg2
          then return (floats2, arg2)
-         else do { v <- (`setIdDemandInfo` dmd) <$> newVar arg_ty
+         else do { v <- (`setIdDemandInfo` dmd) <$> newVar env arg_ty
                        -- See Note [Pin demand info on floats]
                  ; let arity = cpeArgArity env dec floats1 arg2
                        arg3  = cpeEtaExpand arity arg2
@@ -2424,13 +2428,13 @@ instance Outputable FloatDecision where
   ppr FloatNone = text "none"
   ppr FloatAll  = text "all"
 
-executeFloatDecision :: FloatDecision -> Floats -> CpeRhs -> UniqSM (Floats, CpeRhs)
-executeFloatDecision dec floats rhs
+executeFloatDecision :: CorePrepEnv -> FloatDecision -> Floats -> CpeRhs -> UniqSM (Floats, CpeRhs)
+executeFloatDecision env dec floats rhs
   = case dec of
       FloatAll                 -> return (floats, rhs)
       FloatNone
         | isEmptyFloats floats -> return (emptyFloats, rhs)
-        | otherwise            -> do { (floats', body) <- rhsToBody rhs
+        | otherwise            -> do { (floats', body) <- rhsToBody env rhs
                                      ; return (emptyFloats, wrapBinds floats $
                                                             wrapBinds floats' body) }
             -- FloatNone case: `rhs` might have lambdas, and we can't
@@ -2569,6 +2573,8 @@ data CorePrepEnv
         , cpe_subst :: Subst  -- ^ See Note [CorePrepEnv: cpe_subst]
 
         , cpe_rec_ids :: UnVarSet -- Faster OutIdSet; See Note [Speculative evaluation]
+
+        , cpe_context :: [OccName] -- ^ See Note [Binder context]
     }
 
 mkInitialCorePrepEnv :: CorePrepConfig -> CorePrepEnv
@@ -2576,6 +2582,7 @@ mkInitialCorePrepEnv cfg = CPE
       { cpe_config        = cfg
       , cpe_subst         = emptySubst
       , cpe_rec_ids       = emptyUnVarSet
+      , cpe_context       = []
       }
 
 extendCorePrepEnv :: CorePrepEnv -> Id -> Id -> CorePrepEnv
@@ -2616,6 +2623,14 @@ cpSubstCo :: CorePrepEnv -> Coercion -> Coercion
 cpSubstCo (CPE { cpe_subst = subst }) co = substCo subst co
           -- substCo has a short-cut if the TCvSubst is empty
 
+-- | See Note [Binder context]
+pushBinderContext :: Id -> CorePrepEnv -> CorePrepEnv
+pushBinderContext ident env
+  | lengthAtLeast (cpe_context env) 2
+  = env
+  | otherwise
+  = env { cpe_context = getOccName ident : cpe_context env}
+
 ------------------------------------------------------------------------------
 -- Cloning binders
 -- ---------------------------------------------------------------------------
@@ -2704,10 +2719,20 @@ fiddleCCall id
 -- Generating new binders
 -- ---------------------------------------------------------------------------
 
-newVar :: Type -> UniqSM Id
-newVar ty
- = seqType ty `seq` mkSysLocalOrCoVarM (fsLit "sat") ManyTy ty
-
+newVar :: CorePrepEnv ->  Type -> UniqSM Id
+newVar env ty
+ -- See Note [Binder context]
+ = seqType ty `seq` mkSysLocalOrCoVarM (fsLit occ) ManyTy ty
+   where occ = intercalate "_" (map occNameString $ cpe_context env) ++ "_sat"
+
+{- Note [Binder context]
+   ~~~~~~~~~~~~~~~~~~~~~
+   To ensure that the compiled program (specifically symbol names)
+   remains understandable to the user we maintain a context
+   of binders that we are currently under. This allows us to give
+   identifiers conjured during CorePrep more contextually-meaningful
+   names. This is done in `newVar`.
+ -}
 
 ------------------------------------------------------------------------------
 -- Floating ticks


=====================================
compiler/GHC/Data/FastString.hs
=====================================
@@ -2,10 +2,19 @@
 {-# LANGUAGE MagicHash #-}
 {-# LANGUAGE UnboxedTuples #-}
 {-# LANGUAGE UnliftedFFITypes #-}
+{-# LANGUAGE CPP #-}
 
 {-# OPTIONS_GHC -O2 -funbox-strict-fields #-}
+#if MIN_VERSION_GLASGOW_HASKELL(9,8,0,0)
+{-# OPTIONS_GHC -fno-unoptimized-core-for-interpreter #-}
+#endif
 -- We always optimise this, otherwise performance of a non-optimised
 -- compiler is severely affected
+--
+-- Also important, if you load this module into GHCi then the data representation of
+-- FastString has to match that of the host compiler due to the shared FastString
+-- table. Otherwise you will get segfaults when the table is consulted and the fields
+-- from the FastString are in an incorrect order.
 
 -- |
 -- There are two principal string types used internally by GHC:


=====================================
compiler/GHC/Data/FlatBag.hs
=====================================
@@ -8,13 +8,11 @@ module GHC.Data.FlatBag
   , mappendFlatBag
   -- * Construction
   , fromList
-  , fromSizedSeq
+  , fromSmallArray
   ) where
 
 import GHC.Prelude
 
-import GHC.Data.SizedSeq (SizedSeq, ssElts, sizeSS)
-
 import Control.DeepSeq
 
 import GHC.Data.SmallArray
@@ -125,5 +123,10 @@ fromList n elts =
 -- | Convert a 'SizedSeq' into its flattened representation.
 -- A 'FlatBag a' is more memory efficient than '[a]', if no further modification
 -- is necessary.
-fromSizedSeq :: SizedSeq a -> FlatBag a
-fromSizedSeq s = fromList (sizeSS s) (ssElts s)
+fromSmallArray :: SmallArray a -> FlatBag a
+fromSmallArray s = case sizeofSmallArray s of
+                      0 -> EmptyFlatBag
+                      1 -> UnitFlatBag (indexSmallArray s 0)
+                      2 -> TupleFlatBag (indexSmallArray s 0) (indexSmallArray s 1)
+                      _ -> FlatBag s
+


=====================================
compiler/GHC/Data/SmallArray.hs
=====================================
@@ -16,11 +16,18 @@ module GHC.Data.SmallArray
   , mapSmallArray
   , foldMapSmallArray
   , rnfSmallArray
+
+  -- * IO Operations
+  , SmallMutableArrayIO
+  , newSmallArrayIO
+  , writeSmallArrayIO
+  , unsafeFreezeSmallArrayIO
   )
 where
 
 import GHC.Exts
 import GHC.Prelude
+import GHC.IO
 import GHC.ST
 import Control.DeepSeq
 
@@ -28,6 +35,8 @@ data SmallArray a = SmallArray (SmallArray# a)
 
 data SmallMutableArray s a = SmallMutableArray (SmallMutableArray# s a)
 
+type SmallMutableArrayIO a = SmallMutableArray RealWorld a
+
 newSmallArray
   :: Int  -- ^ size
   -> a    -- ^ initial contents
@@ -37,6 +46,9 @@ newSmallArray
 newSmallArray (I# sz) x s = case newSmallArray# sz x s of
   (# s', a #) -> (# s', SmallMutableArray a #)
 
+newSmallArrayIO :: Int -> a -> IO (SmallMutableArrayIO a)
+newSmallArrayIO sz x = IO $ \s -> newSmallArray sz x s
+
 writeSmallArray
   :: SmallMutableArray s a -- ^ array
   -> Int                   -- ^ index
@@ -46,6 +58,12 @@ writeSmallArray
 {-# INLINE writeSmallArray #-}
 writeSmallArray (SmallMutableArray a) (I# i) x = writeSmallArray# a i x
 
+writeSmallArrayIO :: SmallMutableArrayIO a
+                  -> Int
+                  -> a
+                  -> IO ()
+writeSmallArrayIO a ix v = IO $ \s -> (# writeSmallArray a ix v s, () #)
+
 
 -- | Copy and freeze a slice of a mutable array.
 freezeSmallArray
@@ -69,6 +87,9 @@ unsafeFreezeSmallArray (SmallMutableArray ma) s =
   case unsafeFreezeSmallArray# ma s of
     (# s', a #) -> (# s', SmallArray a #)
 
+unsafeFreezeSmallArrayIO :: SmallMutableArrayIO a -> IO (SmallArray a)
+unsafeFreezeSmallArrayIO arr = IO $ \s -> unsafeFreezeSmallArray arr s
+
 -- | Get the size of a 'SmallArray'
 sizeofSmallArray
   :: SmallArray a


=====================================
compiler/GHC/Prelude/Basic.hs
=====================================
@@ -25,6 +25,8 @@ module GHC.Prelude.Basic
   , shiftL, shiftR
   , setBit, clearBit
   , head, tail
+
+  , strictGenericLength
   ) where
 
 
@@ -130,3 +132,15 @@ head = Prelude.head
 tail :: HasCallStack => [a] -> [a]
 tail = Prelude.tail
 {-# INLINE tail #-}
+
+{- |
+The 'genericLength' function defined in base can't be specialised due to the
+NOINLINE pragma.
+
+It is also not strict in the accumulator, and strictGenericLength is not exported.
+
+See #25706 for why it is important to use a strict, specialised version.
+
+-}
+strictGenericLength :: Num a => [x] -> a
+strictGenericLength = fromIntegral . length


=====================================
compiler/GHC/StgToByteCode.hs
=====================================
@@ -70,7 +70,7 @@ import GHC.Types.Name.Env (mkNameEnv)
 import GHC.Types.Tickish
 import GHC.Types.SptEntry
 
-import Data.List ( genericReplicate, genericLength, intersperse
+import Data.List ( genericReplicate, intersperse
                  , partition, scanl', sortBy, zip4, zip6 )
 import Foreign hiding (shiftL, shiftR)
 import Control.Monad
@@ -240,11 +240,12 @@ ppBCEnv p
 -- Create a BCO and do a spot of peephole optimisation on the insns
 -- at the same time.
 mkProtoBCO
-   :: (Outputable name)
-   => Platform
-   -> Bool      -- ^ True <=> label with @BCO_NAME@ instruction
-                -- see Note [BCO_NAME]
-   -> name
+   ::
+    Platform
+   -> Maybe Module
+        -- ^ Just cur_mod <=> label with @BCO_NAME@ instruction
+        -- see Note [BCO_NAME]
+   -> Name
    -> BCInstrList
    -> Either  [CgStgAlt] (CgStgRhs)
                 -- ^ original expression; for debugging only
@@ -253,7 +254,7 @@ mkProtoBCO
    -> [StgWord] -- ^ bitmap
    -> Bool      -- ^ True <=> is a return point, rather than a function
    -> [FFIInfo]
-   -> ProtoBCO name
+   -> ProtoBCO Name
 mkProtoBCO platform _add_bco_name nm instrs_ordlist origin arity bitmap_size bitmap is_ret ffis
    = ProtoBCO {
         protoBCOName = nm,
@@ -267,9 +268,9 @@ mkProtoBCO platform _add_bco_name nm instrs_ordlist origin arity bitmap_size bit
      where
 #if MIN_VERSION_rts(1,0,3)
         maybe_add_bco_name instrs
-          | _add_bco_name = BCO_NAME str : instrs
-          where
-            str = BS.pack $ showSDocOneLine defaultSDocContext (ppr nm)
+          | Just cur_mod <- _add_bco_name =
+              let str = BS.pack $ showSDocOneLine defaultSDocContext (pprFullName cur_mod nm)
+              in BCO_NAME str : instrs
 #endif
         maybe_add_bco_name instrs = instrs
 
@@ -393,7 +394,7 @@ schemeR_wrk fvs nm original_body (args, body)
 
          -- make the arg bitmap
          bits = argBits platform (reverse (map (idArgRep platform) all_args))
-         bitmap_size = genericLength bits
+         bitmap_size = strictGenericLength bits
          bitmap = mkBitmap platform bits
      body_code <- schemeER_wrk sum_szsb_args p_init body
 
@@ -607,7 +608,7 @@ schemeE d s p (StgLet _ext binds body) = do
      platform <- targetPlatform <$> getDynFlags
      let (xs,rhss) = case binds of StgNonRec x rhs  -> ([x],[rhs])
                                    StgRec xs_n_rhss -> unzip xs_n_rhss
-         n_binds = genericLength xs
+         n_binds = strictGenericLength xs
 
          fvss  = map (fvsToEnv p') rhss
 
@@ -616,7 +617,7 @@ schemeE d s p (StgLet _ext binds body) = do
          sizes = map (\rhs_fvs -> sum (map size_w rhs_fvs)) fvss
 
          -- the arity of each rhs
-         arities = map (genericLength . fst . collect) rhss
+         arities = map (strictGenericLength . fst . collect) rhss
 
          -- This p', d' defn is safe because all the items being pushed
          -- are ptrs, so all have size 1 word.  d' and p' reflect the stack
@@ -1398,7 +1399,7 @@ Note [unboxed tuple bytecodes and tuple_BCO]
 
 tupleBCO :: Platform -> NativeCallInfo -> [(PrimRep, ByteOff)] -> [FFIInfo] -> ProtoBCO Name
 tupleBCO platform args_info args =
-  mkProtoBCO platform False invented_name body_code (Left [])
+  mkProtoBCO platform Nothing invented_name body_code (Left [])
              0{-no arity-} bitmap_size bitmap False{-is alts-}
   where
     {-
@@ -1419,7 +1420,7 @@ tupleBCO platform args_info args =
 
 primCallBCO :: Platform -> NativeCallInfo -> [(PrimRep, ByteOff)] -> [FFIInfo] -> ProtoBCO Name
 primCallBCO platform args_info args =
-  mkProtoBCO platform False invented_name body_code (Left [])
+  mkProtoBCO platform Nothing invented_name body_code (Left [])
              0{-no arity-} bitmap_size bitmap False{-is alts-}
   where
     {-
@@ -1857,7 +1858,7 @@ implement_tagToId
 implement_tagToId d s p arg names
   = assert (notNull names) $
     do (push_arg, arg_bytes) <- pushAtom d p (StgVarArg arg)
-       labels <- getLabelsBc (genericLength names)
+       labels <- getLabelsBc (strictGenericLength names)
        label_fail <- getLabelBc
        label_exit <- getLabelBc
        dflags <- getDynFlags
@@ -2359,8 +2360,12 @@ getHscEnv = BcM $ \st -> return (st, bcm_hsc_env st)
 getProfile :: BcM Profile
 getProfile = targetProfile <$> getDynFlags
 
-shouldAddBcoName :: BcM Bool
-shouldAddBcoName = gopt Opt_AddBcoName <$> getDynFlags
+shouldAddBcoName :: BcM (Maybe Module)
+shouldAddBcoName = do
+  add <- gopt Opt_AddBcoName <$> getDynFlags
+  if add
+    then Just <$> getCurrentModule
+    else return Nothing
 
 emitBc :: ([FFIInfo] -> ProtoBCO Name) -> BcM (ProtoBCO Name)
 emitBc bco


=====================================
libraries/ghc-boot/GHC/Data/SizedSeq.hs
=====================================
@@ -1,4 +1,4 @@
-{-# LANGUAGE StandaloneDeriving, DeriveGeneric #-}
+{-# LANGUAGE StandaloneDeriving, DeriveGeneric, CPP #-}
 module GHC.Data.SizedSeq
   ( SizedSeq(..)
   , emptySS
@@ -11,9 +11,12 @@ module GHC.Data.SizedSeq
 import Prelude -- See note [Why do we import Prelude here?]
 import Control.DeepSeq
 import Data.Binary
-import Data.List (genericLength)
 import GHC.Generics
 
+#if ! MIN_VERSION_base(4,20,0)
+import Data.List (foldl')
+#endif
+
 data SizedSeq a = SizedSeq {-# UNPACK #-} !Word [a]
   deriving (Generic, Show)
 
@@ -37,9 +40,9 @@ emptySS = SizedSeq 0 []
 addToSS :: SizedSeq a -> a -> SizedSeq a
 addToSS (SizedSeq n r_xs) x = SizedSeq (n+1) (x:r_xs)
 
+-- NB, important this is eta-expand so that foldl' is inlined.
 addListToSS :: SizedSeq a -> [a] -> SizedSeq a
-addListToSS (SizedSeq n r_xs) xs
-  = SizedSeq (n + genericLength xs) (reverse xs ++ r_xs)
+addListToSS s xs = foldl' addToSS s xs
 
 ssElts :: SizedSeq a -> [a]
 ssElts (SizedSeq _ r_xs) = reverse r_xs


=====================================
rts/StgCRunAsm.S
=====================================
@@ -69,7 +69,7 @@ StgReturn:
 
 	.section	.note.GNU-stack,"", at progbits
 # else // Not ELF v2
-# error Only ELF v2 supported.
+       // ELF v1 is in StgCrun.c
 # endif
 
 #elif defined(powerpc_HOST_ARCH)


=====================================
testsuite/tests/core-to-stg/T14895.stderr
=====================================
@@ -11,10 +11,10 @@ T14895.go
           GHC.Internal.Data.Either.Left e [Occ=Once1] -> wild<TagProper>;
           GHC.Internal.Data.Either.Right a1 [Occ=Once1] ->
               let {
-                sat [Occ=Once1] :: b
+                go_sat [Occ=Once1] :: b
                 [LclId] =
                     {a1, f} \u [] f a1;
-              } in  GHC.Internal.Data.Either.Right [sat];
+              } in  GHC.Internal.Data.Either.Right [go_sat];
         };
 
 


=====================================
testsuite/tests/core-to-stg/T24124.stderr
=====================================
@@ -9,16 +9,16 @@ T24124.testFun1
            T24124.StrictPair a b #)
 [GblId, Arity=3, Str=<L><L><L>, Cpr=1, Unf=OtherCon []] =
     {} \r [x y void]
-        case x of sat {
+        case x of testFun1_sat {
         __DEFAULT ->
         case
             case y of y [OS=OneShot] {
-            __DEFAULT -> T24124.MkStrictPair [sat y];
+            __DEFAULT -> T24124.MkStrictPair [testFun1_sat y];
             }
         of
-        sat
+        testFun1_sat
         {
-        __DEFAULT -> GHC.Internal.Types.MkSolo# [sat];
+        __DEFAULT -> GHC.Internal.Types.MkSolo# [testFun1_sat];
         };
         };
 


=====================================
testsuite/tests/ghci/should_run/T21052.stdout
=====================================
@@ -4,9 +4,9 @@ BCO_toplevel :: GHC.Internal.Types.IO [GHC.Internal.Types.Any]
 [LclIdX] =
     {} \u []
         let {
-          sat :: [GHC.Internal.Types.Any]
+          _sat :: [GHC.Internal.Types.Any]
           [LclId, Unf=OtherCon []] =
               :! [GHC.Internal.Tuple.() GHC.Internal.Types.[]];
-        } in  GHC.Internal.Base.returnIO sat;
+        } in  GHC.Internal.Base.returnIO _sat;
 
 


=====================================
testsuite/tests/perf/should_run/ByteCodeAsm.hs
=====================================
@@ -0,0 +1,66 @@
+module Main where
+
+import GHC.Driver.Session
+import GHC
+import qualified GHC.LanguageExtensions as LangExt
+
+import Control.Monad (forM_)
+import Control.Monad.IO.Class (liftIO)
+import System.Environment (getArgs)
+
+import GHC.ByteCode.Asm ( assembleBCO )
+import GHC.ByteCode.Instr
+import Control.Monad
+import GHC.Builtin.Names
+
+-- Testing the performance of the bytecode assembler
+
+-- A nonsensical byte-code program
+instrs = [ STKCHECK 1234
+         , PUSH_L 1
+         , PUSH_LL 1 2
+         , PUSH_LLL 2 3 4
+         , PUSH_LLL 2 3 4
+         , PUSH_LLL 2 3 4
+         , PUSH_LLL 2 3 4
+         , PUSH_LLL 2 3 4
+         , PUSH8 0
+         , PUSH16 15
+         , PUSH32 29
+         , PUSH_PAD8
+         , PUSH_APPLY_N
+         , PUSH_APPLY_V
+         , PUSH_APPLY_F
+         , PUSH_APPLY_D
+         , PUSH_APPLY_L
+         , PUSH_APPLY_P
+         , PUSH_APPLY_PP
+         , PUSH_APPLY_PPP
+         , PUSH_APPLY_PPPP
+         , PUSH_APPLY_PPPPP
+         , PUSH_APPLY_PPPPPP
+         , TESTLT_I 100 (LocalLabel 0)
+         , TESTEQ_I 100 (LocalLabel 0)
+         ]
+         ++ [ LABEL (LocalLabel n) | n <- [0..50] ]
+         ++ [ TESTEQ_I64 n (LocalLabel 49) | n <- [1243 .. 1253 + 50 ]]
+         ++ [ ENTER ]
+         ++ [ SLIDE x n | x <- [0..5], n <- [0..10] ]
+         ++ [ PUSH_G appAName | _ <- [0..100] ]
+         ++ [ PUSH_BCO fake_proto2 ]
+
+fake_proto = ProtoBCO appAName instrs [] 0 0 (Left []) []
+
+instrs2 = [ STKCHECK 77, UNPACK 4, SLIDE 0 4, ENTER ]
+
+fake_proto2 = ProtoBCO appAName instrs2 [] 0 0 (Left []) []
+
+main :: IO ()
+main = do
+  [libdir] <- getArgs
+  runGhc (Just libdir) $ do
+    dflags <- getSessionDynFlags
+    let platform = targetPlatform dflags
+
+    -- ~1s on my machine
+    liftIO $ replicateM_ 100000 (assembleBCO platform fake_proto)


=====================================
testsuite/tests/perf/should_run/all.T
=====================================
@@ -415,3 +415,11 @@ test('T18964', [collect_stats('bytes allocated', 1), only_ways(['normal'])], com
 test('T23021', [collect_stats('bytes allocated', 1), only_ways(['normal'])], compile_and_run, ['-O2'])
 test('T25055', [collect_stats('bytes allocated', 2), only_ways(['normal'])], compile_and_run, ['-O2'])
 test('T17949', [collect_stats('bytes allocated', 1), only_ways(['normal'])], compile_and_run, ['-O2'])
+
+test('ByteCodeAsm',
+               [ extra_run_opts('"' + config.libdir + '"')
+               , js_broken(22261)
+               , collect_stats('bytes allocated', 10),
+               ],
+               compile_and_run,
+               ['-package ghc'])


=====================================
testsuite/tests/simplCore/should_compile/T20040.stderr
=====================================
@@ -16,7 +16,9 @@ ifoldl' =
           Cons ipv2 ipv3 ->
               case z of z1 {
               __DEFAULT ->
-              case f z1 ipv2 of sat { __DEFAULT -> ifoldl' f sat ipv3; };
+              case f z1 ipv2 of ifoldl'_sat {
+              __DEFAULT -> ifoldl' f ifoldl'_sat ipv3;
+              };
               };
         };
 end Rec }


=====================================
testsuite/tests/simplCore/should_compile/T23083.stderr
=====================================
@@ -13,10 +13,10 @@ T23083.g :: ((GHC.Internal.Bignum.Integer.Integer -> GHC.Internal.Bignum.Integer
 T23083.g
   = \ (f [Occ=Once1!] :: (GHC.Internal.Bignum.Integer.Integer -> GHC.Internal.Bignum.Integer.Integer) -> GHC.Internal.Bignum.Integer.Integer) (h [Occ=OnceL1] :: GHC.Internal.Bignum.Integer.Integer -> GHC.Internal.Bignum.Integer.Integer) ->
       let {
-        sat [Occ=Once1] :: GHC.Internal.Bignum.Integer.Integer -> GHC.Internal.Bignum.Integer.Integer
+        g_sat [Occ=Once1] :: GHC.Internal.Bignum.Integer.Integer -> GHC.Internal.Bignum.Integer.Integer
         [LclId, Unf=OtherCon []]
-        sat = \ (eta [Occ=Once1] :: GHC.Internal.Bignum.Integer.Integer) -> case h of h1 [Occ=Once1] { __DEFAULT -> T23083.$$ @GHC.Internal.Bignum.Integer.Integer @GHC.Internal.Bignum.Integer.Integer h1 eta } } in
-      f sat
+        g_sat = \ (eta [Occ=Once1] :: GHC.Internal.Bignum.Integer.Integer) -> case h of h1 [Occ=Once1] { __DEFAULT -> T23083.$$ @GHC.Internal.Bignum.Integer.Integer @GHC.Internal.Bignum.Integer.Integer h1 eta } } in
+      f g_sat
 
 -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
 T23083.$trModule4 :: GHC.Prim.Addr#


=====================================
testsuite/tests/simplStg/should_compile/T15226b.stderr
=====================================
@@ -8,13 +8,13 @@ T15226b.bar1
            T15226b.Str (GHC.Internal.Maybe.Maybe a) #)
 [GblId, Arity=2, Str=<L><L>, Cpr=1(, 1), Unf=OtherCon []] =
     {} \r [x void]
-        case x of sat {
+        case x of bar1_sat {
         __DEFAULT ->
         let {
-          sat [Occ=Once1] :: T15226b.Str (GHC.Internal.Maybe.Maybe a)
+          bar1_sat [Occ=Once1] :: T15226b.Str (GHC.Internal.Maybe.Maybe a)
           [LclId, Unf=OtherCon []] =
-              T15226b.Str! [sat];
-        } in  GHC.Internal.Types.MkSolo# [sat];
+              T15226b.Str! [bar1_sat];
+        } in  GHC.Internal.Types.MkSolo# [bar1_sat];
         };
 
 T15226b.bar


=====================================
testsuite/tests/simplStg/should_compile/T19717.stderr
=====================================
@@ -6,14 +6,14 @@ Foo.f :: forall {a}. a -> [GHC.Internal.Maybe.Maybe a]
         case x of x1 {
         __DEFAULT ->
         let {
-          sat [Occ=Once1] :: GHC.Internal.Maybe.Maybe a
+          f_sat [Occ=Once1] :: GHC.Internal.Maybe.Maybe a
           [LclId, Unf=OtherCon []] =
               GHC.Internal.Maybe.Just! [x1]; } in
         let {
-          sat [Occ=Once1] :: [GHC.Internal.Maybe.Maybe a]
+          f_sat [Occ=Once1] :: [GHC.Internal.Maybe.Maybe a]
           [LclId, Unf=OtherCon []] =
-              :! [sat GHC.Internal.Types.[]];
-        } in  : [sat sat];
+              :! [f_sat GHC.Internal.Types.[]];
+        } in  : [f_sat f_sat];
         };
 
 


=====================================
utils/ghc-toolchain/src/GHC/Toolchain/ParseTriple.hs
=====================================
@@ -36,6 +36,7 @@ parseArch :: Cc -> String -> M Arch
 parseArch cc arch =
     case arch of
       "i386" -> pure ArchX86
+      "i686" -> pure ArchX86
       "x86_64" -> pure ArchX86_64
       "amd64" -> pure ArchX86_64
       "powerpc" -> pure ArchPPC



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/7c9d6afe6a1e9bec12c5f2db226aa68b84ecc63c...890bec99b78a139565eac096e18f536ba7417029

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/7c9d6afe6a1e9bec12c5f2db226aa68b84ecc63c...890bec99b78a139565eac096e18f536ba7417029
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20250205/bb9db2af/attachment-0001.html>


More information about the ghc-commits mailing list