[Git][ghc/ghc][wip/andreask/9.10-backports] 4 commits: Implements MO_S_Mul2 and MO_U_Mul2 using the UMULH, UMULL and SMULH instructions for AArch64
Andreas Klebinger (@AndreasK)
gitlab at gitlab.haskell.org
Mon Jan 13 16:42:55 UTC 2025
Andreas Klebinger pushed to branch wip/andreask/9.10-backports at Glasgow Haskell Compiler / GHC
Commits:
ba97aa2d by Alex Mason at 2025-01-13T15:01:42+01:00
Implements MO_S_Mul2 and MO_U_Mul2 using the UMULH, UMULL and SMULH instructions for AArch64
Also adds a test for MO_S_Mul2
(cherry picked from commit dbdf1995956a7457c34b6895c67ef48f6c8384f2)
- - - - -
52812e1c by Matthew Pickering at 2025-01-13T15:04:41+01:00
Bump os-string submodule to 2.0.2.2
Closes #24786
(cherry picked from commit 0528509028ef6c4d80d47aad9fd80de6c662c8a2)
- - - - -
a085f505 by Alex Mason at 2025-01-13T15:05:55+01:00
Add AArch64 CLZ, CTZ, RBIT primop implementations.
Adds support for emitting the clz and rbit instructions, which are
used by GHC.Prim.clz*#, GHC.Prim.ctz*# and GHC.Prim.bitReverse*#.
(cherry picked from commit 71010381f4270966de334193ab2bfc67f8524212)
- - - - -
6faa8336 by Andreas Klebinger at 2025-01-13T15:06:40+01:00
GHCi interpreter: Tag constructor closures when possible.
When evaluating PUSH_G try to tag the reference we are pushing if it's a
constructor. This is potentially helpful for performance and required to
fix #24870.
(cherry picked from commit 1bfa91115b8320ed99a5e946147528e21ca4f3e1)
- - - - -
18 changed files:
- compiler/GHC/ByteCode/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/Driver/Config/StgToCmm.hs
- compiler/GHC/StgToCmm/Config.hs
- compiler/GHC/StgToCmm/Prim.hs
- libraries/os-string
- rts/Interpreter.c
- + testsuite/tests/codeGen/should_run/CtzClz0.hs
- testsuite/tests/codeGen/should_run/all.T
- testsuite/tests/numeric/should_run/all.T
- + testsuite/tests/numeric/should_run/mul2int.hs
- + testsuite/tests/numeric/should_run/mul2int.stdout
- + testsuite/tests/numeric/should_run/mul2int.stdout-ws-32
- + testsuite/tests/th/should_compile/T24870/Def.hs
- + testsuite/tests/th/should_compile/T24870/Use.hs
- + testsuite/tests/th/should_compile/T24870/all.T
Changes:
=====================================
compiler/GHC/ByteCode/Instr.hs
=====================================
@@ -83,7 +83,7 @@ data BCInstr
| PUSH16_W !ByteOff
| PUSH32_W !ByteOff
- -- Push a ptr (these all map to PUSH_G really)
+ -- Push a (heap) ptr (these all map to PUSH_G really)
| PUSH_G Name
| PUSH_PRIMOP PrimOp
| PUSH_BCO (ProtoBCO Name)
=====================================
compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
=====================================
@@ -1556,7 +1556,7 @@ genCCall target dest_regs arg_regs bid = do
-- pprTraceM "genCCall target" (ppr target)
-- pprTraceM "genCCall formal" (ppr dest_regs)
-- pprTraceM "genCCall actual" (ppr arg_regs)
-
+ platform <- getPlatform
case target of
-- The target :: ForeignTarget call can either
-- be a foreign procedure with an address expr
@@ -1584,7 +1584,6 @@ genCCall target dest_regs arg_regs bid = do
let (_res_hints, arg_hints) = foreignTargetHints target
arg_regs'' = zipWith (\(r, f, c) h -> (r,f,h,c)) arg_regs' arg_hints
- platform <- getPlatform
let packStack = platformOS platform == OSDarwin
(stackSpace', passRegs, passArgumentsCode) <- passArguments packStack allGpArgRegs allFpArgRegs arg_regs'' 0 [] nilOL
@@ -1625,6 +1624,270 @@ genCCall target dest_regs arg_regs bid = do
| [arg_reg] <- arg_regs, [dest_reg] <- dest_regs ->
unaryFloatOp W64 (\d x -> unitOL $ FABS d x) arg_reg dest_reg
+ PrimTarget (MO_S_Mul2 w)
+ -- Life is easier when we're working with word sized operands,
+ -- we can use SMULH to compute the high 64 bits, and dst_needed
+ -- checks if the high half's bits are all the same as the low half's
+ -- top bit.
+ | w == W64
+ , [src_a, src_b] <- arg_regs
+ -- dst_needed = did the result fit into just the low half
+ , [dst_needed, dst_hi, dst_lo] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src_a
+ (reg_b, _format_y, code_y) <- getSomeReg src_b
+
+ let lo = getRegisterReg platform (CmmLocal dst_lo)
+ hi = getRegisterReg platform (CmmLocal dst_hi)
+ nd = getRegisterReg platform (CmmLocal dst_needed)
+ return (
+ code_x `appOL`
+ code_y `snocOL`
+ MUL (OpReg W64 lo) (OpReg W64 reg_a) (OpReg W64 reg_b) `snocOL`
+ SMULH (OpReg W64 hi) (OpReg W64 reg_a) (OpReg W64 reg_b) `snocOL`
+ -- Are all high bits equal to the sign bit of the low word?
+ -- nd = (hi == ASR(lo,width-1)) ? 1 : 0
+ CMP (OpReg W64 hi) (OpRegShift W64 lo SASR (widthInBits w - 1)) `snocOL`
+ CSET (OpReg W64 nd) NE
+ , Nothing)
+ -- For sizes < platform width, we can just perform a multiply and shift
+ -- using the normal 64 bit multiply. Calculating the dst_needed value is
+ -- complicated a little by the need to be careful when truncation happens.
+ -- Currently this case can't be generated since
+ -- timesInt2# :: Int# -> Int# -> (# Int#, Int#, Int# #)
+ -- TODO: Should this be removed or would other primops be useful?
+ | w < W64
+ , [src_a, src_b] <- arg_regs
+ , [dst_needed, dst_hi, dst_lo] <- dest_regs
+ -> do
+ (reg_a', _format_x, code_a) <- getSomeReg src_a
+ (reg_b', _format_y, code_b) <- getSomeReg src_b
+
+ let lo = getRegisterReg platform (CmmLocal dst_lo)
+ hi = getRegisterReg platform (CmmLocal dst_hi)
+ nd = getRegisterReg platform (CmmLocal dst_needed)
+ -- Do everything in a full 64 bit registers
+ w' = platformWordWidth platform
+
+ (reg_a, code_a') <- signExtendReg w w' reg_a'
+ (reg_b, code_b') <- signExtendReg w w' reg_b'
+
+ return (
+ code_a `appOL`
+ code_b `appOL`
+ code_a' `appOL`
+ code_b' `snocOL`
+ -- the low 2w' of lo contains the full multiplication;
+ -- eg: int8 * int8 -> int16 result
+ -- so lo is in the last w of the register, and hi is in the second w.
+ SMULL (OpReg w' lo) (OpReg w' reg_a) (OpReg w' reg_b) `snocOL`
+ -- Make sure we hold onto the sign bits for dst_needed
+ ASR (OpReg w' hi) (OpReg w' lo) (OpImm (ImmInt $ widthInBits w)) `appOL`
+ -- lo can now be truncated so we can get at it's top bit easily.
+ truncateReg w' w lo `snocOL`
+ -- Note the use of CMN (compare negative), not CMP: we want to
+ -- test if the top half is negative one and the top
+ -- bit of the bottom half is positive one. eg:
+ -- hi = 0b1111_1111 (actually 64 bits)
+ -- lo = 0b1010_1111 (-81, so the result didn't need the top half)
+ -- lo' = ASR(lo,7) (second reg of SMN)
+ -- = 0b0000_0001 (theeshift gives us 1 for negative,
+ -- and 0 for positive)
+ -- hi == -lo'?
+ -- 0b1111_1111 == 0b1111_1111 (yes, top half is just overflow)
+ -- Another way to think of this is if hi + lo' == 0, which is what
+ -- CMN really is under the hood.
+ CMN (OpReg w' hi) (OpRegShift w' lo SLSR (widthInBits w - 1)) `snocOL`
+ -- Set dst_needed to 1 if hi and lo' were (negatively) equal
+ CSET (OpReg w' nd) EQ `appOL`
+ -- Finally truncate hi to drop any extraneous sign bits.
+ truncateReg w' w hi
+ , Nothing)
+ -- Can't handle > 64 bit operands
+ | otherwise -> unsupported (MO_S_Mul2 w)
+ PrimTarget (MO_U_Mul2 w)
+ -- The unsigned case is much simpler than the signed, all we need to
+ -- do is the multiplication straight into the destination registers.
+ | w == W64
+ , [src_a, src_b] <- arg_regs
+ , [dst_hi, dst_lo] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src_a
+ (reg_b, _format_y, code_y) <- getSomeReg src_b
+
+ let lo = getRegisterReg platform (CmmLocal dst_lo)
+ hi = getRegisterReg platform (CmmLocal dst_hi)
+ return (
+ code_x `appOL`
+ code_y `snocOL`
+ MUL (OpReg W64 lo) (OpReg W64 reg_a) (OpReg W64 reg_b) `snocOL`
+ UMULH (OpReg W64 hi) (OpReg W64 reg_a) (OpReg W64 reg_b)
+ , Nothing)
+ -- For sizes < platform width, we can just perform a multiply and shift
+ -- Need to be careful to truncate the low half, but the upper half should be
+ -- be ok if the invariant in [Signed arithmetic on AArch64] is maintained.
+ -- Currently this case can't be produced by the compiler since
+ -- timesWord2# :: Word# -> Word# -> (# Word#, Word# #)
+ -- TODO: Remove? Or would the extra primop be useful for avoiding the extra
+ -- steps needed to do this in userland?
+ | w < W64
+ , [src_a, src_b] <- arg_regs
+ , [dst_hi, dst_lo] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src_a
+ (reg_b, _format_y, code_y) <- getSomeReg src_b
+
+ let lo = getRegisterReg platform (CmmLocal dst_lo)
+ hi = getRegisterReg platform (CmmLocal dst_hi)
+ w' = opRegWidth w
+ return (
+ code_x `appOL`
+ code_y `snocOL`
+ -- UMULL: Xd = Wa * Wb with 64 bit result
+ -- W64 inputs should have been caught by case above
+ UMULL (OpReg W64 lo) (OpReg w' reg_a) (OpReg w' reg_b) `snocOL`
+ -- Extract and truncate high result
+ -- hi[w:0] = lo[2w:w]
+ UBFX (OpReg W64 hi) (OpReg W64 lo)
+ (OpImm (ImmInt $ widthInBits w)) -- lsb
+ (OpImm (ImmInt $ widthInBits w)) -- width to extract
+ `appOL`
+ truncateReg W64 w lo
+ , Nothing)
+ | otherwise -> unsupported (MO_U_Mul2 w)
+ PrimTarget (MO_Clz w)
+ | w == W64 || w == W32
+ , [src] <- arg_regs
+ , [dst] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src
+ let dst_reg = getRegisterReg platform (CmmLocal dst)
+ return (
+ code_x `snocOL`
+ CLZ (OpReg w dst_reg) (OpReg w reg_a)
+ , Nothing)
+ | w == W16
+ , [src] <- arg_regs
+ , [dst] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src
+ let dst' = getRegisterReg platform (CmmLocal dst)
+ r n = OpReg W32 n
+ imm n = OpImm (ImmInt n)
+ {- dst = clz(x << 16 | 0x0000_8000) -}
+ return (
+ code_x `appOL` toOL
+ [ LSL (r dst') (r reg_a) (imm 16)
+ , ORR (r dst') (r dst') (imm 0x00008000)
+ , CLZ (r dst') (r dst')
+ ]
+ , Nothing)
+ | w == W8
+ , [src] <- arg_regs
+ , [dst] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src
+ let dst' = getRegisterReg platform (CmmLocal dst)
+ r n = OpReg W32 n
+ imm n = OpImm (ImmInt n)
+ {- dst = clz(x << 24 | 0x0080_0000) -}
+ return (
+ code_x `appOL` toOL
+ [ LSL (r dst') (r reg_a) (imm 24)
+ , ORR (r dst') (r dst') (imm 0x00800000)
+ , CLZ (r dst') (r dst')
+ ]
+ , Nothing)
+ | otherwise -> unsupported (MO_Clz w)
+ PrimTarget (MO_Ctz w)
+ | w == W64 || w == W32
+ , [src] <- arg_regs
+ , [dst] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src
+ let dst_reg = getRegisterReg platform (CmmLocal dst)
+ return (
+ code_x `snocOL`
+ RBIT (OpReg w dst_reg) (OpReg w reg_a) `snocOL`
+ CLZ (OpReg w dst_reg) (OpReg w dst_reg)
+ , Nothing)
+ | w == W16
+ , [src] <- arg_regs
+ , [dst] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src
+ let dst' = getRegisterReg platform (CmmLocal dst)
+ r n = OpReg W32 n
+ imm n = OpImm (ImmInt n)
+ {- dst = clz(reverseBits(x) | 0x0000_8000) -}
+ return (
+ code_x `appOL` toOL
+ [ RBIT (r dst') (r reg_a)
+ , ORR (r dst') (r dst') (imm 0x00008000)
+ , CLZ (r dst') (r dst')
+ ]
+ , Nothing)
+ | w == W8
+ , [src] <- arg_regs
+ , [dst] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src
+ let dst' = getRegisterReg platform (CmmLocal dst)
+ r n = OpReg W32 n
+ imm n = OpImm (ImmInt n)
+ {- dst = clz(reverseBits(x) | 0x0080_0000) -}
+ return (
+ code_x `appOL` toOL
+ [ RBIT (r dst') (r reg_a)
+ , ORR (r dst') (r dst') (imm 0x00800000)
+ , CLZ (r dst') (r dst')
+ ]
+ , Nothing)
+ | otherwise -> unsupported (MO_Ctz w)
+ PrimTarget (MO_BRev w)
+ | w == W64 || w == W32
+ , [src] <- arg_regs
+ , [dst] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src
+ let dst_reg = getRegisterReg platform (CmmLocal dst)
+ return (
+ code_x `snocOL`
+ RBIT (OpReg w dst_reg) (OpReg w reg_a)
+ , Nothing)
+ | w == W16
+ , [src] <- arg_regs
+ , [dst] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src
+ let dst' = getRegisterReg platform (CmmLocal dst)
+ r n = OpReg W32 n
+ imm n = OpImm (ImmInt n)
+ {- dst = reverseBits32(x << 16) -}
+ return (
+ code_x `appOL` toOL
+ [ LSL (r dst') (r reg_a) (imm 16)
+ , RBIT (r dst') (r dst')
+ ]
+ , Nothing)
+ | w == W8
+ , [src] <- arg_regs
+ , [dst] <- dest_regs
+ -> do
+ (reg_a, _format_x, code_x) <- getSomeReg src
+ let dst' = getRegisterReg platform (CmmLocal dst)
+ r n = OpReg W32 n
+ imm n = OpImm (ImmInt n)
+ {- dst = reverseBits32(x << 24) -}
+ return (
+ code_x `appOL` toOL
+ [ LSL (r dst') (r reg_a) (imm 24)
+ , RBIT (r dst') (r dst')
+ ]
+ , Nothing)
+ | otherwise -> unsupported (MO_BRev w)
+
+
-- or a possibly side-effecting machine operation
-- mop :: CallishMachOp (see GHC.Cmm.MachOp)
PrimTarget mop -> do
@@ -1714,7 +1977,6 @@ genCCall target dest_regs arg_regs bid = do
-- Arithmatic
-- These are not supported on X86, so I doubt they are used much.
- MO_S_Mul2 _w -> unsupported mop
MO_S_QuotRem _w -> unsupported mop
MO_U_QuotRem _w -> unsupported mop
MO_U_QuotRem2 _w -> unsupported mop
@@ -1723,7 +1985,6 @@ genCCall target dest_regs arg_regs bid = do
MO_SubWordC _w -> unsupported mop
MO_AddIntC _w -> unsupported mop
MO_SubIntC _w -> unsupported mop
- MO_U_Mul2 _w -> unsupported mop
-- Memory Ordering
MO_AcquireFence -> return (unitOL DMBISH, Nothing)
@@ -1751,10 +2012,7 @@ genCCall target dest_regs arg_regs bid = do
MO_PopCnt w -> mkCCall (popCntLabel w)
MO_Pdep w -> mkCCall (pdepLabel w)
MO_Pext w -> mkCCall (pextLabel w)
- MO_Clz w -> mkCCall (clzLabel w)
- MO_Ctz w -> mkCCall (ctzLabel w)
MO_BSwap w -> mkCCall (bSwapLabel w)
- MO_BRev w -> mkCCall (bRevLabel w)
-- -- Atomic read-modify-write.
MO_AtomicRead w ord
=====================================
compiler/GHC/CmmToAsm/AArch64/Instr.hs
=====================================
@@ -79,11 +79,14 @@ regUsageOfInstr platform instr = case instr of
-- 1. Arithmetic Instructions ------------------------------------------------
ADD dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
CMP l r -> usage (regOp l ++ regOp r, [])
+ CMN l r -> usage (regOp l ++ regOp r, [])
MSUB dst src1 src2 src3 -> usage (regOp src1 ++ regOp src2 ++ regOp src3, regOp dst)
MUL dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
NEG dst src -> usage (regOp src, regOp dst)
SMULH dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
SMULL dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
+ UMULH dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
+ UMULL dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
SDIV dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
SUB dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
UDIV dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
@@ -97,6 +100,8 @@ regUsageOfInstr platform instr = case instr of
UXTB dst src -> usage (regOp src, regOp dst)
SXTH dst src -> usage (regOp src, regOp dst)
UXTH dst src -> usage (regOp src, regOp dst)
+ CLZ dst src -> usage (regOp src, regOp dst)
+ RBIT dst src -> usage (regOp src, regOp dst)
-- 3. Logical and Move Instructions ------------------------------------------
AND dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
ASR dst src1 src2 -> usage (regOp src1 ++ regOp src2, regOp dst)
@@ -136,7 +141,8 @@ regUsageOfInstr platform instr = case instr of
FMA _ dst src1 src2 src3 ->
usage (regOp src1 ++ regOp src2 ++ regOp src3, regOp dst)
- _ -> panic $ "regUsageOfInstr: " ++ instrCon instr
+ LOCATION{} -> panic $ "regUsageOfInstr: " ++ instrCon instr
+ NEWBLOCK{} -> panic $ "regUsageOfInstr: " ++ instrCon instr
where
-- filtering the usage is necessary, otherwise the register
@@ -209,11 +215,14 @@ patchRegsOfInstr instr env = case instr of
-- 1. Arithmetic Instructions ----------------------------------------------
ADD o1 o2 o3 -> ADD (patchOp o1) (patchOp o2) (patchOp o3)
CMP o1 o2 -> CMP (patchOp o1) (patchOp o2)
+ CMN o1 o2 -> CMN (patchOp o1) (patchOp o2)
MSUB o1 o2 o3 o4 -> MSUB (patchOp o1) (patchOp o2) (patchOp o3) (patchOp o4)
MUL o1 o2 o3 -> MUL (patchOp o1) (patchOp o2) (patchOp o3)
NEG o1 o2 -> NEG (patchOp o1) (patchOp o2)
SMULH o1 o2 o3 -> SMULH (patchOp o1) (patchOp o2) (patchOp o3)
SMULL o1 o2 o3 -> SMULL (patchOp o1) (patchOp o2) (patchOp o3)
+ UMULH o1 o2 o3 -> UMULH (patchOp o1) (patchOp o2) (patchOp o3)
+ UMULL o1 o2 o3 -> UMULL (patchOp o1) (patchOp o2) (patchOp o3)
SDIV o1 o2 o3 -> SDIV (patchOp o1) (patchOp o2) (patchOp o3)
SUB o1 o2 o3 -> SUB (patchOp o1) (patchOp o2) (patchOp o3)
UDIV o1 o2 o3 -> UDIV (patchOp o1) (patchOp o2) (patchOp o3)
@@ -227,6 +236,8 @@ patchRegsOfInstr instr env = case instr of
UXTB o1 o2 -> UXTB (patchOp o1) (patchOp o2)
SXTH o1 o2 -> SXTH (patchOp o1) (patchOp o2)
UXTH o1 o2 -> UXTH (patchOp o1) (patchOp o2)
+ CLZ o1 o2 -> CLZ (patchOp o1) (patchOp o2)
+ RBIT o1 o2 -> RBIT (patchOp o1) (patchOp o2)
-- 3. Logical and Move Instructions ----------------------------------------
AND o1 o2 o3 -> AND (patchOp o1) (patchOp o2) (patchOp o3)
@@ -268,7 +279,8 @@ patchRegsOfInstr instr env = case instr of
FMA s o1 o2 o3 o4 ->
FMA s (patchOp o1) (patchOp o2) (patchOp o3) (patchOp o4)
- _ -> panic $ "patchRegsOfInstr: " ++ instrCon instr
+ NEWBLOCK{} -> panic $ "patchRegsOfInstr: " ++ instrCon instr
+ LOCATION{} -> panic $ "patchRegsOfInstr: " ++ instrCon instr
where
patchOp :: Operand -> Operand
patchOp (OpReg w r) = OpReg w (env r)
@@ -546,6 +558,7 @@ data Instr
-- | ADR ...
-- | ADRP ...
| CMP Operand Operand -- rd - op2
+ | CMN Operand Operand -- rd + op2
-- | MADD ...
-- | MNEG ...
| MSUB Operand Operand Operand Operand -- rd = ra - rn × rm
@@ -568,8 +581,8 @@ data Instr
-- | UMADDL ... -- Xd = Xa + Wn × Wm
-- | UMNEGL ... -- Xd = - Wn × Wm
-- | UMSUBL ... -- Xd = Xa - Wn × Wm
- -- | UMULH ... -- Xd = (Xn × Xm)_127:64
- -- | UMULL ... -- Xd = Wn × Wm
+ | UMULH Operand Operand Operand -- Xd = (Xn × Xm)_127:64
+ | UMULL Operand Operand Operand -- Xd = Wn × Wm
-- 2. Bit Manipulation Instructions ----------------------------------------
| SBFM Operand Operand Operand Operand -- rd = rn[i,j]
@@ -582,6 +595,8 @@ data Instr
-- Signed/Unsigned bitfield extract
| SBFX Operand Operand Operand Operand -- rd = rn[i,j]
| UBFX Operand Operand Operand Operand -- rd = rn[i,j]
+ | CLZ Operand Operand -- rd = countLeadingZeros(rn)
+ | RBIT Operand Operand -- rd = reverseBits(rn)
-- 3. Logical and Move Instructions ----------------------------------------
| AND Operand Operand Operand -- rd = rn & op2
@@ -650,18 +665,23 @@ instrCon i =
POP_STACK_FRAME{} -> "POP_STACK_FRAME"
ADD{} -> "ADD"
CMP{} -> "CMP"
+ CMN{} -> "CMN"
MSUB{} -> "MSUB"
MUL{} -> "MUL"
NEG{} -> "NEG"
SDIV{} -> "SDIV"
SMULH{} -> "SMULH"
SMULL{} -> "SMULL"
+ UMULH{} -> "UMULH"
+ UMULL{} -> "UMULL"
SUB{} -> "SUB"
UDIV{} -> "UDIV"
SBFM{} -> "SBFM"
UBFM{} -> "UBFM"
SBFX{} -> "SBFX"
UBFX{} -> "UBFX"
+ CLZ{} -> "CLZ"
+ RBIT{} -> "RBIT"
AND{} -> "AND"
ASR{} -> "ASR"
EOR{} -> "EOR"
=====================================
compiler/GHC/CmmToAsm/AArch64/Ppr.hs
=====================================
@@ -372,12 +372,15 @@ pprInstr platform instr = case instr of
CMP o1 o2
| isFloatOp o1 && isFloatOp o2 -> op2 (text "\tfcmp") o1 o2
| otherwise -> op2 (text "\tcmp") o1 o2
+ CMN o1 o2 -> op2 (text "\tcmn") o1 o2
MSUB o1 o2 o3 o4 -> op4 (text "\tmsub") o1 o2 o3 o4
MUL o1 o2 o3
| isFloatOp o1 && isFloatOp o2 && isFloatOp o3 -> op3 (text "\tfmul") o1 o2 o3
| otherwise -> op3 (text "\tmul") o1 o2 o3
SMULH o1 o2 o3 -> op3 (text "\tsmulh") o1 o2 o3
SMULL o1 o2 o3 -> op3 (text "\tsmull") o1 o2 o3
+ UMULH o1 o2 o3 -> op3 (text "\tumulh") o1 o2 o3
+ UMULL o1 o2 o3 -> op3 (text "\tumull") o1 o2 o3
NEG o1 o2
| isFloatOp o1 && isFloatOp o2 -> op2 (text "\tfneg") o1 o2
| otherwise -> op2 (text "\tneg") o1 o2
@@ -393,6 +396,8 @@ pprInstr platform instr = case instr of
-- 2. Bit Manipulation Instructions ------------------------------------------
SBFM o1 o2 o3 o4 -> op4 (text "\tsbfm") o1 o2 o3 o4
UBFM o1 o2 o3 o4 -> op4 (text "\tubfm") o1 o2 o3 o4
+ CLZ o1 o2 -> op2 (text "\tclz") o1 o2
+ RBIT o1 o2 -> op2 (text "\trbit") o1 o2
-- signed and unsigned bitfield extract
SBFX o1 o2 o3 o4 -> op4 (text "\tsbfx") o1 o2 o3 o4
UBFX o1 o2 o3 o4 -> op4 (text "\tubfx") o1 o2 o3 o4
=====================================
compiler/GHC/Driver/Config/StgToCmm.hs
=====================================
@@ -76,7 +76,8 @@ initStgToCmmConfig dflags mod = StgToCmmConfig
| otherwise
-> const True
- , stgToCmmAllowIntMul2Instr = (ncg && x86ish) || llvm
+ , stgToCmmAllowIntMul2Instr = (ncg && (x86ish || aarch64)) || llvm
+ , stgToCmmAllowWordMul2Instr = (ncg && (x86ish || ppc || aarch64)) || llvm
-- SIMD flags
, stgToCmmVecInstrsErr = vec_err
, stgToCmmAvx = isAvxEnabled dflags
@@ -92,6 +93,9 @@ initStgToCmmConfig dflags mod = StgToCmmConfig
JSPrimitives -> (False, False)
NcgPrimitives -> (True, False)
LlvmPrimitives -> (False, True)
+ aarch64 = case platformArch platform of
+ ArchAArch64 -> True
+ _ -> False
x86ish = case platformArch platform of
ArchX86 -> True
ArchX86_64 -> True
=====================================
compiler/GHC/StgToCmm/Config.hs
=====================================
@@ -70,6 +70,7 @@ data StgToCmmConfig = StgToCmmConfig
, stgToCmmAllowQuotRem2 :: !Bool -- ^ Allowed to generate QuotRem
, stgToCmmAllowExtendedAddSubInstrs :: !Bool -- ^ Allowed to generate AddWordC, SubWordC, Add2, etc.
, stgToCmmAllowIntMul2Instr :: !Bool -- ^ Allowed to generate IntMul2 instruction
+ , stgToCmmAllowWordMul2Instr :: !Bool -- ^ Allowed to generate WordMul2 instruction
, stgToCmmAllowFMAInstr :: FMASign -> Bool -- ^ Allowed to generate FMA instruction
, stgToCmmTickyAP :: !Bool -- ^ Disable use of precomputed standard thunks.
------------------------------ SIMD flags ------------------------------------
=====================================
compiler/GHC/StgToCmm/Prim.hs
=====================================
@@ -1623,7 +1623,7 @@ emitPrimOp cfg primop =
else Right genericIntSubCOp
WordMul2Op -> \args -> opCallishHandledLater args $
- if allowExtAdd
+ if allowWord2Mul
then Left (MO_U_Mul2 (wordWidth platform))
else Right genericWordMul2Op
@@ -1850,6 +1850,7 @@ emitPrimOp cfg primop =
allowQuotRem2 = stgToCmmAllowQuotRem2 cfg
allowExtAdd = stgToCmmAllowExtendedAddSubInstrs cfg
allowInt2Mul = stgToCmmAllowIntMul2Instr cfg
+ allowWord2Mul = stgToCmmAllowWordMul2Instr cfg
allowFMA = stgToCmmAllowFMAInstr cfg
=====================================
libraries/os-string
=====================================
@@ -1 +1 @@
-Subproject commit 8e13019854e1497cb83581bbcab25e763fcc4d4b
+Subproject commit e1dd3bcfab56a6616c73ee9220de425d55545bc8
=====================================
rts/Interpreter.c
=====================================
@@ -4,6 +4,30 @@
* Copyright (c) The GHC Team, 1994-2002.
* ---------------------------------------------------------------------------*/
+/*
+Note [CBV Functions and the interpreter]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When the byte code interpreter loads a reference to a value it often
+ends up as a non-tagged pointers *especially* if we already know a value
+is a certain constructor and therefore don't perform an eval on the reference.
+This causes friction with CBV functions which assume
+their value arguments are properly tagged by the caller.
+
+In order to ensure CBV functions still get passed tagged functions we have
+three options:
+a) Special case the interpreter behaviour into the tag inference analysis.
+ If we assume the interpreter can't properly tag value references the STG passes
+ would then wrap such calls in appropriate evals which are executed at runtime.
+ This would ensure tags by doing additional evals at runtime.
+b) When the interpreter pushes references for known constructors instead of
+ pushing the objects address add the tag to the value pushed. This is what
+ the NCG backends do.
+c) When the interpreter pushes a reference inspect the closure of the object
+ and apply the appropriate tag at runtime.
+
+For now we use approach c). Mostly because it's easiest to implement. We also don't
+tag functions as tag inference currently doesn't rely on those being properly tagged.
+*/
#include "rts/PosixSource.h"
#include "Rts.h"
@@ -1306,7 +1330,42 @@ run_BCO:
case bci_PUSH_G: {
W_ o1 = BCO_GET_LARGE_ARG;
- SpW(-1) = BCO_PTR(o1);
+ StgClosure *tagged_obj = (StgClosure*) BCO_PTR(o1);
+
+ tag_push_g:
+ ASSERT(LOOKS_LIKE_CLOSURE_PTR((StgClosure*) tagged_obj));
+ // Here we make sure references we push are tagged.
+ // See Note [CBV Functions and the interpreter] in Info.hs
+
+ //Safe some memory reads if we already have a tag.
+ if(GET_CLOSURE_TAG(tagged_obj) == 0) {
+ StgClosure *obj = UNTAG_CLOSURE(tagged_obj);
+ switch ( get_itbl(obj)->type ) {
+ case IND:
+ case IND_STATIC:
+ {
+ tagged_obj = ACQUIRE_LOAD(&((StgInd*)obj)->indirectee);
+ goto tag_push_g;
+ }
+ case CONSTR:
+ case CONSTR_1_0:
+ case CONSTR_0_1:
+ case CONSTR_2_0:
+ case CONSTR_1_1:
+ case CONSTR_0_2:
+ case CONSTR_NOCAF:
+ // The value is already evaluated, so we can just return it. However,
+ // before we do, we MUST ensure that the pointer is tagged, because we
+ // might return to a native `case` expression, which assumes the returned
+ // pointer is tagged so it can use the tag to select an alternative.
+ tagged_obj = tagConstr(obj);
+ break;
+ default:
+ break;
+ }
+ }
+
+ SpW(-1) = (W_) tagged_obj;
Sp_subW(1);
goto nextInsn;
}
=====================================
testsuite/tests/codeGen/should_run/CtzClz0.hs
=====================================
@@ -0,0 +1,27 @@
+{-# LANGUAGE CPP #-}
+{-# LANGUAGE MagicHash #-}
+
+module Main where
+
+import GHC.Exts
+import Control.Monad
+
+#include <MachDeps.h>
+
+{-# OPAQUE x #-} -- needed to avoid triggering constant folding
+x :: Word
+x = 0
+
+main :: IO ()
+main = do
+ let !(W# w) = x
+
+ guard (W# (ctz# w) == WORD_SIZE_IN_BITS)
+ guard (W# (ctz8# w) == 8)
+ guard (W# (ctz16# w) == 16)
+ guard (W# (ctz32# w) == 32)
+
+ guard (W# (clz# w) == WORD_SIZE_IN_BITS)
+ guard (W# (clz8# w) == 8)
+ guard (W# (clz16# w) == 16)
+ guard (W# (clz32# w) == 32)
=====================================
testsuite/tests/codeGen/should_run/all.T
=====================================
@@ -249,3 +249,4 @@ test('T24664b', normal, compile_and_run, ['-O'])
test('T23034', [req_c
, when(arch('x86_64') and opsys('darwin'), expect_broken(25018))
], compile_and_run, ['-O2 T23034_c.c'])
+test('CtzClz0', normal, compile_and_run, [''])
=====================================
testsuite/tests/numeric/should_run/all.T
=====================================
@@ -50,6 +50,7 @@ test('T4383', normal, compile_and_run, [''])
test('add2', normal, compile_and_run, ['-fobject-code'])
test('mul2', normal, compile_and_run, ['-fobject-code'])
+test('mul2int', normal, compile_and_run, ['-fobject-code'])
test('quotRem2', normal, compile_and_run, ['-fobject-code'])
test('T5863', normal, compile_and_run, [''])
=====================================
testsuite/tests/numeric/should_run/mul2int.hs
=====================================
@@ -0,0 +1,35 @@
+{-# LANGUAGE MagicHash, UnboxedTuples #-}
+
+import GHC.Exts
+import Data.Bits
+
+main :: IO ()
+main = do g 5 6
+ g (-5) 6
+ g 0x7ECA71DBFF1B7D8C 49
+ g (-0x7ECA71DBFF1B7D8C) 49
+ g 0x7ECA71DBFF1B7D8C 0x7E0EC51DFD94FE35
+ g 0x7ECA71DBFF1B7D8C (-0x7E0EC51DFD94FE35)
+
+
+g :: Int -> Int -> IO ()
+g wx@(I# x) wy@(I# y)
+ = do putStrLn "-----"
+ putStrLn ("Doing " ++ show wx ++ " * " ++ show wy)
+ case x `timesInt2#` y of
+ (# n, h, l #) ->
+ do let wh = I# h
+ wl = I# l
+ wlw = W# (int2Word# l)
+ wn = I# n
+ r | wn == 1 = shiftL (fromIntegral wh) (finiteBitSize wh)
+ + fromIntegral wlw
+ | otherwise = fromIntegral wl
+
+ putStrLn ("High: " ++ show wh)
+ putStrLn ("Low: " ++ show wl)
+ putStrLn ("Needed: " ++ show wn)
+ putStrLn ("Result: " ++ show (r :: Integer))
+ putStrLn ("Should be: " ++ show (fromIntegral wx * fromIntegral wy :: Integer))
+
+
=====================================
testsuite/tests/numeric/should_run/mul2int.stdout
=====================================
@@ -0,0 +1,42 @@
+-----
+Doing 5 * 6
+High: 0
+Low: 30
+Needed: 0
+Result: 30
+Should be: 30
+-----
+Doing -5 * 6
+High: -1
+Low: -30
+Needed: 0
+Result: -30
+Should be: -30
+-----
+Doing 9136239983766240652 * 49
+High: 24
+Low: 4953901435516553164
+Needed: 1
+Result: 447675759204545791948
+Should be: 447675759204545791948
+-----
+Doing -9136239983766240652 * 49
+High: -25
+Low: -4953901435516553164
+Needed: 1
+Result: -447675759204545791948
+Should be: -447675759204545791948
+-----
+Doing 9136239983766240652 * 9083414231051992629
+High: 4498802171008813567
+Low: 3355592377236579836
+Needed: 1
+Result: 82988252286848496451678442784944154108
+Should be: 82988252286848496451678442784944154108
+-----
+Doing 9136239983766240652 * -9083414231051992629
+High: -4498802171008813568
+Low: -3355592377236579836
+Needed: 1
+Result: -82988252286848496451678442784944154108
+Should be: -82988252286848496451678442784944154108
=====================================
testsuite/tests/numeric/should_run/mul2int.stdout-ws-32
=====================================
@@ -0,0 +1,42 @@
+-----
+Doing 5 * 6
+High: 0
+Low: 30
+Needed: 0
+Result: 30
+Should be: 30
+-----
+Doing -5 * 6
+High: -1
+Low: -30
+Needed: 0
+Result: -30
+Should be: -30
+-----
+Doing -14975604 * 49
+High: -1
+Low: -733804596
+Needed: 0
+Result: -733804596
+Should be: -733804596
+-----
+Doing 14975604 * 49
+High: 0
+Low: 733804596
+Needed: 0
+Result: 733804596
+Should be: 733804596
+-----
+Doing -14975604 * -40567243
+High: 141449
+Low: 137487868
+Needed: 1
+Result: 607518966539772
+Should be: 607518966539772
+-----
+Doing -14975604 * 40567243
+High: -141450
+Low: -137487868
+Needed: 1
+Result: -607518966539772
+Should be: -607518966539772
=====================================
testsuite/tests/th/should_compile/T24870/Def.hs
=====================================
@@ -0,0 +1,9 @@
+{-# LANGUAGE TemplateHaskell #-}
+
+module SDef where
+
+{-# NOINLINE aValue #-}
+aValue = True
+
+{-# NOINLINE aStrictFunction #-}
+aStrictFunction !x = [| x |]
=====================================
testsuite/tests/th/should_compile/T24870/Use.hs
=====================================
@@ -0,0 +1,9 @@
+{-# LANGUAGE TemplateHaskell #-}
+
+module SUse where
+
+import qualified Language.Haskell.TH.Syntax as TH
+import SDef
+import GHC.Exts
+
+bar = $( inline aStrictFunction aValue )
=====================================
testsuite/tests/th/should_compile/T24870/all.T
=====================================
@@ -0,0 +1,6 @@
+# The interpreter must uphold tagging invariants, and failed to do so in #24870
+# We test this here by having the interpreter calls a strict worker function
+# with a reference to a value it constructed.
+# See also Note [CBV Functions and the interpreter]
+test('T24870', [extra_files(['Def.hs', 'Use.hs']), req_th],
+ multimod_compile, ['Def Use', '-dtag-inference-checks -v0'])
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/a92613c21bc0d728d87fb48eb0a3f9bff20e04f5...6faa8336f63b1635e6a04564a8ec29ab7ce42a31
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/a92613c21bc0d728d87fb48eb0a3f9bff20e04f5...6faa8336f63b1635e6a04564a8ec29ab7ce42a31
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20250113/9f8091eb/attachment-0001.html>
More information about the ghc-commits
mailing list