[commit: ghc] master: x86: zero extend the result of 16-bit popcnt instructions (#9435) (6415191)
git at git.haskell.org
Tue Aug 12 15:34:08 UTC 2014
Repository : ssh://git@git.haskell.org/ghc
On branch : master
Link : http://ghc.haskell.org/trac/ghc/changeset/64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1/ghc
>---------------------------------------------------------------
commit 64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1
Author: Reid Barton <rwbarton at gmail.com>
Date: Tue Aug 12 11:11:46 2014 -0400
x86: zero extend the result of 16-bit popcnt instructions (#9435)
Summary:
The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
the destination register, so we have to zero-extend the result to
a full word as popCnt16# is supposed to return a Word#.
For popCnt8# we could instead zero-extend the input to 32 bits
and then do a 32-bit popcnt, and not have to zero-extend the result.
LLVM produces the 16-bit popcnt sequence with two zero extensions,
though, and who am I to argue?
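The register behaviour described in the summary can be sketched in plain Haskell. This is an illustrative model only, not code from GHC: the names write16, popcnt16, zeroExtend16, popCnt16Buggy, popCnt16Fixed and popCnt8Alt are all hypothetical, and a 64-bit register is modelled as a Word64 whose upper bits survive a 16-bit write.

```haskell
import Data.Bits (complement, popCount, (.&.), (.|.))
import Data.Word (Word16, Word32, Word64)

-- Hypothetical model of a 16-bit write into a 64-bit register:
-- only the low 16 bits change, the upper 48 bits keep their old value.
write16 :: Word64 -> Word16 -> Word64
write16 old new = (old .&. complement 0xFFFF) .|. fromIntegral new

-- Model of 'popcnt r16, r/m16': count the set bits in the low 16 bits
-- of the source and write the count into the low 16 bits of the
-- destination, leaving the rest of the destination untouched.
popcnt16 :: Word64 -> Word64 -> Word64
popcnt16 dstOld src =
  write16 dstOld (fromIntegral (popCount (fromIntegral src :: Word16)))

-- Model of zero-extending the low 16 bits to the full register.
zeroExtend16 :: Word64 -> Word64
zeroExtend16 r = r .&. 0xFFFF

-- Before the fix: stale upper bits of the destination leak into the result.
popCnt16Buggy :: Word64 -> Word64 -> Word64
popCnt16Buggy = popcnt16

-- After the fix: the result is zero-extended to a full word.
popCnt16Fixed :: Word64 -> Word64 -> Word64
popCnt16Fixed dstOld src = zeroExtend16 (popcnt16 dstOld src)

-- The popCnt8# alternative mentioned above (assumed shape): zero-extend
-- the input to 32 bits and count at 32-bit width, so no extension of
-- the result is needed.
popCnt8Alt :: Word64 -> Word64
popCnt8Alt src = fromIntegral (popCount (fromIntegral (src .&. 0xFF) :: Word32))
```

With a destination that previously held 0xDEADBEEF00000000, popCnt16Buggy 0xDEADBEEF00000000 0xFFFF yields 0xDEADBEEF00000010 in this model, while popCnt16Fixed with the same arguments yields 16.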
Test Plan:
- ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
- then ran again adding "WAY=optasm", and verified that
the popcnt sequences we generate match the ones produced
by LLVM for its @llvm.ctpop.* intrinsics
Reviewers: austin, hvr, tibbe
Reviewed By: austin, hvr, tibbe
Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter
Differential Revision: https://phabricator.haskell.org/D147
GHC Trac Issues: #9435
>---------------------------------------------------------------
64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1
compiler/nativeGen/X86/CodeGen.hs | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/compiler/nativeGen/X86/CodeGen.hs b/compiler/nativeGen/X86/CodeGen.hs
index d6fdee1..ce7120e 100644
--- a/compiler/nativeGen/X86/CodeGen.hs
+++ b/compiler/nativeGen/X86/CodeGen.hs
@@ -1743,15 +1743,19 @@ genCCall dflags is32Bit (PrimTarget (MO_PopCnt width)) dest_regs@[dst]
     if sse4_2
         then do code_src <- getAnyReg src
                 src_r <- getNewRegNat size
+                let dst_r = getRegisterReg platform False (CmmLocal dst)
                 return $ code_src src_r `appOL`
                     (if width == W8 then
                          -- The POPCNT instruction doesn't take a r/m8
                          unitOL (MOVZxL II8 (OpReg src_r) (OpReg src_r)) `appOL`
-                         unitOL (POPCNT II16 (OpReg src_r)
-                                 (getRegisterReg platform False (CmmLocal dst)))
+                         unitOL (POPCNT II16 (OpReg src_r) dst_r)
                      else
-                         unitOL (POPCNT size (OpReg src_r)
-                                 (getRegisterReg platform False (CmmLocal dst))))
+                         unitOL (POPCNT size (OpReg src_r) dst_r)) `appOL`
+                    (if width == W8 || width == W16 then
+                         -- We used a 16-bit destination register above,
+                         -- so zero-extend
+                         unitOL (MOVZxL II16 (OpReg dst_r) (OpReg dst_r))
+                     else nilOL)
         else do
             targetExpr <- cmmMakeDynamicReference dflags
                           CallReference lbl