[commit: ghc] master: x86: zero extend the result of 16-bit popcnt instructions (#9435) (6415191)

git at git.haskell.org
Tue Aug 12 15:34:08 UTC 2014


Repository : ssh://git@git.haskell.org/ghc

On branch  : master
Link       : http://ghc.haskell.org/trac/ghc/changeset/64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1/ghc

>---------------------------------------------------------------

commit 64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1
Author: Reid Barton <rwbarton at gmail.com>
Date:   Tue Aug 12 11:11:46 2014 -0400

    x86: zero extend the result of 16-bit popcnt instructions (#9435)
    
    Summary:
    The 'popcnt r16, r/m16' instruction writes only the low 16 bits of
    the destination register and leaves the upper bits unchanged, so we
    have to zero-extend the result to a full word, since popCnt16# is
    supposed to return a Word#.
    
    For popCnt8# we could instead zero-extend the input to 32 bits
    and then do a 32-bit popcnt, and not have to zero-extend the result.
    LLVM produces the 16-bit popcnt sequence with two zero extensions,
    though, and who am I to argue?
    
    Test Plan:
     - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
     - then ran again adding "WAY=optasm", and verified that
       the popcnt sequences we generate match the ones produced
       by LLVM for its @llvm.ctpop.* intrinsics
    
    Reviewers: austin, hvr, tibbe
    
    Reviewed By: austin, hvr, tibbe
    
    Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter
    
    Differential Revision: https://phabricator.haskell.org/D147
    
    GHC Trac Issues: #9435
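
The effect described in the summary can be modelled in plain Haskell. This is only a sketch for illustration, not code from GHC: the names popcnt16, zeroExtend16, buggyPopCnt16 and fixedPopCnt16 are hypothetical, and a Word64 stands in for a 64-bit register. It assumes the destination register holds stale data in its upper bits before the 16-bit popcnt runs.

```haskell
import Data.Bits (complement, popCount, (.&.), (.|.))
import Data.Word (Word16, Word64)

-- | Toy model of 'popcnt r16, r/m16': count the set bits in the low
-- 16 bits of the source and write the count into the low 16 bits of
-- the destination, leaving the destination's upper 48 bits untouched.
popcnt16 :: Word64 -> Word64 -> Word64
popcnt16 src dst = (dst .&. complement 0xFFFF) .|. count
  where
    count = fromIntegral (popCount (fromIntegral src :: Word16))

-- | Toy model of the 16-bit zero extension added by the patch:
-- keep the low 16 bits and clear everything above them.
zeroExtend16 :: Word64 -> Word64
zeroExtend16 r = r .&. 0xFFFF

-- | Before the patch (in this model): stale upper bits leak into the result.
buggyPopCnt16 :: Word64 -> Word64 -> Word64
buggyPopCnt16 src staleDst = popcnt16 src staleDst

-- | After the patch (in this model): the result is a clean full word.
fixedPopCnt16 :: Word64 -> Word64 -> Word64
fixedPopCnt16 src staleDst = zeroExtend16 (popcnt16 src staleDst)
```

With a stale destination of 0xDEADBEEF00000000 and a source of 0xFFFF, buggyPopCnt16 yields 0xDEADBEEF00000010 in this model, while fixedPopCnt16 yields 16.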


>---------------------------------------------------------------

64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1
 compiler/nativeGen/X86/CodeGen.hs | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/compiler/nativeGen/X86/CodeGen.hs b/compiler/nativeGen/X86/CodeGen.hs
index d6fdee1..ce7120e 100644
--- a/compiler/nativeGen/X86/CodeGen.hs
+++ b/compiler/nativeGen/X86/CodeGen.hs
@@ -1743,15 +1743,19 @@ genCCall dflags is32Bit (PrimTarget (MO_PopCnt width)) dest_regs@[dst]
     if sse4_2
         then do code_src <- getAnyReg src
                 src_r <- getNewRegNat size
+                let dst_r = getRegisterReg platform False (CmmLocal dst)
                 return $ code_src src_r `appOL`
                     (if width == W8 then
                          -- The POPCNT instruction doesn't take a r/m8
                          unitOL (MOVZxL II8 (OpReg src_r) (OpReg src_r)) `appOL`
-                         unitOL (POPCNT II16 (OpReg src_r)
-                                 (getRegisterReg platform False (CmmLocal dst)))
+                         unitOL (POPCNT II16 (OpReg src_r) dst_r)
                      else
-                         unitOL (POPCNT size (OpReg src_r)
-                                 (getRegisterReg platform False (CmmLocal dst))))
+                         unitOL (POPCNT size (OpReg src_r) dst_r)) `appOL`
+                    (if width == W8 || width == W16 then
+                         -- We used a 16-bit destination register above,
+                         -- so zero-extend
+                         unitOL (MOVZxL II16 (OpReg dst_r) (OpReg dst_r))
+                     else nilOL)
         else do
             targetExpr <- cmmMakeDynamicReference dflags
                           CallReference lbl
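
The instruction selection in the hunk above can be summarised with a small stand-alone sketch. The Width and Instr types and the popCntSeq function below are hypothetical toy definitions written for this illustration; they are not GHC's own types, and registers are omitted entirely.

```haskell
-- Toy operand widths (hypothetical; GHC has its own Width type).
data Width = W8 | W16 | W32 | W64
  deriving (Eq, Show)

-- Toy instructions: an in-place zero extension from the given width,
-- and a popcnt performed at the given operand width.
data Instr
  = MovZx Width
  | PopCnt Width
  deriving (Eq, Show)

-- | Instruction sequence for a population count of the given width,
-- mirroring the shape of the patched code path when SSE4.2 is enabled.
popCntSeq :: Width -> [Instr]
popCntSeq width = pre ++ [PopCnt opWidth] ++ post
  where
    -- popcnt has no 8-bit form, so an 8-bit source is zero-extended first
    pre = [MovZx W8 | width == W8]
    -- ... and then counted with the 16-bit form of the instruction
    opWidth = if width == W8 then W16 else width
    -- a 16-bit popcnt writes only 16 bits, so zero-extend the result
    post = [MovZx W16 | width == W8 || width == W16]
```

In this sketch popCntSeq W8 gives [MovZx W8, PopCnt W16, MovZx W16], popCntSeq W16 gives [PopCnt W16, MovZx W16], and the 32-bit and 64-bit cases reduce to a single PopCnt.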


