[commit: ghc] ghc-7.8: x86: zero extend the result of 16-bit popcnt instructions (#9435) (7766e5e)

git at git.haskell.org git at git.haskell.org
Mon Oct 27 16:05:00 UTC 2014


Repository : ssh://git@git.haskell.org/ghc

On branch  : ghc-7.8
Link       : http://ghc.haskell.org/trac/ghc/changeset/7766e5e9fd4889eecc8a1f7fe1982981c8694b58/ghc

>---------------------------------------------------------------

commit 7766e5e9fd4889eecc8a1f7fe1982981c8694b58
Author: Reid Barton <rwbarton at gmail.com>
Date:   Tue Aug 12 11:11:46 2014 -0400

    x86: zero extend the result of 16-bit popcnt instructions (#9435)
    
    Summary:
    The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
    the destination register, so we have to zero-extend the result to
    a full word as popCnt16# is supposed to return a Word#.
    
    For popCnt8# we could instead zero-extend the input to 32 bits
    and then do a 32-bit popcnt, and not have to zero-extend the result.
    LLVM produces the 16-bit popcnt sequence with two zero extensions,
    though, and who am I to argue?
    
    Test Plan:
     - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
     - then ran again adding "WAY=optasm", and verified that
       the popcnt sequences we generate match the ones produced
       by LLVM for its @llvm.ctpop.* intrinsics
    
    Reviewers: austin, hvr, tibbe
    
    Reviewed By: austin, hvr, tibbe
    
    Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter
    
    Differential Revision: https://phabricator.haskell.org/D147
    
    GHC Trac Issues: #9435
    
    (cherry picked from commit 64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1)
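
The problem described in the summary can be illustrated with a small model. This is only a sketch, not GHC code: the names writeLow16, popcnt16Raw, zeroExtend16 and popcnt16Fixed are hypothetical, and a 64-bit destination register is assumed. A write through the 16-bit alias of a register leaves the upper 48 bits untouched, so any stale bits survive unless the result is zero-extended afterwards.

```haskell
import Data.Bits (complement, popCount, (.&.), (.|.))
import Data.Word (Word16, Word64)

-- Hypothetical model of writing a 64-bit register through its 16-bit
-- alias: only the low 16 bits change, the upper 48 bits keep their old value.
writeLow16 :: Word64 -> Word16 -> Word64
writeLow16 old new = (old .&. complement 0xFFFF) .|. fromIntegral new

-- 'popcnt r16, r/m16' with no fix-up: stale upper bits of the destination
-- leak into the value later read as a full word.
popcnt16Raw :: Word64 -> Word16 -> Word64
popcnt16Raw oldDst src = writeLow16 oldDst (fromIntegral (popCount src))

-- Net effect of zero-extending the low 16 bits of the register in place
-- (the role played by the added MOVZxL II16 instruction).
zeroExtend16 :: Word64 -> Word64
zeroExtend16 r = r .&. 0xFFFF

-- 16-bit popcnt followed by the zero extension.
popcnt16Fixed :: Word64 -> Word16 -> Word64
popcnt16Fixed oldDst src = zeroExtend16 (popcnt16Raw oldDst src)

main :: IO ()
main = do
  let stale = 0xDEADBEEF00000000 :: Word64  -- assumed leftover register contents
  print (popcnt16Raw stale 0xFFFF == 16)    -- False: upper bits are still set
  print (popcnt16Fixed stale 0xFFFF == 16)  -- True
```

With a destination register that happens to hold zero beforehand, both variants agree, which is why a bug of this kind can go unnoticed in simple tests.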


>---------------------------------------------------------------

7766e5e9fd4889eecc8a1f7fe1982981c8694b58
 compiler/nativeGen/X86/CodeGen.hs | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/compiler/nativeGen/X86/CodeGen.hs b/compiler/nativeGen/X86/CodeGen.hs
index 2456688..8b7d0df 100644
--- a/compiler/nativeGen/X86/CodeGen.hs
+++ b/compiler/nativeGen/X86/CodeGen.hs
@@ -1710,15 +1710,19 @@ genCCall is32Bit (PrimTarget (MO_PopCnt width)) dest_regs@[dst]
     if sse4_2
         then do code_src <- getAnyReg src
                 src_r <- getNewRegNat size
+                let dst_r = getRegisterReg platform False (CmmLocal dst)
                 return $ code_src src_r `appOL`
                     (if width == W8 then
                          -- The POPCNT instruction doesn't take a r/m8
                          unitOL (MOVZxL II8 (OpReg src_r) (OpReg src_r)) `appOL`
-                         unitOL (POPCNT II16 (OpReg src_r)
-                                 (getRegisterReg platform False (CmmLocal dst)))
+                         unitOL (POPCNT II16 (OpReg src_r) dst_r)
                      else
-                         unitOL (POPCNT size (OpReg src_r)
-                                 (getRegisterReg platform False (CmmLocal dst))))
+                         unitOL (POPCNT size (OpReg src_r) dst_r)) `appOL`
+                    (if width == W8 || width == W16 then
+                         -- We used a 16-bit destination register above,
+                         -- so zero-extend
+                         unitOL (MOVZxL II16 (OpReg dst_r) (OpReg dst_r))
+                     else nilOL)
         else do
             targetExpr <- cmmMakeDynamicReference dflags
                           CallReference lbl
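
The instruction selection in the hunk above can be summarised by a toy model. This is a sketch under stated assumptions: the Width and Instr types and the popcntSeq function below are hypothetical stand-ins, not the native code generator's own definitions, and registers are left out entirely.

```haskell
-- Hypothetical operand widths, mirroring W8/W16/W32/W64 in the diff.
data Width = W8 | W16 | W32 | W64
  deriving (Eq, Show)

-- Hypothetical, register-free stand-ins for the emitted instructions.
data Instr
  = ZeroExtendSrc Width  -- zero-extend the source from this width, in place
  | PopCnt Width         -- popcnt at this operand width
  | ZeroExtendDst Width  -- zero-extend the destination from this width, in place
  deriving (Eq, Show)

-- Sequence emitted for a popcount of the given width after this change.
popcntSeq :: Width -> [Instr]
popcntSeq w = pre ++ [PopCnt opWidth] ++ post
  where
    -- POPCNT has no r/m8 form, so an 8-bit source is widened first.
    pre     = [ZeroExtendSrc W8 | w == W8]
    -- 8-bit inputs are counted with the 16-bit form of the instruction.
    opWidth = if w == W8 then W16 else w
    -- A 16-bit destination write leaves the upper bits alone, so clear them.
    post    = [ZeroExtendDst W16 | w == W8 || w == W16]

main :: IO ()
main = mapM_ (print . popcntSeq) [W8, W16, W32, W64]
```

In this model only the W8 and W16 cases gain the trailing zero extension; the W32 and W64 cases are a single popcnt, as before the change.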


