[GHC] #9435: x86 sse4.2 popCnt16# needs to zero-extend its result
GHC
ghc-devs at haskell.org
Tue Aug 12 15:34:04 UTC 2014
#9435: x86 sse4.2 popCnt16# needs to zero-extend its result
---------------------------------------------+-------------------------------
Reporter: rwbarton                           | Owner:
Type: bug                                    | Status: new
Priority: normal                             | Milestone:
Component: Compiler (NCG)                    | Version: 7.9
Resolution:                                  | Keywords:
Operating System: Unknown/Multiple           | Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime | Difficulty: Unknown
Test Case:                                   | Blocked By:
Blocking:                                    | Related Tickets:
Differential Revisions: Phab:D147            |
---------------------------------------------+-------------------------------
Comment (by Reid Barton <rwbarton@…>):
In [changeset:"64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1/ghc"]:
{{{
#!CommitTicketReference repository="ghc"
revision="64151913f1ed32ecfe17fcc40f7adc6cbfbb0bc1"
x86: zero extend the result of 16-bit popcnt instructions (#9435)

Summary:
The 'popcnt r16, r/m16' instruction only writes the low 16 bits of
the destination register, so we have to zero-extend the result to
a full word as popCnt16# is supposed to return a Word#.

For popCnt8# we could instead zero-extend the input to 32 bits
and then do a 32-bit popcnt, and not have to zero-extend the result.
LLVM produces the 16-bit popcnt sequence with two zero extensions,
though, and who am I to argue?

Test Plan:
 - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42"
 - then ran again adding "WAY=optasm", and verified that
   the popcnt sequences we generate match the ones produced
   by LLVM for its @llvm.ctpop.* intrinsics

Reviewers: austin, hvr, tibbe

Reviewed By: austin, hvr, tibbe

Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter

Differential Revision: https://phabricator.haskell.org/D147

GHC Trac Issues: #9435
}}}
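
The register behaviour described in the commit summary can be illustrated with a small model. This is only a sketch in Python, not GHC's code generator and not the actual patch: the helper names (write_low16, write_low32, popcnt16_no_extend, popcnt16_zero_extend, popcnt8_via_32) are hypothetical, and registers are modelled as plain 64-bit integers. It assumes the usual x86-64 rules that a 16-bit register write leaves bits 16..63 untouched, while a 32-bit register write clears bits 32..63.

```python
MASK64 = (1 << 64) - 1


def popcount(x: int) -> int:
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def write_low16(dst: int, value: int) -> int:
    """Model a 16-bit register write: bits 16..63 of dst are preserved."""
    return (dst & ~0xFFFF & MASK64) | (value & 0xFFFF)


def write_low32(value: int) -> int:
    """Model a 32-bit register write on x86-64: bits 32..63 are cleared."""
    return value & 0xFFFFFFFF


def popcnt16_no_extend(dst_before: int, src: int) -> int:
    """'popcnt r16, r/m16' on its own: stale upper bits of dst survive."""
    return write_low16(dst_before, popcount(src & 0xFFFF))


def popcnt16_zero_extend(dst_before: int, src: int) -> int:
    """The same instruction followed by a zero extension of the 16-bit result."""
    return popcnt16_no_extend(dst_before, src) & 0xFFFF


def popcnt8_via_32(src: int) -> int:
    """Alternative for 8-bit input: zero-extend the input, then 32-bit popcnt."""
    return write_low32(popcount(src & 0xFF))


if __name__ == "__main__":
    stale = 0xDEADBEEF00000000  # leftover contents of the destination register
    print(hex(popcnt16_no_extend(stale, 0x00FF)))  # 0xdeadbeef00000008
    print(popcnt16_zero_extend(stale, 0x00FF))     # 8
    print(popcnt8_via_32(0xFFFFFF0F))              # 4
```

In this model the unextended result is only correct when the destination register happened to hold zero in its upper bits beforehand, which is why the result has to be widened before it is treated as a full Word#.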
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9435#comment:3>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler