[GHC] #9340: Implement new `clz` inline primop

GHC ghc-devs at haskell.org
Sat Oct 18 12:34:36 UTC 2014


#9340: Implement new `clz` inline primop
-------------------------------------+-------------------------------------
              Reporter:  hvr         |            Owner:  hvr
                  Type:  feature     |           Status:  closed
  request                            |        Milestone:  7.10.1
              Priority:  normal      |          Version:  7.8.2
             Component:  Compiler    |         Keywords:  primop clz ctz
  (CodeGen)                          |     Architecture:  Unknown/Multiple
            Resolution:  fixed       |       Difficulty:  Unknown
      Operating System:              |       Blocked By:
  Unknown/Multiple                   |  Related Tickets:  #9532
       Type of failure:              |
  None/Unknown                       |
             Test Case:              |
              Blocking:              |
Differential Revisions:  Phab:D144   |
-------------------------------------+-------------------------------------

Comment (by Herbert Valerio Riedel <hvr@…>):

 In [changeset:"612f3d120c65a461a4ad7f212d67bdae005f4975/ghc"]:
 {{{
 #!CommitTicketReference repository="ghc"
 revision="612f3d120c65a461a4ad7f212d67bdae005f4975"
 Implement optimized NCG `MO_Ctz W64` op for i386 (#9340)

 Summary:
 This is an optimization to the CTZ primops introduced for #9340

 Previously we called out to `hs_ctz64`, but we can actually generate
 better hand-tuned code while avoiding the FFI ccall.

 With this patch, the code

   {-# LANGUAGE MagicHash #-}
   module TestClz0 where
   import GHC.Prim
   ctz64 :: Word64# -> Word#
   ctz64 x = ctz64# x

 results in the following assembler generated by NCG on i386:

   TestClz.ctz64_info:
       movl (%ebp),%eax
       movl 4(%ebp),%ecx
       movl %ecx,%edx
       orl %eax,%edx
       movl $64,%edx
       je _nAO

       bsf %ecx,%ecx
       addl $32,%ecx
       bsf %eax,%eax
       cmovne %eax,%ecx
       movl %ecx,%edx
   _nAO:
       movl %edx,%esi
       addl $8,%ebp
       jmp *(%ebp)

 For comparision, here's what LLVM 3.4 currently generates:

   000000fc <TestClzz_ctzz64_info>:
     fc:   0f bc 45 04             bsf    0x4(%ebp),%eax
    100:   b9 20 00 00 00          mov    $0x20,%ecx
    105:   0f 45 c8                cmovne %eax,%ecx
    108:   83 c1 20                add    $0x20,%ecx
    10b:   8b 45 00                mov    0x0(%ebp),%eax
    10e:   8b 55 08                mov    0x8(%ebp),%edx
    111:   0f bc f0                bsf    %eax,%esi
    114:   85 c0                   test   %eax,%eax
    116:   0f 44 f1                cmove  %ecx,%esi
    119:   83 c5 08                add    $0x8,%ebp
    11c:   ff e2                   jmp    *%edx

 Reviewed By: austin

 Auditors: simonmar

 Differential Revision: https://phabricator.haskell.org/D163
 }}}

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9340#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list