[GHC] #13629: sqrt should use machine instruction on x86_64
GHC
ghc-devs at haskell.org
Fri Apr 28 20:18:40 UTC 2017
#13629: sqrt should use machine instruction on x86_64
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner: (none)
Type: bug | Status: closed
Priority: normal | Milestone: 8.4.1
Component: Compiler (NCG) | Version: 8.0.1
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime | Test Case:
performance bug | numeric/num009
Blocked By: | Blocking:
Related Tickets: #13570 | Differential Rev(s): Phab:D3508
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by kavon):
Replying to [comment:4 bgamari]:
> It's made in the native code generator, `genCCall` in
> `compiler/nativeGen/X86/CodeGen.hs`. While we use the `fsin` instruction
> on i386, we don't on x86_64 (and i386 with `-msse2`).
If there is already x87 FPU instruction support in the NCG for x86-32, it
might be profitable to reuse that support for x86-64 to speed up trig
functions, etc.
The simplest approach I see is to expand the foreign call into an
instruction sequence that moves the float from an XMM register to the x87
stack, computes the value, and moves it back to an XMM register. This way
we no longer have a C call in a potentially bad place.
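As a rough illustration of that sequence, here is a minimal sketch in C using GCC-style inline assembly. It is not the NCG change itself: the helper name `x87_sin` is hypothetical, and the sketch assumes an x86 or x86-64 target built with GCC or Clang. It only shows the shape of the XMM -> x87 -> XMM round trip around `fsin`.

```c
/* Hypothetical helper, not part of GHC: compute sin(x) with the x87
 * `fsin` instruction instead of calling the C library.
 * Assumes an x86 or x86-64 target and a compiler that accepts
 * GCC-style inline assembly (GCC or Clang). */
static double x87_sin(double x)
{
    double result;
    /* "=t" asks for the output on top of the x87 stack, st(0), and
     * "0" places the input in that same location. On x86-64, where
     * doubles normally live in XMM registers, the compiler has to
     * emit the moves onto and off the x87 stack around `fsin`.
     * Note that `fsin` only handles arguments with |x| < 2^63. */
    __asm__ ("fsin" : "=t" (result) : "0" (x));
    return result;
}
```

Compiling a call such as `x87_sin(0.5)` with `gcc -O2 -S` on x86-64 should show the value being spilled from an XMM register to the stack, loaded with `fldl`, processed by `fsin`, stored with `fstpl`, and reloaded into an XMM register. There is no direct XMM-to-x87 move, so the round trip goes through memory, and that cost would need to be counted in any comparison with the library routine.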
It's worth first comparing x87 on modern processors against the assembly
routine backing the C function. It seems that on platforms like Skylake
the x87 `fsin` takes 50-120 cycles [1], but I'm not sure about the library
versions. If they're roughly equivalent, there's likely a benefit to
eliding the C call.
[1] http://www.agner.org/optimize/instruction_tables.pdf (Page 223 for
Skylake x87)
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13629#comment:9>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler