[GHC] #9246: GHC generates poor code for repeated uses of min/max
GHC
ghc-devs at haskell.org
Sun Jun 29 15:57:46 UTC 2014
#9246: GHC generates poor code for repeated uses of min/max
--------------------------------------------+------------------------------
Reporter: arotenberg | Owner:
Type: feature request | Status: new
Priority: normal | Milestone: 7.10.1
Component: Compiler | Version: 7.8.2
Resolution: | Keywords:
Operating System: Windows | Architecture: x86_64
Type of failure: Runtime performance bug | (amd64)
Test Case: | Difficulty: Unknown
Blocking: | Blocked By:
| Related Tickets: #6135
--------------------------------------------+------------------------------
Comment (by arotenberg):
I tried using criterion to profile different versions of this function,
but I couldn't get the results to not be noisy and "microbenchmarky".
The way I originally found this issue was by profiling the raytracer
program I'm writing, which fingered this function as both the single
largest time sink and the largest allocator. I tried replacing the
definition with this variant:
{{{
testRayAABBIntersection :: Ray -> AABB -> Bool
testRayAABBIntersection
(Ray (Vec3 (D# ox) (D# oy) (D# oz)) _
(Vec3 (D# invDx) (D# invDy) (D# invDz)))
(AABB (Vec3 (D# minX) (D# minY) (D# minZ))
(Vec3 (D# maxX) (D# maxY) (D# maxZ))) =
let tx1 = minusTimes## minX ox invDx
tx2 = minusTimes## maxX ox invDx
ty1 = minusTimes## minY oy invDy
ty2 = minusTimes## maxY oy invDy
tz1 = minusTimes## minZ oz invDz
tz2 = minusTimes## maxZ oz invDz
tmin = min## tx1 tx2 `max##` min## ty1 ty2 `max##` min## tz1 tz2
tmax = max## tx1 tx2 `min##` max## ty1 ty2 `min##` max## tz1 tz2
in isTrue# (tmax >=## max## 0.0## tmin)
{-# NOINLINE minusTimes## #-}
minusTimes## :: Double# -> Double# -> Double# -> Double#
minusTimes## minA oa invDa = (minA -## oa) *## invDa
{-# NOINLINE min## #-}
{-# NOINLINE max## #-}
min##, max## :: Double# -> Double# -> Double#
max## x y = if isTrue# (x <=## y) then y else x
min## x y = if isTrue# (x <=## y) then x else y
}}}
This gets the function's heap allocation down to the expected zero bytes,
but at the cost of actually making the program slower (both with and
without profiling)!
I haven't tried comparing results with the LLVM backend yet. I might look
into it later today.
It would be nice to see non-branching min/max implemented for the
primitive numeric types where available. Min/max for integer types can
probably be implemented efficiently on many platforms using conditional
move instructions. I'm not particularly interested in implementing it
myself, but I'm happy with having it as a feature request.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9246#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list