[GHC] #9246: GHC generates poor code for repeated uses of min/max
GHC
ghc-devs at haskell.org
Sat Jun 28 23:48:14 UTC 2014
#9246: GHC generates poor code for repeated uses of min/max
------------------------------+--------------------------------------------
Reporter: arotenberg | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.2
Keywords: | Operating System: Windows
Architecture: x86_64 | Type of failure: Runtime performance bug
(amd64) | Test Case:
Difficulty: Unknown | Blocking:
Blocked By: |
Related Tickets: #6135 |
------------------------------+--------------------------------------------
Consider the following module, which intends to implement a
[http://tavianator.com/2011/05/fast-branchless-raybounding-box-
intersections/ branchless ray-AABB intersection test]:
{{{
module SimpleGeom where
data Vec3 = Vec3
{-# UNPACK #-} !Double
{-# UNPACK #-} !Double
{-# UNPACK #-} !Double
data Ray = Ray !Vec3 !Vec3 !Vec3
data AABB = AABB !Vec3 !Vec3
testRayAABBIntersection :: Ray -> AABB -> Bool
testRayAABBIntersection (Ray (Vec3 ox oy oz) _ (Vec3 invDx invDy invDz))
(AABB (Vec3 minX minY minZ) (Vec3 maxX maxY maxZ)) =
let tx1 = (minX - ox) * invDx
tx2 = (maxX - ox) * invDx
ty1 = (minY - oy) * invDy
ty2 = (maxY - oy) * invDy
tz1 = (minZ - oz) * invDz
tz2 = (maxZ - oz) * invDz
tmin = min tx1 tx2 `max` min ty1 ty2 `max` min tz1 tz2
tmax = max tx1 tx2 `min` max ty1 ty2 `min` max tz1 tz2
in tmax >= max 0 tmin
}}}
Everything is strict primitive operations, so GHC should generate very
simple, fast code, right? But upon compiling with {{{ghc -O -ddump-simpl
-ddump-to-file SimpleGeom}}}, I found a mess of nested local lambdas and
similar performance-killing expression forms. (See the attached output
file.)
There are two possible issues I can see here.
1. GHC is trying to expand out all of the branches recursively (I would
presume via case-of-cases transformation), which is a bad idea in this
instance compared to just performing the cases sequentially and storing
their results.
1. GHC is generating branches for floating-point min/max. Instruction sets
like SSE2 include non-branching floating-point min/max instructions, which
is exactly what this algorithm was designed to exploit, but GHC does not
generate code that could take advantage of these instructions.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9246>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list