[GHC] #13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an inner loop over an array
GHC
ghc-devs at haskell.org
Sun May 28 17:59:34 UTC 2017
#13763: Performance regression (~34%) in 8.2.1, poor register allocation(?) in an
inner loop over an array
-------------------------------------+-------------------------------------
Reporter: jberryman | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler (NCG) | Version: 8.2.1-rc2
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by jberryman):
Yeah, I quickly gave up trying to core or downstream output between the
two. And I'm aware that we should expect register allocation to be fiddly:
the algorithm is simple and suboptimal, so it's likely that from very
different core we'd get randomly better or worse generated code,
especially if we're pushing our use of registers (I'm not sure if this is
really the case here).
Still I thought I should report because:
- this represents a real regression in my code, and perhaps something like
this should be included in ghc benchmarks to motivate improvements to
register allocation, and
- it might be that something else is going on and we're getting register
spilling when we really shouldn't
In any case I hope reporting this is not unproductive.
Here's a bit more information:
I found that the following change (which is more sensible source in any
case) results in significantly faster code in both cases and starts to
narrow but does not eliminate the regression (we are still using stack in
the 8.2 version). A test case with this version is probably better to work
with:
{{{#!hs
clean4xWord16ChunkLE :: Word64 -> Word64
{-# INLINE clean4xWord16ChunkLE #-}
clean4xWord16ChunkLE w64Dirty =
-- This improves things significantly (and is an improvement in
8.0.1), but
-- still regresses:
-- For "ByteArray 1000000":
-- 8.0.1: 2.663 ms - 2.261 ms
-- 8.2 rc: 3.371 ms - 2.728 ms
let !w64 = byteSwap64 w64Dirty
in
((w64 `unsafeShiftR` 8) .&. 0x00FF00FF00FF00FF)
.|.
((w64 `unsafeShiftL` 8) .&. 0xFF00FF00FF00FF00)
}}}
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13763#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list