[GHC] #12798: LLVM seeming to over optimize, producing inefficient assembly code...
GHC
ghc-devs at haskell.org
Sun Nov 6 16:53:28 UTC 2016
#12798: LLVM seeming to over optimize, producing inefficient assembly code...
-------------------------------------+-------------------------------------
Reporter: GordonBGood | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 8.2.1
Component: Compiler (LLVM) | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by AlexET):
On current HEAD with llvm 3.9.0, the follwing code
{{{
import Data.Word
import Data.Bits
import Data.Array.ST (runSTUArray)
import Data.Array.Base
import Control.Monad.ST
numLOOPS = 10000 :: Int
twos :: UArray Int Word32
twos = listArray (0, 31) [1 `shiftL` i | i <- [0 .. 31]]
soe :: () -> [Word32]
soe() = 2 : [fromIntegral i * 2 + 3 | (i, False) <- assocs bufb] where
bufb = runSTUArray $ do
let bfLmt = (256 * 1024) `div` 2 - 1 -- to 2^18 + 2 is 128 KBits - 1 =
16 KBytes
cmpstsb <- newArray (0, bfLmt) False :: ST s (STUArray s Int Bool)
cmpstsw <- (castSTUArray :: STUArray s Int Bool -> ST s (STUArray s Int
Word32)) cmpstsb
return $! twos -- force evaluation of twos outside the loop.
let loop n = -- cull a number of times to test timing
if n <= 0 then return cmpstsb else
let cullp i =
let p = i + i + 3 in
let s = (p * p - 3) `div` 2 in
if s > bfLmt then loop (n - 1) else do
isCmpst <- unsafeRead cmpstsb i
if isCmpst then cullp (i + 1) else -- is Prime
let cull j = -- very tight inner loop where all the time
is spent
if j > bfLmt then cullp (i + 1) else do
let sh = unsafeAt twos (j .&. 31) -- (1 `shiftL`
(j .&. 31)))
let w = j `shiftR` 5
ov <- unsafeRead cmpstsw w
unsafeWrite cmpstsw w (ov .|. sh)
cull (j + p) in cull s in cullp 0
loop numLOOPS
main = print $ length $ soe()
}}}
Gives the inner loop, which is almost the same as rust.
{{{
.LBB28_7:
movq %rcx, %rdx
sarq $5, %rdx
movl %ecx, %edi
andl $31, %edi
movl 16(%r9,%rdi,4), %edi
orl %edi, 16(%rax,%rdx,4)
addq %rsi, %rcx
.LBB28_5:
cmpq $131071, %rcx
jle .LBB28_7
}}}
The CMM and initial llvm code is the same as your code for 8.0.1, so it
seems the difference is due to the fact that rust ships with its own more
recent llvm than ghc 8.0.1 supports.
The difference in the code between my version and your original version is
that we force `twos` early which is needed to prevent the evaluation of
that within the loop, an optimisation which seems to have been missed by
HEAD.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12798#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list