[GHC] #15185: Enum instance for IntX / WordX are inefficient
GHC
ghc-devs at haskell.org
Fri May 25 17:21:57 UTC 2018
#15185: Enum instance for IntX / WordX are inefficient
-------------------------------------+-------------------------------------
Reporter: guibou | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone: 8.6.1
Component: Core | Version: 8.4.2
Libraries |
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: None/Unknown
Unknown/Multiple |
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
The following factorial benchmark show high discrepancy in performance
between `Int` and `Int64` (and similarly between `Word` and `Word64`) when
using `product [1..n]` to compute factorial:
{{{#!haskell
{-# LANGUAGE TypeApplications #-}
import Criterion.Main
import Data.Int
import Data.Word
import Numeric.Natural
fact :: Integral t => t -> t
fact n = product [1..n]
fact2 :: Integral t => t -> t
fact2 n = go 1 n
where go acc 1 = acc
go acc n = go (acc * n) (n - 1)
main :: IO ()
main = defaultMain [
bgroup "fact 20" [
bench "Int" $ whnf (fact @Int) 20
, bench "Word" $ whnf (fact @Word) 20
, bench "Int64" $ whnf (fact @Int64) 20
, bench "Word64" $ whnf (fact @Word64) 20
, bench "Integer" $ whnf (fact @Integer) 20
, bench "Natural" $ whnf (fact @Natural) 20
]
,
bgroup "fact2 20" [
bench "Int" $ whnf (fact @Int) 20
, bench "Word" $ whnf (fact @Word) 20
, bench "Int64" $ whnf (fact @Int64) 20
, bench "Word64" $ whnf (fact @Word64) 20
, bench "Integer" $ whnf (fact @Integer) 20
, bench "Natural" $ whnf (fact @Natural) 20
]
]
}}}
{{{
benchmarking fact 20/Int
time 17.33 ns (17.22 ns .. 17.41 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 17.21 ns (17.16 ns .. 17.29 ns)
std dev 222.4 ps (184.5 ps .. 282.3 ps)
variance introduced by outliers: 15% (moderately inflated)
benchmarking fact 20/Word
time 17.25 ns (17.01 ns .. 17.63 ns)
0.999 R² (0.997 R² .. 1.000 R²)
mean 17.19 ns (17.11 ns .. 17.36 ns)
std dev 358.6 ps (199.4 ps .. 674.3 ps)
variance introduced by outliers: 32% (moderately inflated)
benchmarking fact 20/Int64
time 166.7 ns (165.4 ns .. 168.5 ns)
0.999 R² (0.997 R² .. 1.000 R²)
mean 166.8 ns (166.0 ns .. 168.5 ns)
std dev 4.077 ns (1.854 ns .. 6.592 ns)
variance introduced by outliers: 35% (moderately inflated)
benchmarking fact 20/Word64
time 560.5 ns (550.6 ns .. 575.0 ns)
0.995 R² (0.989 R² .. 1.000 R²)
mean 556.5 ns (549.1 ns .. 575.9 ns)
std dev 36.08 ns (9.195 ns .. 60.69 ns)
variance introduced by outliers: 78% (severely inflated)
benchmarking fact 20/Integer
time 421.9 ns (419.8 ns .. 425.3 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 427.7 ns (425.5 ns .. 431.3 ns)
std dev 9.238 ns (6.311 ns .. 16.19 ns)
variance introduced by outliers: 28% (moderately inflated)
benchmarking fact 20/Natural
time 534.5 ns (532.6 ns .. 536.3 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 533.7 ns (532.6 ns .. 535.3 ns)
std dev 4.535 ns (3.652 ns .. 6.226 ns)
benchmarking fact2 20/Int
time 17.14 ns (17.07 ns .. 17.23 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 17.15 ns (17.10 ns .. 17.24 ns)
std dev 227.1 ps (167.4 ps .. 286.9 ps)
variance introduced by outliers: 16% (moderately inflated)
benchmarking fact2 20/Word
time 16.93 ns (16.86 ns .. 17.02 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 16.90 ns (16.86 ns .. 16.95 ns)
std dev 149.8 ps (107.4 ps .. 192.5 ps)
benchmarking fact2 20/Int64
time 165.1 ns (164.6 ns .. 165.7 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 164.8 ns (164.5 ns .. 165.1 ns)
std dev 1.033 ns (778.6 ps .. 1.363 ns)
benchmarking fact2 20/Word64
time 545.3 ns (542.8 ns .. 547.9 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 543.6 ns (542.4 ns .. 545.2 ns)
std dev 4.741 ns (3.736 ns .. 5.824 ns)
benchmarking fact2 20/Integer
time 419.1 ns (417.2 ns .. 421.1 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 418.2 ns (416.8 ns .. 420.2 ns)
std dev 5.415 ns (4.094 ns .. 7.147 ns)
variance introduced by outliers: 12% (moderately inflated)
benchmarking fact2 20/Natural
time 533.8 ns (532.2 ns .. 536.1 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 533.2 ns (532.1 ns .. 534.8 ns)
std dev 4.428 ns (3.217 ns .. 5.741 ns)
}}}
In `fact`, `Int` and `Word` are efficient, `Int64` and `Word64` are
respectively 10x and 20x slowers, when they should be identical on 64 bit
platform. `Word64` is even slower than `Integer`!
Replacing `fact` by a handwritten recursion, in `fact2` gives something
more coherent where all the 64 bits types have roughly the same
efficiency.
I observed in the source code that `Int` and `Int64` are not implemented
as type synonym.
Using `-ddump-rule-firings`, I observed that a rule `fold/build` fired and
removed the `fold` for the `Int` case, and not for the `Int64`, I don't
understand much ;(
This is not dramatic because most people will use `Int` instead of
`Int64`, but it is a surprising behavior. I also observed that smaller
integral (such as `Int8`, `Int16`, `Int32`) behaves as badly as `Int64`)
and this is much more important because there is no other default and more
efficient type synonym.
Note that I'll be interested to provide a patch myself if mentored a bit.
Especially, I'm thinking about merging the `Int` and `Int64`
implementation (with respect to the architecture default word size).
Tested on GHC 8.4.2
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15185>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list