[GHC] #14980: Runtime performance regression with binary operations on vectors

GHC ghc-devs at haskell.org
Mon Jun 25 05:29:32 UTC 2018


#14980: Runtime performance regression with binary operations on vectors
-------------------------------------+-------------------------------------
        Reporter:  ttylec            |                Owner:  bgamari
            Type:  bug               |               Status:  new
        Priority:  high              |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:  vector
                                     |  bitwise operations
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tdammers):

 Replying to [comment:7 ttylec]:
 > Hm... the bug did not exactly disappear: you observed no speed-up with
 the "binary packed" version in the first place. Notice that in my
 benchmark, the "binary packed" version is an order of magnitude faster
 than the "unbox vectors" one, and the bug is about losing that speed-up
 when we compile with 8.2.2 (and later).

 Indeed, I noticed.
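
 As a rough illustration of why a bit-packed representation can beat an
 element-per-column one by that kind of margin, here is a minimal sketch.
 It is not the benchmark code from this ticket: the types `RawRow` and
 `PackedRow`, the helper names, and the choice of "count columns set in
 both rows" as the operation are all assumptions made for illustration,
 and plain lists stand in for unboxed vectors so that only `base` is
 needed.

```haskell
import Data.Bits (popCount, setBit, (.&.))
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical row representations; the ticket's real types are not shown here.
type RawRow = [Bool]      -- one element per column
type PackedRow = Word64   -- one bit per column, at most 64 columns

-- Pack up to 64 Booleans into a single word, column i going to bit i.
pack :: RawRow -> PackedRow
pack = foldl' step 0 . zip [0 ..] . take 64
  where
    step acc (i, b) = if b then setBit acc i else acc

-- Count the columns set in both rows, element by element.
commonRaw :: RawRow -> RawRow -> Int
commonRaw xs ys = length (filter id (zipWith (&&) xs ys))

-- Same count on packed rows: one AND plus one population count.
commonPacked :: PackedRow -> PackedRow -> Int
commonPacked x y = popCount (x .&. y)

main :: IO ()
main = do
  let a = [even i | i <- [0 .. 63 :: Int]]
      b = [i `mod` 3 == 0 | i <- [0 .. 63 :: Int]]
  print (commonRaw a b)
  print (commonPacked (pack a) (pack b))
```

 Both calls print the same count. The packed version touches a single
 machine word for 64 columns, while the raw version visits every element.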

 > In your case, there was no speed-up in the first place. May I ask you to
 also check `stack exec performance-bug-pair-2` and `stack exec
 performance-bug-2`?

 Stack failed on something cassava-related that I'm not sure how to fix,
 so I tried cabal first. If stack vs. cabal turns out to be the
 difference, then this is most likely a library problem rather than a GHC
 bug. I'll see if I can sort it out so that I can try to reproduce your
 results. I don't think optimization settings are the problem, though:
 they are specified in the .cabal file, not in stack.yaml, I have not
 overridden anything, and I verified that -O2 actually gets passed to GHC.
 I have even compiled the programs manually, merely pointing GHC at the
 cabal sandbox's package cache, and got the same results.

 Results for `performance-bug-2` are similar.

 8.0.2:

 {{{
 "Generated"
 benchmarking 64 columns/raw unbox vectors
 time                 445.2 μs   (445.1 μs .. 445.4 μs)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 443.4 μs   (442.9 μs .. 443.8 μs)
 std dev              1.366 μs   (1.077 μs .. 1.655 μs)

 benchmarking 64 columns/binary packed
 time                 51.16 μs   (51.09 μs .. 51.24 μs)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 50.95 μs   (50.90 μs .. 51.01 μs)
 std dev              204.4 ns   (159.7 ns .. 264.0 ns)

 benchmarking 256 columns/raw unbox vectors
 time                 443.9 μs   (443.6 μs .. 444.4 μs)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 442.7 μs   (442.1 μs .. 444.1 μs)
 std dev              2.711 μs   (1.414 μs .. 5.048 μs)

 benchmarking 256 columns/binary packed
 time                 260.4 μs   (255.3 μs .. 266.2 μs)
                      0.997 R²   (0.996 R² .. 0.998 R²)
 mean                 266.6 μs   (263.1 μs .. 271.5 μs)
 std dev              9.366 μs   (6.649 μs .. 13.29 μs)
 variance introduced by outliers: 24% (moderately inflated)
 }}}

 8.2.2:

 {{{
 "Generated"
 benchmarking 64 columns/raw unbox vectors
 time                 445.0 μs   (444.7 μs .. 445.2 μs)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 444.0 μs   (443.2 μs .. 447.0 μs)
 std dev              4.654 μs   (1.118 μs .. 9.693 μs)

 benchmarking 64 columns/binary packed
 time                 51.13 μs   (51.11 μs .. 51.15 μs)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 50.90 μs   (50.86 μs .. 50.94 μs)
 std dev              146.1 ns   (122.9 ns .. 181.2 ns)

 benchmarking 256 columns/raw unbox vectors
 time                 440.4 μs   (440.1 μs .. 440.5 μs)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 437.9 μs   (437.3 μs .. 438.4 μs)
 std dev              1.797 μs   (1.576 μs .. 2.095 μs)

 benchmarking 256 columns/binary packed
 time                 289.5 μs   (285.1 μs .. 294.3 μs)
                      0.998 R²   (0.998 R² .. 0.999 R²)
 mean                 295.6 μs   (292.0 μs .. 299.8 μs)
 std dev              8.814 μs   (6.656 μs .. 11.69 μs)
 variance introduced by outliers: 19% (moderately inflated)
 }}}
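
 To make the two runs easier to compare, the ratios can be worked out from
 the "time" lines above. The following is only a small sketch of that
 arithmetic; the names `speedup` and `results` are made up for this
 example, and the numbers are copied from the criterion output in this
 comment.

```haskell
import Text.Printf (printf)

-- Ratio of two timings in the same unit; values above 1 mean `fast` wins.
speedup :: Double -> Double -> Double
speedup slow fast = slow / fast

-- (label, raw unbox vectors, binary packed), both in microseconds,
-- copied from the "time" lines of the benchmark output above.
results :: [(String, Double, Double)]
results =
  [ ("8.0.2,  64 columns", 445.2, 51.16)
  , ("8.0.2, 256 columns", 443.9, 260.4)
  , ("8.2.2,  64 columns", 445.0, 51.13)
  , ("8.2.2, 256 columns", 440.4, 289.5)
  ]

main :: IO ()
main = do
  mapM_ (\(label, raw, packed) ->
           printf "%s: packed is %.2fx faster\n" label (speedup raw packed))
        results
  -- Packed 256-column case, 8.2.2 relative to 8.0.2.
  printf "256 columns packed, 8.2.2 vs 8.0.2: %.2fx the time\n"
         (speedup 289.5 260.4)
```

 With these figures the packed version is roughly 8.7x faster at 64
 columns under both compilers, and roughly 1.7x (8.0.2) versus 1.5x
 (8.2.2) faster at 256 columns. The packed 256-column time is about 11%
 higher under 8.2.2, and the two reported intervals (255.3 .. 266.2 μs and
 285.1 .. 294.3 μs) do not overlap.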

 > I am curious: on what machine/system did you test it? Oh, and obviously
 optimization must be enabled (in case you didn't `stack build` it).

 Debian 9, x86_64. Intel i5 CPU, 4 GB RAM, official GHC release builds.

 I have a few possible explanations as to why we're seeing these
 differences:

 - Stack may be pulling in GHC versions other than the release bundles
 - Stack may be pulling in a different version of some crucial library
 - Whatever platform you run on might trigger different code paths in GHC

 I'll investigate further.
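
 One cheap way to narrow down the first and third possibilities is to have
 a program report which compiler and platform it was built with. The
 sketch below uses only `System.Info` and `Data.Version` from `base`; the
 name `buildInfo` is made up for this example, and this is a suggestion
 rather than something already done in this ticket.

```haskell
import Data.Version (showVersion)
import System.Info (arch, compilerName, compilerVersion, os)

-- Describe the compiler and platform this executable was built with.
buildInfo :: String
buildInfo =
  compilerName ++ " " ++ showVersion compilerVersion
    ++ " on " ++ os ++ "/" ++ arch

main :: IO ()
main = putStrLn buildInfo
```

 Note that `compilerVersion` carries only the major and minor components,
 so it can tell 8.0 from 8.2 but not 8.2.1 from 8.2.2. Building this once
 through stack and once through cabal would show whether both routes end
 up using the same compiler series.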

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14980#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

