[GHC] #9476: Implement late lambda-lifting
GHC
ghc-devs at haskell.org
Thu Nov 8 17:34:48 UTC 2018
#9476: Implement late lambda-lifting
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner: sgraf
Type: feature request | Status: patch
Priority: normal | Milestone: 8.8.1
Component: Compiler | Version: 7.8.2
Resolution: | Keywords: LateLamLift
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #8763 #13286 | Differential Rev(s): Phab:D5224
Wiki Page: LateLamLift |
-------------------------------------+-------------------------------------
Comment (by sgraf):
I'm currently trying to find the right configuration for runtime
benchmarking.
When using the NCG on the architecture I benchmark on, there are seemingly
random performance outliers, even when ignoring benchmarks with less
than 200ms running time. Take `CSD` from `real/eff` for example. On the
target architecture (i7-6700), it is consistently 4.5% slower, yet
''there isn't a single lifted function in that benchmark''. It's basically
just a counting loop. To make matters worse, I can't reproduce this on my
local PC, where I see quite the contrary. Altogether this makes for a very
meager runtime improvement of -0.2%.
This leads me to believe that the (relatively minor) benefits are obscured
by code size and layout concerns. If I only include benchmarks that ran for
at least 500ms, things look much better (-0.4%), but that's probably because
I excluded the `eff` 'microbenchmarks'.
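
As a minimal sketch of how such a runtime cutoff changes the summary figure, the snippet below drops benchmarks under a threshold and then takes the geometric mean of the runtime ratios. The benchmark names and timings are made up for illustration, `geomean_change` is a hypothetical helper, and the geometric mean is my assumption about the summary statistic, not necessarily how the numbers above were produced.

```python
from math import exp, log

# Hypothetical (baseline_seconds, lifted_seconds) pairs; NOT measured data.
runs = {
    "tiny-loop": (0.10, 0.11),
    "mid-a": (0.30, 0.297),
    "long-a": (0.80, 0.79),
    "long-b": (2.00, 1.96),
}

def geomean_change(runs, min_seconds):
    """Geometric-mean runtime change in percent (negative means faster),
    over benchmarks whose baseline run took at least min_seconds."""
    ratios = [new / old for old, new in runs.values() if old >= min_seconds]
    if not ratios:
        raise ValueError("no benchmark meets the runtime threshold")
    mean_ratio = exp(sum(log(r) for r in ratios) / len(ratios))
    return (mean_ratio - 1.0) * 100.0

# Compare the summary with no cutoff, a 200ms cutoff and a 500ms cutoff.
for threshold in (0.0, 0.2, 0.5):
    print(f"min {threshold:.1f}s: {geomean_change(runs, threshold):+.2f}%")
```

With data like this, a single short-running outlier is enough to move the overall figure, which is why the choice of cutoff matters so much.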
I tried another configuration that probably does better justice to the
optimisation: I re-ran the benchmarks with `-fllvm -optlo -Os` to have
LLVM optimise for size, which in my experience yields results that are less
dependent on code layout.
Anyway, ignoring benchmarks with <200ms runtime yields an improvement of
-1.0% (result:
https://ghc.haskell.org/trac/ghc/attachment/ticket/9476/nofib.txt), while
ignoring all benchmarks with <500ms runtime yields a -1.2% improvement.
Ironically, the runtime of `CSD` ''improved'' by -7.1%.
It is also notable that while `n-body` allocates 20% less (heap space!), it
got slower by a non-meaningful margin of 0.1%. Maybe watching out for
allocations isn't the be-all and end-all here.
I really think we should flag which benchmarks are eligible for runtime
measurements. I get hung up on what are architectural wibbles ''all the
time''.
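
One possible shape for such a flag, purely as a sketch: treat a benchmark as eligible only if its baseline runs are long enough and stable across repetitions. The function name `eligible_for_runtime`, the 1% relative standard deviation bound and the use of repeated runs are all my assumptions here; only the 200ms cutoff comes from the numbers above.

```python
from statistics import mean, stdev

def eligible_for_runtime(samples, min_seconds=0.2, max_rel_stddev=0.01):
    """Return True if repeated baseline runtimes (in seconds) are long and
    stable enough to trust. Both thresholds are illustrative assumptions."""
    if len(samples) < 2:
        return False  # cannot estimate noise from a single run
    avg = mean(samples)
    if avg < min_seconds:
        return False  # too short: layout effects dominate the measurement
    # Relative standard deviation as a crude run-to-run noise estimate.
    return stdev(samples) / avg <= max_rel_stddev

# Hypothetical timings, not measured data.
print(eligible_for_runtime([1.00, 1.001, 0.999]))
print(eligible_for_runtime([0.10, 0.10, 0.10]))
```

Note that a stability check like this would not catch a case like `CSD`, which is consistently slower on one machine; that would need a comparison across machines or code layouts.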
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9476#comment:52>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler