Linear types performance characterisation
Ben Gamari
ben at well-typed.com
Wed Jun 17 18:10:50 UTC 2020
Hi everyone,
Last week I discussed the plan for merging the LinearTypes branch into
GHC 8.12 with Arnaud, Richard, Andreas, and Simon. Many thanks to all of
them for their respective roles in pushing this patch over the finish
line.
One thing that we wanted to examine prior to merge is compiler
performance across a larger collection of packages. For this I used
the head.hackage patch-set, comparing the Linear Types branch with its
corresponding base commit in `master`. Here I will describe the
methodology used for this comparison and briefly summarize the (happily,
quite positive) results.
# Methodology
I collected total bytes allocated (as reported by the runtime system),
elapsed runtime (as reported by the runtime system), and instructions
(as reported by `perf stat`) of head.hackage builds in two
configurations:
* the `opt` configuration
* the `noopt` configuration, which passed `--disable-optimisation` to cabal-install
These configurations were evaluated on two commits:
* `master`: 2b792facab46f7cdd09d12e79499f4e0dcd4293f
* `linear-bang`: 481cf412d6e619c0e47960f4c70fb21f19d6996d
Unfortunately, the `noopt` configuration appears to be affected by a
few cabal-install bugs [1,2] and consequently some packages may *still*
be compiled with optimisation, so take these numbers with a grain of
salt.
The test environment was a reasonably quiet Ryzen 7 1800X with 32 GBytes
of RAM.
The test was run by first building the two tested commits in Hadrian's
default build flavour. The head.hackage CI driver was then invoked as
follows:
    # Don't parallelize for stable performance measurements
    export CPUS=1
    export USE_NIX=1
    export EXTRA_HC_OPTS=-ddump-timings
    export COLLECT_PERF_STATS=1
    mkdir -p runs

    # master
    export GHC=/home/ben/ghc/ghc-compare-2/_build/stage1/bin/ghc
    ./run-ci --cabal-option=--disable-optimisation
    mv ci/run runs/master-noopt
    ./run-ci
    mv ci/run runs/master-opt

    # linear-bang
    export GHC=/home/ben/ghc/ghc-compare-1/_build/stage1/bin/ghc
    ./run-ci --cabal-option=--disable-optimisation
    mv ci/run runs/linear-noopt
    ./run-ci
    mv ci/run runs/linear-opt
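The four invocations above follow a two-by-two pattern (two compilers, two configurations). As a rough sketch only, the same run matrix could be generated programmatically. The Python below merely builds the plan as shell lines and does not run anything. The names `Run`, `build_plan`, and `shell_lines` are hypothetical; the paths and the `run-ci` flag are the ones quoted above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Run:
    ghc: str               # compiler under test
    args: Tuple[str, ...]  # extra arguments passed to ./run-ci
    dest: str              # where ci/run is moved after the build


# Compiler paths as quoted in the script above.
COMPILERS = {
    "master": "/home/ben/ghc/ghc-compare-2/_build/stage1/bin/ghc",
    "linear": "/home/ben/ghc/ghc-compare-1/_build/stage1/bin/ghc",
}

# The two configurations: noopt passes --disable-optimisation to cabal-install.
CONFIGS = {
    "noopt": ("--cabal-option=--disable-optimisation",),
    "opt": (),
}


def build_plan() -> List[Run]:
    """Enumerate every (compiler, configuration) pair in script order."""
    return [
        Run(ghc, args, f"runs/{name}-{cfg}")
        for name, ghc in COMPILERS.items()
        for cfg, args in CONFIGS.items()
    ]


def shell_lines(run: Run) -> List[str]:
    """Render one run as the three shell lines used in the script."""
    cmd = " ".join(("./run-ci",) + run.args)
    return [f"export GHC={run.ghc}", cmd, f"mv ci/run {run.dest}"]


if __name__ == "__main__":
    for run in build_plan():
        print("\n".join(shell_lines(run)))
```

Keeping the plan as data makes it harder to mismatch a compiler path and its destination directory when further commits or configurations are added.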
As we are building all packages (nearly 300 in total) serially, the full
run takes quite a while (around 8 hours IIRC).
The final run of this test used head.hackage commit
e7e5c5cfbfd42c41b1e62d42bb18483a83b78701 (on the `rts-stats` branch).
# Results
I examined several different metrics of compiler performance:
* the total_wall_seconds RTS metric gives a picture of overall
  compilation effort
* time reported by -ddump-timings, summed by module, gives a slightly
  finer-grained measurement of per-module compilation time
* the RTS's bytes_allocated metric gives overall compiler allocations
* the RTS's max_bytes_used metric gives a sense of AST size (and
  potentially the existence of leaks)
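To illustrate the "summed by module" step, here is a minimal sketch. It assumes `-ddump-timings` lines of the form `Phase [Module]: alloc=<bytes> time=<milliseconds>`; that line shape, the sample log, and the function name `sum_by_module` are assumptions for illustration and are not taken from the attached notebook.

```python
import re
from collections import defaultdict

# Assumed line shape: "Parser [Main]: alloc=1000 time=1.5"
TIMING_RE = re.compile(
    r"^(?P<phase>.+?) \[(?P<module>[^\]]+)\]: "
    r"alloc=(?P<alloc>\d+) time=(?P<time>[0-9.]+)\s*$"
)


def sum_by_module(log_text):
    """Sum allocations and time over all phases of each module.

    Lines that do not match the assumed shape (including timing lines
    that carry no module name) are skipped.
    """
    totals = defaultdict(lambda: {"alloc": 0, "time": 0.0})
    for line in log_text.splitlines():
        m = TIMING_RE.match(line.strip())
        if m is None:
            continue
        entry = totals[m.group("module")]
        entry["alloc"] += int(m.group("alloc"))
        entry["time"] += float(m.group("time"))
    return dict(totals)


# Made-up sample input, not real measurement data.
SAMPLE = """\
Parser [Main]: alloc=1000 time=1.5
Renamer/typechecker [Main]: alloc=3000 time=2.5
Parser [Lib]: alloc=500 time=0.25
unrelated build output
"""

if __name__ == "__main__":
    for module, total in sorted(sum_by_module(SAMPLE).items()):
        print(module, total["alloc"], total["time"])
```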
To cut straight to the chase, the measurements show the following:
metric                 -O0         -O1
---------------------  ----------  ----------
total_wall_seconds     +0.3%       +0.6%
total_cpu_seconds      +0.3%       +0.7%
max_bytes_used         +4.2%       +4.8%
GC_cpu_seconds         +1.5%       +2.1%
mut_cpu_seconds        no change   no change
sum(per-module-time)   +4.2%       +4.2%
sum(per-module-alloc)  +0.8%       +0.8%
There are a few things to point out here. The overall change in compiler
runtime is thankfully quite reasonable. However, max_bytes_used
increases rather considerably (by over 4%), which seems to give rise to
an appreciable regression in GC time. It would be interesting to know
whether this can be improved by optimising the data representation.
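For reference, the percentages reported here are relative changes between the two commits. One plausible way to compute such a figure is sketched below. It assumes that each package build leaves a stats file in the list-of-pairs form that GHC's RTS emits with `+RTS -t --machine-readable`, and that each metric is summed over packages before comparing. The aggregation choice, the function names, and the sample numbers are illustrative assumptions; the attached notebook may well do this differently.

```python
import ast


def parse_rts_stats(text):
    """Parse a list-of-pairs stats dump into a {metric: float} dict.

    The assumed form, [("name", "value"), ...], happens to be a valid
    Python literal, so ast.literal_eval can read it safely.
    """
    pairs = ast.literal_eval(text.strip())
    return {name: float(value) for name, value in pairs}


def relative_change(baseline, candidate, metric):
    """Percentage change of a metric summed over all packages."""
    base = sum(stats[metric] for stats in baseline)
    cand = sum(stats[metric] for stats in candidate)
    return (cand - base) / base * 100.0


# Made-up sample data for two packages per commit, not real measurements.
MASTER = [
    '[("max_bytes_used", "100")\n ,("total_wall_seconds", "10.0")\n ]',
    '[("max_bytes_used", "300")\n ,("total_wall_seconds", "30.0")\n ]',
]
LINEAR = [
    '[("max_bytes_used", "110")\n ,("total_wall_seconds", "10.0")\n ]',
    '[("max_bytes_used", "306")\n ,("total_wall_seconds", "30.2")\n ]',
]

if __name__ == "__main__":
    master = [parse_rts_stats(t) for t in MASTER]
    linear = [parse_rts_stats(t) for t in LINEAR]
    for metric in ("max_bytes_used", "total_wall_seconds"):
        print(f"{metric}: {relative_change(master, linear, metric):+.1f}%")
```

Summing before dividing weights large packages more heavily than averaging per-package ratios would; either choice is defensible, but the two can give noticeably different percentages.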
The fact that the cumulative per-module metrics didn't change between
-O0 and -O1 indicates to me that there is a methodological problem which
needs to be addressed in the test infrastructure. I investigated this a
bit and have a hypothesis for what might be going on here; nevertheless,
in the interest of getting these results out, I'm ignoring the
per-module measurements for the time being.
I have attached the Jupyter notebook that gave rise to these numbers.
This gives a finer-grained breakdown of the data including histograms
showing the variance of each metric. Perhaps this will be helpful in
better understanding the effects. I would be happy to share my run data
as well although it is a bit large.
All-in-all, the Tweag folks have done a great job in squashing the
performance regressions noticed a few weeks ago. The current numbers look quite
acceptable for GHC 8.12. Congratulations to Arnaud, Krzysztof, and
Richard on landing this feature! I'm very much looking forward to seeing
what the community does with it in the coming years.
Cheers,
- Ben
[1] https://github.com/haskell/cabal/issues/5353
[2] https://github.com/haskell/cabal/issues/3883
-------------- next part --------------
A non-text attachment was scrubbed...
Name: linear-types-analysis.html.gz
Type: application/octet-stream
Size: 277382 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20200617/34860874/attachment-0001.obj>