Linear types performance characterisation

Wed Jun 17 18:10:50 UTC 2020

Hi everyone,

Last week I discussed the plan for merging the LinearTypes branch into
GHC 8.12 with Arnaud, Richard, Andreas, and Simon. Many thanks to all of
them for their respective roles in pushing this patch over the finish
line.

One thing that we wanted to examine prior to merging is compiler
performance across a larger collection of packages. For this I used
the head.hackage patch-set, comparing the LinearTypes branch with its
corresponding base commit in `master`. Here I will describe the
methodology used for this comparison and briefly summarize the (happily,
quite positive) results.

# Methodology

I collected total bytes allocated and elapsed runtime (both as reported
by the runtime system), as well as instruction counts (as reported by
`perf stat`), for head.hackage builds in two configurations:

 * the `opt` configuration, which used cabal-install's default optimisation
   level (`-O1`)
 * the `noopt` configuration, which passed `--disable-optimisation` to cabal-install

These configurations were evaluated on two commits:

 * `master`: 2b792facab46f7cdd09d12e79499f4e0dcd4293f
 * `linear-bang`: 481cf412d6e619c0e47960f4c70fb21f19d6996d

Unfortunately, the `noopt` configuration appears to be affected by a
few cabal-install bugs [1,2] and consequently some packages may *still*
be compiled with optimisation, so take these numbers with a grain of
salt.

The test environment was a reasonably quiet Ryzen 7 1800X with 32 GBytes
of RAM.

The test was run by first building the two tested commits in Hadrian's
default build flavour. The head.hackage CI driver was then invoked as
follows:

    # Don't parallelize for stable performance measurements
    export CPUS=1
    export USE_NIX=1
    export EXTRA_HC_OPTS=-ddump-timings
    export COLLECT_PERF_STATS=1

    mkdir -p runs

    # master
    export GHC=/home/ben/ghc/ghc-compare-2/_build/stage1/bin/ghc
    ./run-ci --cabal-option=--disable-optimisation
    mv ci/run runs/master-noopt
    ./run-ci
    mv ci/run runs/master-opt

    # linear-bang
    export GHC=/home/ben/ghc/ghc-compare-1/_build/stage1/bin/ghc
    ./run-ci --cabal-option=--disable-optimisation
    mv ci/run runs/linear-noopt
    ./run-ci
    mv ci/run runs/linear-opt

As we are building all packages (nearly 300 in total) serially, the full
run takes quite a while (around 8 hours IIRC).

The final run of this test used head.hackage commit
e7e5c5cfbfd42c41b1e62d42bb18483a83b78701 (on the `rts-stats` branch).
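The instruction counts come from `perf stat`, which is presumably what
`COLLECT_PERF_STATS=1` in the script above turns on. As a minimal sketch,
assuming `perf stat` is run in its CSV mode (`-x,`) and that each record
carries the counter value in the first field and the event name in the
third, the count could be pulled out of its output as follows. The helper
name and the example `ghc` invocation are illustrative assumptions, not
what the head.hackage driver actually does.

```shell
# Hypothetical helper (not part of the head.hackage driver): print the
# instruction count from perf stat CSV output, e.g. as produced by
#   perf stat -x, -e instructions -o stats.csv -- ghc -c Foo.hs
# Assumed record layout: value,unit,event-name,...
perf_instructions() {
    awk -F, '
        /^#/ || NF < 3 { next }             # skip perf comment lines and blank lines
        $3 ~ /^instructions/ { print $1 }   # event may appear as "instructions:u"
    ' "$@"
}
```

It reads either a file (`perf_instructions stats.csv`) or standard input.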

# Results

I examined several different metrics of compiler performance:

 * the total_wall_seconds RTS metric gives a picture of overall
   compilation effort

 * time reported by -ddump-timings, summed by module, gives a slightly
   finer-grained measurement of per-module compilation time

 * the RTS's bytes_allocated metric gives overall compiler allocations

 * the RTS's max_bytes_used metric gives a sense of AST size (and
   potentially the existence of leaks)
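To make the second metric concrete, here is a rough sketch of summing
`-ddump-timings` output by module. It assumes each timing line has the form
`Phase [Module]: alloc=N time=T`, with T in milliseconds, and skips lines
that carry no module name; the helper is hypothetical and is not the code
in the attached notebook.

```shell
# Hypothetical helper: sum -ddump-timings output per module.
# Assumed line format, one line per compiler phase, time in milliseconds:
#   Renamer/typechecker [Data.Foo]: alloc=123456 time=12.345
# Prints "<module> <total time in ms> <total bytes allocated>", sorted by module.
sum_timings() {
    awk '
        {
            s = index($0, " [")          # start of the bracketed module name
            e = index($0, "]: alloc=")   # end of the module name
            if (s == 0 || e <= s) next   # not a per-module timing line
            mod = substr($0, s + 2, e - s - 2)
            split(substr($0, e + 3), kv, " ")  # kv[1] = "alloc=N", kv[2] = "time=T"
            sub(/^alloc=/, "", kv[1])
            sub(/^time=/, "", kv[2])
            alloc_b[mod] += kv[1]
            time_ms[mod] += kv[2]
        }
        END {
            for (m in time_ms) printf "%s %.3f %.0f\n", m, time_ms[m], alloc_b[m]
        }
    ' "$@" | sort
}
```

For instance, `ghc -ddump-timings -c Foo.hs 2>&1 | sum_timings` would print
one line per module compiled.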

To cut straight to the chase, the measurements show the following
changes in `linear-bang` relative to `master`:

  metric                      -O0                -O1
  -------------------         ---------          ----------
  total_wall_seconds          +0.3%              +0.6%
  total_cpu_seconds           +0.3%              +0.7%
  max_bytes_used              +4.2%              +4.8%
  GC_cpu_seconds              +1.5%              +2.1%
  mut_cpu_seconds             no change          no change
  sum(per-module-time)        +4.2%              +4.2%
  sum(per-module-alloc)       +0.8%              +0.8%
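The aggregation behind these percentages is not spelled out here.
Assuming each entry is the relative change of the `linear-bang` value over
the `master` value for that configuration, it corresponds to the following
calculation (the helper name is hypothetical):

```shell
# Hypothetical helper: relative change of a metric between two builds,
# printed as a signed percentage with one decimal place.
#   usage: pct_change <master value> <linear-bang value>
pct_change() {
    awk -v base="$1" -v cur="$2" 'BEGIN {
        if (base == 0) { print "n/a"; exit }   # avoid division by zero
        printf "%+.1f%%\n", (cur - base) / base * 100
    }'
}
```

For example, `pct_change 1000 1042` prints `+4.2%`.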

There are a few things to point out here. First, the overall change in
compiler runtime is thankfully quite modest (under 1% in both
configurations). However, max_bytes_used increases considerably (over
4%), and this appears to give rise to an appreciable regression in GC
time. It would be interesting to know whether this can be mitigated by
optimising the data representation.

The fact that the cumulative per-module metrics show exactly the same
change at -O0 and -O1 indicates to me that there is a methodological
problem which needs to be addressed in the test infrastructure. I
investigated this a bit and have a hypothesis for what might be going
on; nevertheless, in the interest of publishing the rest of these
results, I am disregarding the per-module metrics for the time being.

I have attached the Jupyter notebook that gave rise to these numbers.
This gives a finer-grained breakdown of the data including histograms
showing the variance of each metric. Perhaps this will be helpful in
better understanding the effects. I would be happy to share my run data
as well, although it is a bit large.

All in all, the Tweag folks have done a great job of squashing the
performance regressions noticed a few weeks ago. The current numbers look
quite acceptable for GHC 8.12. Congratulations to Arnaud, Krzysztof, and
Richard on landing this feature! I'm very much looking forward to seeing
what the community does with it in the coming years.

Cheers,

- Ben

[1] https://github.com/haskell/cabal/issues/5353
[2] https://github.com/haskell/cabal/issues/3883

-------------- next part --------------
A non-text attachment was scrubbed...
Name: linear-types-analysis.html.gz
Type: application/octet-stream
Size: 277382 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20200617/34860874/attachment-0001.obj>