<div dir="auto">Great work! I'm very excited to see these perf issues squashed.<div dir="auto"><br></div><div dir="auto">Thanks to everyone working on this, and also to Ben for such thorough benchmarking work!</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 17, 2020, 11:11 AM Ben Gamari <<a href="mailto:ben@well-typed.com">ben@well-typed.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi everyone,<br>

<br>

Last week I discussed the plan for merging the LinearTypes branch into<br>

GHC 8.12 with Arnaud, Richard, Andreas, and Simon. Many thanks to all of<br>

them for their respective roles in pushing this patch over the finish<br>

line.<br>

<br>

One thing that we wanted to examine prior to merge is compiler<br>

performance across a larger collection of packages. For this I used<br>

the head.hackage patch-set, comparing the Linear Types branch with its<br>

corresponding base commit in `master`. Here I will describe the<br>

methodology used for this comparison and briefly summarize the (happily,<br>

quite positive) results.<br>

<br>

<br>

# Methodology<br>

<br>

I collected total bytes allocated (as reported by the runtime system),<br>

elapsed runtime (as reported by the runtime system), and instructions<br>

(as reported by `perf stat`) of head.hackage builds in two<br>

configurations:<br>

<br>

 * the `opt` configuration<br>

 * the `noopt` configuration, which passed `--disable-optimisation` to cabal-install<br>
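<br>
As a rough illustration only (not the head.hackage driver's actual code): assuming each<br>
compiler invocation writes its RTS statistics with `+RTS -tstats.txt --machine-readable -RTS`,<br>
which yields ("key", "value") pairs, a single metric could be pulled out with a small helper<br>
like the one below. The name `rts_stat` is hypothetical, and the exact key names may differ<br>
between GHC versions.<br>
<br>
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one RTS statistic from a stats file.
# Assumes the file holds ("key", "value") pairs, one per line, in the style
# of the RTS's --machine-readable output. Prints nothing if the key is absent.
rts_stat() {
  local key="$1" file="$2"
  # Capture the quoted value that follows the quoted key on the same line.
  sed -n "s/.*(\"$key\", \"\([^\"]*\)\").*/\1/p" "$file"
}
```
<br>
For instance, `rts_stat max_bytes_used stats.txt` would print the maximum residency recorded<br>
in a (hypothetical) `stats.txt`.<br>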

<br>

These configurations were evaluated on two commits:<br>

<br>

 * `master`: 2b792facab46f7cdd09d12e79499f4e0dcd4293f<br>

 * `linear-bang`: 481cf412d6e619c0e47960f4c70fb21f19d6996d<br>

<br>

Unfortunately, the `noopt` configuration appears to be affected by a<br>

few cabal-install bugs [1,2] and consequently some packages may *still*<br>

be compiled with optimisation, so take these numbers with a grain of<br>

salt.<br>

<br>

The test environment was a reasonably quiet Ryzen 7 1800X with 32 GBytes<br>

of RAM.<br>

<br>

The test was run by first building the two tested commits in Hadrian's<br>

default build flavour. The head.hackage CI driver was then invoked as<br>

follows:<br>

<br>

    # Don't parallelize for stable performance measurements<br>

    export CPUS=1<br>

    export USE_NIX=1<br>

    export EXTRA_HC_OPTS=-ddump-timings<br>

    export COLLECT_PERF_STATS=1<br>

<br>

    mkdir -p runs<br>

<br>

    # master<br>

    export GHC=/home/ben/ghc/ghc-compare-2/_build/stage1/bin/ghc<br>

    ./run-ci --cabal-option=--disable-optimisation<br>

    mv ci/run runs/master-noopt<br>

    ./run-ci<br>

    mv ci/run runs/master-opt<br>

<br>

    # linear-bang<br>

    export GHC=/home/ben/ghc/ghc-compare-1/_build/stage1/bin/ghc<br>

    ./run-ci --cabal-option=--disable-optimisation<br>

    mv ci/run runs/linear-noopt<br>

    ./run-ci<br>

    mv ci/run runs/linear-opt<br>
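<br>
The four runs above could equally be expressed as a loop over (name, compiler) pairs. The<br>
following is a sketch of that framing, not the script actually used; `run_matrix` and the<br>
`RUN_CI` override are hypothetical names introduced here, while the environment variables<br>
are the ones from the invocation above.<br>
<br>
```shell
#!/usr/bin/env bash
# Sketch: the four head.hackage runs written as a loop.
# Each argument is NAME=PATH-TO-GHC; results land in runs/NAME-noopt and
# runs/NAME-opt. RUN_CI is a hypothetical override (defaults to ./run-ci).
set -eu

run_matrix() {
  local run_ci="${RUN_CI:-./run-ci}"
  local entry name
  # Same environment as the original invocation (serial build for stable timings).
  export CPUS=1 USE_NIX=1 EXTRA_HC_OPTS=-ddump-timings COLLECT_PERF_STATS=1
  mkdir -p runs
  for entry in "$@"; do
    name="${entry%%=*}"
    export GHC="${entry#*=}"
    "$run_ci" --cabal-option=--disable-optimisation
    mv ci/run "runs/$name-noopt"
    "$run_ci"
    mv ci/run "runs/$name-opt"
  done
}
```
<br>
Invoked as `run_matrix master=/home/ben/ghc/ghc-compare-2/_build/stage1/bin/ghc linear=/home/ben/ghc/ghc-compare-1/_build/stage1/bin/ghc`,<br>
it should leave the same four directories under `runs/` as the original commands.<br>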

<br>

As we are building all packages (nearly 300 in total) serially, the full<br>

run takes quite a while (around 8 hours IIRC).<br>

<br>

The final run of this test used head.hackage commit<br>

e7e5c5cfbfd42c41b1e62d42bb18483a83b78701 (on the `rts-stats` branch).<br>

<br>

<br>

# Results<br>

<br>

I examined several different metrics of compiler performance:<br>

<br>

 * the total_wall_seconds RTS metric gives a picture of overall<br>

   compilation effort<br>

<br>

 * time reported by -ddump-timings, summed by module, gives a slightly<br>

   finer-grained measurement of per-module compilation time<br>

<br>

 * the RTS's bytes_allocated metric gives overall compiler allocations<br>

<br>

 * the RTS's max_bytes_used metric gives a sense of AST size (and<br>

   potentially the existence of leaks)<br>
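<br>
To make the "summed by module" step concrete, here is a sketch that assumes -ddump-timings<br>
emits lines of the form `Parser [Data.Foo]: alloc=4512232 time=2.843`. The helper name<br>
`sum_timings` and the sample module names are hypothetical, and this is not necessarily how<br>
the attached notebook aggregates the data.<br>
<br>
```shell
#!/usr/bin/env bash
# Hypothetical helper: total -ddump-timings output per module.
# Assumes lines shaped like "Parser [Data.Foo]: alloc=4512232 time=2.843";
# lines without a "[Module]:" tag are ignored.
# Prints "MODULE TOTAL_ALLOC TOTAL_TIME", sorted by module name.
sum_timings() {
  awk '
    match($0, /\[[^ ]*]:/) {
      # Strip the surrounding "[" and "]:" to get the module name.
      mod = substr($0, RSTART + 1, RLENGTH - 3)
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^alloc=/) alloc[mod] += substr($i, 7)
        if ($i ~ /^time=/)  ms[mod]    += substr($i, 6)
      }
    }
    END { for (m in alloc) printf "%s %.0f %.3f\n", m, alloc[m], ms[m] }
  ' "$@" | sort
}
```
<br>
It reads the named dump files, or standard input if none are given.<br>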

<br>

To cut straight to the chase, the measurements show the following changes in `linear-bang` relative to `master`:<br>

<br>

  metric                      -O0                -O1<br>

  -------------------         ---------          ----------<br>

  total_wall_seconds          +0.3%              +0.6%<br>

  total_cpu_seconds           +0.3%              +0.7%<br>

  max_bytes_used              +4.2%              +4.8%<br>

  GC_cpu_seconds              +1.5%              +2.1%<br>

  mut_cpu_seconds             no change          no change<br>

  sum(per-module-time)        +4.2%              +4.2%<br>

  sum(per-module-alloc)       +0.8%              +0.8%<br>
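<br>
One plausible way to arrive at relative changes like these is to sum a metric over all<br>
packages in each run and compare the totals. The sketch below assumes per-package values<br>
stored as `PACKAGE VALUE` lines; that file layout and the helper names are hypothetical.<br>
<br>
```shell
#!/usr/bin/env bash
# Hypothetical helpers for computing relative changes between two runs.

# sum_values FILE: sum the second column of "PACKAGE VALUE" lines.
sum_values() {
  awk '{ s += $2 } END { printf "%.3f\n", s }' "$1"
}

# pct_change BASE NEW: signed change from BASE to NEW in percent, 1 d.p.
pct_change() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%+.1f%%\n", (new - base) / base * 100 }'
}
```
<br>
For example, `pct_change "$(sum_values master-opt.txt)" "$(sum_values linear-opt.txt)"`,<br>
with hypothetical per-run files, prints a figure such as `+4.2%`.<br>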

<br>

There are a few things to point out here. The overall change in compiler<br>

runtime is thankfully quite reasonable. However, max_bytes_used<br>

increases rather considerably (+4.2% to +4.8%), which seems to give rise<br>

to an appreciable regression in GC time (+1.5% to +2.1%). It would be<br>

interesting to know whether this can be improved by optimising the data representation.<br>

<br>

The fact that the cumulative per-module metrics show identical changes at<br>

-O0 and -O1 indicates to me that there is a methodological problem in the<br>

test infrastructure which needs to be addressed. I investigated this a bit<br>

and have a hypothesis for what might be going on here; nevertheless, in<br>

the interest of publishing the other measurements, I am setting the<br>

per-module numbers aside for the time being.<br>

<br>

I have attached the Jupyter notebook that gave rise to these numbers.<br>

This gives a finer-grained breakdown of the data including histograms<br>

showing the variance of each metric. Perhaps this will be helpful in<br>

better understanding the effects. I would be happy to share my run data<br>

as well although it is a bit large.<br>

<br>

All-in-all, the Tweag folks have done a great job in squashing the<br>

performance regressions noticed a few weeks ago. The current numbers look quite<br>

acceptable for GHC 8.12. Congratulations to Arnaud, Krzysztof, and<br>

Richard on landing this feature! I'm very much looking forward to seeing<br>

what the community does with it in the coming years.<br>

<br>

Cheers,<br>

<br>

- Ben<br>

<br>

<br>

[1] <a href="https://github.com/haskell/cabal/issues/5353" rel="noreferrer noreferrer" target="_blank">https://github.com/haskell/cabal/issues/5353</a><br>

[2] <a href="https://github.com/haskell/cabal/issues/3883" rel="noreferrer noreferrer" target="_blank">https://github.com/haskell/cabal/issues/3883</a><br>

<br>

_______________________________________________<br>

ghc-devs mailing list<br>

<a href="mailto:ghc-devs@haskell.org" target="_blank" rel="noreferrer">ghc-devs@haskell.org</a><br>

<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs" rel="noreferrer noreferrer" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs</a><br>

</blockquote></div>