Linear types performance characterisation

Wed Jun 17 18:18:01 UTC 2020

Great work! I'm very excited to see these perf issues squashed.

Thanks to everyone working on this, and also to Ben for such thorough
benchmarking work!

On Wed, Jun 17, 2020, 11:11 AM Ben Gamari <ben at well-typed.com> wrote:

> Hi everyone,
>
> Last week I discussed the plan for merging the LinearTypes branch into
> GHC 8.12 with Arnaud, Richard, Andreas, and Simon. Many thanks to all of
> them for their respective roles in pushing this patch over the finish
> line.
>
> One thing that we wanted to examine prior to merge is compiler
> performance across a larger collection of packages. For this I used
> the head.hackage patch-set, comparing the Linear Types branch with its
> corresponding base commit in `master`. Here I will describe the
> methodology used for this comparison and briefly summarize the (happily,
> quite positive) results.
>
>
> # Methodology
>
> I collected total bytes allocated (as reported by the runtime system),
> elapsed runtime (as reported by the runtime system), and instructions
> (as reported by `perf stat`) of head.hackage builds in two
> configurations:
>
>  * the `opt` configuration
>  * the `noopt` configuration, which passed `--disable-optimisation` to
>    cabal-install
>
> These configurations were evaluated on two commits:
>
>  * `master`: 2b792facab46f7cdd09d12e79499f4e0dcd4293f
>  * `linear-bang`: 481cf412d6e619c0e47960f4c70fb21f19d6996d
>
> Unfortunately, the `noopt` configuration appears to be affected by a
> few cabal-install bugs [1,2] and consequently some packages may *still*
> be compiled with optimisation, so take these numbers with a grain of
> salt.
>
> The test environment was a reasonably quiet Ryzen 7 1800X with 32 GBytes
> of RAM.
>
> The test was run by first building the two tested commits in Hadrian's
> default build flavour. The head.hackage CI driver was then invoked as
> follows:
>
>     # Don't parallelize for stable performance measurements
>     export CPUS=1
>     export USE_NIX=1
>     export EXTRA_HC_OPTS=-ddump-timings
>     export COLLECT_PERF_STATS=1
>
>     mkdir -p runs
>
>     # master
>     export GHC=/home/ben/ghc/ghc-compare-2/_build/stage1/bin/ghc
>     ./run-ci --cabal-option=--disable-optimisation
>     mv ci/run runs/master-noopt
>     ./run-ci
>     mv ci/run runs/master-opt
>
>     # linear-bang
>     export GHC=/home/ben/ghc/ghc-compare-1/_build/stage1/bin/ghc
>     ./run-ci --cabal-option=--disable-optimisation
>     mv ci/run runs/linear-noopt
>     ./run-ci
>     mv ci/run runs/linear-opt
>
> As we are building all packages (nearly 300 in total) serially, the full
> run takes quite a while (around 8 hours IIRC).
>
> The final run of this test used head.hackage commit
> e7e5c5cfbfd42c41b1e62d42bb18483a83b78701 (on the `rts-stats` branch).
>
>
> # Results
>
> I examined several different metrics of compiler performance:
>
>  * the total_wall_seconds RTS metric gives a picture of overall
>    compilation effort
>
>  * time reported by -ddump-timings, summed by module, gives a slightly
>    finer-grained measurement of per-module compilation time
>
>  * the RTS's bytes_allocated metric gives overall compiler allocations
>
>  * the RTS's max_bytes_used metric gives a sense of AST size (and
>    potentially the existence of leaks)
>
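For anyone wanting to reproduce the "summed by module" step, here is a minimal sketch of one way to do it. It assumes each -ddump-timings line has the shape `<phase> [<module>]: alloc=<n> time=<t>`; that shape, the function name `sum_by_module`, and the sample lines are assumptions for illustration and are not taken from Ben's notebook.

```python
import re
from collections import defaultdict

# Assumed line shape (not confirmed against the actual run data):
#   <phase> [<module>]: alloc=<integer> time=<decimal>
TIMING_LINE = re.compile(
    r"^(?P<phase>.+?) \[(?P<module>[^\]]+)\]: "
    r"alloc=(?P<alloc>\d+) time=(?P<time>[0-9.]+)$"
)


def sum_by_module(lines):
    """Sum alloc and time over all phases, grouped by module name."""
    totals = defaultdict(lambda: {"alloc": 0, "time": 0.0})
    for line in lines:
        match = TIMING_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that carry no module name
        entry = totals[match.group("module")]
        entry["alloc"] += int(match.group("alloc"))
        entry["time"] += float(match.group("time"))
    return dict(totals)


# Hypothetical sample lines, NOT real measurements.
sample = [
    "Parser [Main]: alloc=1000 time=1.5",
    "Renamer/typechecker [Main]: alloc=3000 time=2.5",
    "Parser [Lib]: alloc=500 time=0.25",
    "Chasing dependencies: alloc=10 time=0.1",
]

totals = sum_by_module(sample)
for module in sorted(totals):
    print(module, totals[module]["alloc"], totals[module]["time"])
```

Lines without a bracketed module name are skipped, so whole-program phases do not leak into the per-module sums.
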
> To cut straight to the chase, the measurements show the following:
>
>   metric                      -O0                -O1
>   -------------------         ---------          ----------
>   total_wall_seconds          +0.3%              +0.6%
>   total_cpu_seconds           +0.3%              +0.7%
>   max_bytes_used              +4.2%              +4.8%
>   GC_cpu_seconds              +1.5%              +2.1%
>   mut_cpu_seconds             no change          no change
>   sum(per-module-time)        +4.2%              +4.2%
>   sum(per-module-alloc)       +0.8%              +0.8%
>
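The percentages in the table are relative changes of the linear-types build against `master`. A minimal sketch of that arithmetic follows, assuming the RTS statistics are available as a Haskell-style `[("key", "value"), ...]` listing. The helper names and all numbers are hypothetical and are not the measured data.

```python
import ast


def parse_rts_stats(text):
    """Parse a [("key", "value"), ...] listing into a dict of floats.

    Assumes the listing is also a valid Python literal, which holds for
    a plain list of pairs of double-quoted strings.
    """
    pairs = ast.literal_eval(text.strip())
    return {key: float(value) for key, value in pairs}


def relative_change(base, new):
    """Percentage change of `new` relative to `base`."""
    return 100.0 * (new - base) / base


# Hypothetical values for illustration only, NOT taken from the runs.
master_stats = parse_rts_stats(
    '[("total_wall_seconds", "200.0"), ("max_bytes_used", "1000000")]'
)
linear_stats = parse_rts_stats(
    '[("total_wall_seconds", "201.0"), ("max_bytes_used", "1050000")]'
)

for metric in sorted(master_stats):
    change = relative_change(master_stats[metric], linear_stats[metric])
    print(f"{metric:<24} {change:+.1f}%")
```

A positive value means the linear-types build used more of that resource than the base commit.
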
> There are a few things to point out here: the overall change in compiler
> runtime is thankfully quite reasonable. However, max_bytes_used
> increases rather considerably. This seems to give rise to an appreciable
> regression in GC time. It would be interesting to know whether this can
> be improved by optimising the data representation.
>
> The fact that the cumulative per-module metrics didn't change between
> -O0 and -O1 indicates to me that there is a methodological problem
> which needs to be addressed in the test infrastructure. I investigated
> this a bit and have a hypothesis for what might be going on here;
> nevertheless, in the interest of publishing these results I'm
> disregarding the per-module measurements for the time being.
>
> I have attached the Jupyter notebook that gave rise to these numbers.
> This gives a finer-grained breakdown of the data including histograms
> showing the variance of each metric. Perhaps this will be helpful in
> better understanding the effects. I would be happy to share my run data
> as well although it is a bit large.
>
> All-in-all, the Tweag folks have done a great job in squashing the
> performance regressions noticed a few weeks ago. The current numbers look quite
> acceptable for GHC 8.12. Congratulations to Arnaud, Krzysztof, and
> Richard on landing this feature! I'm very much looking forward to seeing
> what the community does with it in the coming years.
>
> Cheers,
>
> - Ben
>
>
> [1] https://github.com/haskell/cabal/issues/5353
> [2] https://github.com/haskell/cabal/issues/3883