Benchmarking experiences: Cabal test vs compiling nofib/spectral/simple/Main.hs

Ben Gamari ben at
Sat Jan 23 16:36:34 UTC 2021

Sebastian Graf <sgraf1337 at> writes:

> Hi Andreas,
> I similarly benchmark compiler performance by compiling Cabal, but only
> occasionally. I mostly trust ghc/alloc metrics in CI and check Cabal when I
> think there's something afoot and/or want to measure runtime, not only
> allocations.

I think this is a very reasonable strategy. When working explicitly on
compiler performance I generally default to the Cabal test as

 1. I find the 20 or 90 seconds (depending upon optimisation level) that
    it takes is small relative to the time it took to actually find the
    issue I am trying to fix, and

 2. I want to be certain I am not sacrificing compiler performance in
    one case in exchange for improvements elsewhere; the nofib tests are so
    small that I find it hard to convince myself that this is the case.

> I'm inclined to think that for my purposes (testing the impact of
> optimisations) the GHC codebase offers sufficient variety to turn up
> fundamental regressions, but maybe it makes sense to build some packages
> from head.hackage to detect regressions like
> earlier. It's all a bit
> open-ended and I frankly think I wouldn't get done anything if all my
> patches would have to get to the bottom of all regressions and improvements
> on the entire head.hackage set. I somewhat trust that users will complain
> eventually and file a bug report and that our CI efforts mean that compiler
> performance will improve in the mean.
> Although it's probably more of a tooling problem: I simply don't know how
> to collect the compiler performance metrics for arbitrary cabal packages.
> If these metrics would be collected as part of CI, maybe as a nightly or
> weekly job, it would be easier to get to the bottom of a regression before
> it manifests in a released GHC version. But it all depends on how easy that
> would be to set up and how many CI cycles it would burn, and I certainly
> don't feel like I'm in a position to answer either question.

We actually already do this in head.hackage: every GHC commit on
`master` runs `head.hackage` with -ddump-timings. The compiler metrics
that result are then dumped into a database, which can be queried via
Postgrest. IIRC, I described this in an email to ghc-devs a few months
ago.
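
As a rough illustration of the kind of post-processing involved (this is
not the actual head.hackage tooling), the sketch below sums allocations and
time per compiler phase from -ddump-timings output. It assumes each timing
line has the shape `<phase> [<module>]: alloc=<bytes> time=<ms>`; the
function names `parse_timings` and `totals_by_phase` are hypothetical, and
lines that do not match that shape (for example phases reported without a
module) are simply skipped.

```python
import re
from collections import defaultdict

# Assumed line shape (not verified against every GHC version):
#   "<phase> [<module>]: alloc=<bytes> time=<milliseconds>"
TIMING_RE = re.compile(
    r"^(?P<phase>.+?) \[(?P<module>[^\]]+)\]: "
    r"alloc=(?P<alloc>\d+) time=(?P<time>\d+(?:\.\d+)?)$"
)


def parse_timings(text):
    """Return (phase, module, alloc_bytes, time_ms) for each matching line."""
    records = []
    for line in text.splitlines():
        m = TIMING_RE.match(line.strip())
        if m is None:
            continue  # not a per-module timing line; ignore it
        records.append(
            (m["phase"], m["module"], int(m["alloc"]), float(m["time"]))
        )
    return records


def totals_by_phase(records):
    """Sum allocations and time over all modules, keyed by phase name."""
    totals = defaultdict(lambda: [0, 0.0])
    for phase, _module, alloc, time_ms in records:
        totals[phase][0] += alloc
        totals[phase][1] += time_ms
    return {phase: (alloc, time_ms) for phase, (alloc, time_ms) in totals.items()}
```

The per-phase totals from two compiler builds could then be compared
directly, or loaded into a database table for querying.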

Unfortunately, Ryan and I have thus far found it very difficult to
keep head.hackage and the associated infrastructure building reliably
enough to make this a useful long-term metric. I do hope we can do
better in the future; I suspect we will want to be better about marking
MRs that may break user code with ~"user facing", allowing us to ensure
that head.hackage is updated *before* the change makes it into `master`.


- Ben

