Beta Performance dashboard

Simon Peyton Jones simonpj at
Thu Jul 17 11:21:30 UTC 2014

This is totally brilliant: thank you Joachim!


| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces at] On Behalf Of
| Joachim Breitner
| Sent: 16 July 2014 09:02
| To: ghc-devs at
| Subject: Beta Performance dashboard
| Hi,
| I guess it’s time to talk about this, especially as Richard just
| brought it up again...
| I felt that we were seriously lacking in our grip on performance
| issues.
| We don’t even know whether 6.8.3 was better or worse than 6.8.3 or
| 7.6.4 in terms of nofib, not to speak of the effect of each single
| commit.
| I want to change that, so I set up a benchmark monitoring dashboard.
| You can currently reach it at:
| What does it do?
| ~~~~~~~~~~~~~~~~
| It monitors the repository (master branch only) and builds each commit,
| complete with the test suite and nofib. The log is saved and analyzed,
| and some numbers are extracted:
|  * The build time
|  * The test suite summary numbers
|  * Runtime (if >1s), allocations and binary sizes of the nofib
|    benchmarks
| These are uploaded to the website above, which is powered by codespeed,
| a general performance dashboard, implemented in Python using Django.
| Under _Changes_, it provides a report for each commit (changes wrt. the
| previous version, and wrt. 10 revisions earlier, the so-called
| “trend”). A summary of these reports is visible on the front-page.
| The _Timeline_ is a graph for each individual performance number. If
| there are bumps, you can hopefully find them there! You can also
| compare to 7.8.3, which is available as a “baseline”.
| _Comparison_ will be more useful if we have more tagged revisions, or if
| we were benchmarking various options (e.g. -fllvm): Here you can do
| bar-chart comparisons.
| Why codespeed?
| ~~~~~~~~~~~~~~
| For a long time I searched for a suitable software product; my
| criteria were that it should be open source, rather simple to set up and
| mostly decoupled from other tools, i.e. something that I throw numbers
| at and which then displays them nicely. While I don’t think codespeed
| is the best performance dashboard out there (I find
| a bit better; I wonder how well
| codespeed scales to even larger numbers of benchmarks and I wish it
| were more git-aware), it was the easiest to get started with. And
| thanks to the loose coupling of (1) running the tests to acquire a log,
| (2) parsing the log to get numbers and (3) putting them on a server, we
| can hopefully replace it when we come across something better. I was
| hoping for the Phabricator guys to have something in their tool suite,
| but it doesn’t look like it.
| How does it work (currently)?
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| My office PC is underused (I work on my laptop), so it’s currently
| dedicated to it. I have a simple shell script that monitors the repo
| for new versions. It builds the newest revision and works itself back
| to the commit where everything was turned into submodules:
| It calls a script that does the actual building:
| This produces a log file which should contain all the required numbers
| somewhere.
| A second script extracts these numbers (with help of nofib-analyze) and
| converts them into codespeed compatible JSON files:
| Finally, a simple invocation to curl uploads them to codespeed:
| So if you want additional benchmarks to be tracked, make sure they are
| present in the logs and adjust codespeed will
| automatically pick up new benchmarks in these logs. Reimplementations
| in Haskell are also welcome :-)
| The testsuite is run with VERBOSE=4, so the performance numbers are
| also shown for failing test cases. So once a test case goes over the
| limit, you can grep through previous logs to try to find the real culprit.
| I uploaded the logs (so far) to
| logs
| (but this is not automated yet, ping me if you need an update on this).
| What next?
| ~~~~~~~~~~
| Clearly, the current setup is only good enough to evaluate the system.
| Eventually, I might want to use my office PC again, and the free
| hosting on openshift is not very powerful.
| So if we want to keep this setup and make it “official”, we need to find a
| permanent solution.¹ This involves:
|  * A dedicated machine to run the benchmarks. This probably shouldn’t be
|    a VM, if we want to keep the noise in the runtime down.
|  * A machine to run the codespeed server. Can be a VM, or even run on
|    any of the systems that we have right now. Just needs a database
|    (postgresql preferably) and a webserver supporting WSGI (i.e. any
|    of them).
|  * Maybe a better place to store the logs for public consumption.
| Also, there are ways to improve the system:
|  * As I said, I don’t think codespeed is the best. If we find something
|    better, we can replace it. Since we have all the logs, we can easily
|    fill the new system with the data, or even run both at the same
|    time.
|  * We might want to have more numbers. I am already putting
|    lines-of-code and disk space usage numbers into the logs, but do not
|    parse them yet.
|  * In particular, we might want to put in each performance test case as
|    a benchmark of its own, to more easily find commits that degrade (or
|    improve!) performance. I’m not sure how well the web page will
|    handle that.
|  * We might want to replace my rather simple by
|    something more serious. In particular, I imagine that our builder
|    setup could manage this, with a dedicated builder doing the
|    benchmark runs and the builder server scheduling a build for each
|    commit.
| That’s it for now. Enjoy clicking around!
| Greetings,
| Joachim
| ¹ I guess that could be considered beta-reduction :-)
| --
| Joachim Breitner
|   e-Mail: mail at
|   Homepage:
|   Jabber-ID: nomeata at
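
A minimal sketch of step (2) in the quoted mail: turning numbers found in a
build log into codespeed-style result records. Python is used because
codespeed itself is a Python/Django application. The log line format, the
project, executable and environment names, and the helper functions are all
hypothetical and only serve as illustration; the real scripts are not
reproduced here. The record keys are the fields that codespeed's result
submission expects, as far as I know.

```python
import json
import re

# Hypothetical log format: one "name;metric;value" line per measurement,
# e.g. "nofib/fibon;allocs;123456". The real build logs look different.
LINE_RE = re.compile(
    r"^(?P<name>[^;\s]+);(?P<metric>[^;\s]+);(?P<value>-?\d+(?:\.\d+)?)$"
)


def parse_log(text):
    """Yield (benchmark, value) pairs from the hypothetical log format."""
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip everything that is not a measurement line
        benchmark = "{}/{}".format(match.group("name"), match.group("metric"))
        yield benchmark, float(match.group("value"))


def to_codespeed(commit, measurements, project="GHC", branch="master",
                 executable="ghc", environment="office-pc"):
    """Build one codespeed result record per measurement.

    All default names are placeholders, not the dashboard's real settings.
    """
    return [
        {
            "commitid": commit,
            "branch": branch,
            "project": project,
            "executable": executable,
            "environment": environment,
            "benchmark": benchmark,
            "result_value": value,
        }
        for benchmark, value in measurements
    ]


if __name__ == "__main__":
    sample = "building...\nnofib/fibon;allocs;123456\nbuildtime;seconds;2710.5\n"
    records = to_codespeed("abc1234", parse_log(sample))
    print(json.dumps(records, indent=2))
```

Keeping the parsing separate from the record construction mirrors the loose
coupling described in the mail: the upload step only ever sees the JSON, so
the dashboard behind it can be swapped without touching the log parser.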
