Beta Performance dashboard

Joachim Breitner mail at
Wed Jul 16 08:02:21 UTC 2014


I guess it’s time to talk about this, especially as Richard just brought
it up again...

I felt that we were seriously lacking in our grip on performance issues.
We don’t even know whether 7.8.3 was better or worse than 7.8.2 or 7.6.3
in terms of nofib, not to speak of the effect of each single commit.

I want to change that, so I set up a benchmark monitoring dashboard. You
can currently reach it at:


What does it do?

It monitors the repository (master branch only) and builds each commit,
complete with the test suite and nofib. The log is saved and analyzed,
and some numbers are extracted (see the sketch after this list):
 * The build time
 * The test suite summary numbers
 * Runtime (if >1s), allocations and binary sizes of the nofib
   benchmarks
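
Roughly, the shape of the extracted data, as a hypothetical Haskell
sketch (the real scripts simply work on the raw log text; all names
here are made up):

    -- Hypothetical shape of the numbers pulled out of one build log.
    data Metrics = Metrics
      { buildTime    :: Double                  -- wall-clock seconds for the build
      , testSummary  :: TestSummary             -- the test suite summary numbers
      , nofibResults :: [(String, NofibResult)] -- one entry per nofib benchmark
      }

    data TestSummary = TestSummary
      { unexpectedFailures :: Int
      , unexpectedPasses   :: Int
      }

    data NofibResult = NofibResult
      { runTime     :: Maybe Double  -- seconds; only tracked if > 1s
      , allocations :: Integer       -- bytes allocated
      , binarySize  :: Integer       -- size of the compiled binary, in bytes
      }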

These are uploaded to the website above, which is powered by codespeed,
a general performance dashboard, implemented in Python using Django.

Under _Changes_, it provides a report for each commit (changes wrt. the
previous version, and wrt. 10 revisions earlier, the so-called
“trend”). A summary of these reports is visible on the front page.

The _Timeline_ is a graph for each individual performance number. If
there are bumps, you can hopefully find them there! You can also compare
to 7.8.3, which is available as a “baseline”.

_Comparison_ will be more useful once we have more tagged revisions, or
if we were benchmarking various options (e.g. -fllvm): here you can do
bar-chart comparisons.

Why codespeed?

For a long time I searched for a suitable software product; my criteria
were that it should be open source, rather simple to set up and mostly
decoupled from other tools, i.e. something that I throw numbers at and
which then displays them nicely. While I don’t think codespeed is the
best performance dashboard out there (I find some alternatives a bit
better; I wonder how well codespeed scales to even larger numbers of
benchmarks, and I wish it were more git-aware), it was the easiest to
get started with. And thanks to the loose coupling of (1) running the
tests to acquire a log, (2) parsing the log to get the numbers and (3)
putting them on a server, we can hopefully replace it when we come
across something better. I was hoping the Phabricator guys would have
something in their tool suite, but it doesn’t look like it.

How does it work (currently)?

My office PC is underused (I work on my laptop), so it’s currently
dedicated to this task. I have a simple shell script that monitors the
repo for new versions. It builds the newest revision and then works its
way back to the commit where everything was turned into submodules:
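
The real thing is a plain shell script, but as a rough illustration,
such a polling loop could look like this in Haskell (all file and
script names are made up, and unlike the real script this version only
follows new commits instead of also working backwards):

    -- monitor.hs: a hypothetical sketch of the polling loop, not the real script.
    -- Assumes a GHC checkout in ./ghc and a file "last-built" that holds the
    -- hash of the last commit that was benchmarked.
    import Control.Concurrent (threadDelay)
    import Control.Monad (forever, unless)
    import Data.Functor ((<$>))
    import System.Process (callCommand, readProcess)
    import qualified Data.ByteString.Char8 as BS

    main :: IO ()
    main = forever $ do
      callCommand "git -C ghc fetch --quiet origin master"
      new <- head . lines <$>
               readProcess "git" ["-C", "ghc", "rev-parse", "origin/master"] ""
      old <- BS.unpack <$> BS.readFile "last-built"
      unless (new == old) $ do
        callCommand ("./build-and-log.sh " ++ new)  -- hypothetical build script
        BS.writeFile "last-built" (BS.pack new)
      threadDelay (10 * 60 * 1000000)  -- poll every ten minutes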

It calls a script that does the actual building:
This produces a log file which should contain all the required numbers.

A second script extracts these numbers (with the help of nofib-analyse)
and converts them into codespeed-compatible JSON files:
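
As a hypothetical sketch of step (2) in Haskell: the field names below
are the ones codespeed’s result-adding API expects, while the project,
executable and environment values are made up for illustration:

    import Data.List (intercalate)

    -- One codespeed result record. No escaping is done, which is fine
    -- for a sketch, since nofib benchmark names are plain ASCII.
    toRecord :: String -> String -> Double -> String
    toRecord commit bench value = concat
      [ "{ \"project\": \"GHC\""
      , ", \"branch\": \"master\""
      , ", \"executable\": \"ghc\""
      , ", \"environment\": \"office-pc\""  -- hypothetical machine name
      , ", \"commitid\": \"" ++ commit ++ "\""
      , ", \"benchmark\": \"" ++ bench ++ "\""
      , ", \"result_value\": " ++ show value
      , " }"
      ]

    -- All measurements for one commit, bundled into a JSON array.
    toJson :: String -> [(String, Double)] -> String
    toJson commit results =
      "[" ++ intercalate ", " [ toRecord commit b v | (b, v) <- results ] ++ "]"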

Finally, a simple invocation of curl uploads them to codespeed:
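
For example, something along these lines, where the endpoint path is
codespeed’s JSON result-adding URL and the host is a placeholder:

    curl --data-urlencode json@codespeed.json \
         http://<dashboard-host>/result/add/json/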

So if you want additional benchmarks to be tracked, make sure they are
present in the logs and adjust the parsing script; codespeed will
automatically pick up new benchmarks in these logs. Reimplementations
in Haskell are also welcome :-)

The testsuite is run with VERBOSE=4, so the performance numbers are also
shown for failing test cases. So once a test case goes over the limit,
you can grep through previous logs to try to find the real culprit. I
uploaded the logs (so far) to
(but this is not automated yet, ping me if you need an update on this).
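
For instance, assuming one log per commit named after its hash (the
layout and test name here are made up):

    # which of the archived builds mention the test at all?
    grep -l "T1234" logs/*.log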

What next?

Clearly, the current setup is only good enough to evaluate the system.
Eventually, I might want to use my office PC again, and the free hosting
on openshift is not very powerful.

So if we want to keep this setup and make it “official”, we need to
find a
permanent solution.¹ This involves:

 * A dedicated machine to run the benchmarks. This probably shouldn’t be
   a VM, if we want to keep the noise in the runtime down.
 * A machine to run the codespeed server. This can be a VM, or even run
   on any of the systems that we have right now. It just needs a
   database (preferably PostgreSQL) and a webserver supporting WSGI
   (i.e. any of them).
 * Maybe a better place to store the logs for public consumption.

Also, there are ways to improve the system:

 * As I said, I don’t think codespeed is the best. If we find something
   better, we can replace it. Since we have all the logs, we can easily
   fill the new system with the data, or even run both at the same time.
 * We might want to have more numbers. I am already putting
   lines-of-code and disk space usage numbers into the logs, but do not
   parse them yet.
 * In particular, we might want to put in each performance test case as
   a benchmark of its own, to make it easier to find commits that
   degrade (or improve!) performance. I’m not sure how well the web
   page will handle that many benchmarks, though.
 * We might want to replace my rather simple setup by something more
   serious. In particular, I imagine that our builder setup could
   manage this, with a dedicated builder doing the benchmark runs and
   the builder server scheduling a build for each commit.

That’s it for now. Enjoy clicking around!


¹ I guess that could be considered beta-reduction :-)

Joachim Breitner
  e-Mail: mail at
  Jabber-ID: nomeata at
