Beta Performance dashboard

Wed Jul 16 09:42:02 UTC 2014

This is great. I have wanted this for a long time.

Joachim, could you write a wiki page with step-by-step instructions for how
to set this up, detailed enough that e.g. one of our infrastructure
volunteers could set it up on another machine?

Haskell infrastructure people, do we have a (e.g. Hetzner) machine that we
can run this on?

On Wed, Jul 16, 2014 at 10:02 AM, Joachim Breitner <mail at joachim-breitner.de> wrote:

> Hi,
>
> I guess it’s time to talk about this, especially as Richard just brought
> it up again...
>
> I felt that we were seriously lacking in our grip on performance issues.
> We don’t even know whether 6.8.3 was better or worse than 7.6.4 in terms
> of nofib, not to speak of the effect of each single commit.
>
> I want to change that, so I set up a benchmark monitoring dashboard. You
> can currently reach it at:
>
>                   http://ghcspeed-nomeata.rhcloud.com/
>
> What does it do?
> ~~~~~~~~~~~~~~~~
>
> It monitors the repository (master branch only) and builds each commit,
> complete with the test suite and nofib. The log is saved and analyzed,
> and some numbers are extracted:
>  * The build time
>  * The test suite summary numbers
>  * Runtime (if >1s), allocations and binary sizes of the nofib
>    benchmarks
>
> These are uploaded to the website above, which is powered by codespeed,
> a general performance dashboard, implemented in Python using Django.
>
> Under _Changes_, it provides a report for each commit (changes wrt. the
> previous version, and wrt. 10 revisions earlier, the so-called
> “trend”). A summary of these reports is visible on the front page.
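
To make the two numbers in such a report concrete, here is a minimal Python sketch. It assumes the "trend" is simply the relative change against the measurement 10 revisions earlier, as described above; codespeed's actual computation may differ, and the function names are hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("cannot compute a relative change from zero")
    return (new - old) / old * 100.0


def change_and_trend(history, window=10):
    """Compute (change, trend) for the latest measurement of one benchmark.

    history lists the measurements oldest first. "change" compares the
    latest value with the previous revision, "trend" with the value
    `window` revisions earlier. Either is None if history is too short.
    """
    latest = history[-1]
    change = percent_change(history[-2], latest) if len(history) >= 2 else None
    trend = (
        percent_change(history[-1 - window], latest)
        if len(history) > window
        else None
    )
    return change, trend


if __name__ == "__main__":
    # Ten stable measurements followed by one much faster run.
    print(change_and_trend([200.0] * 10 + [100.0]))
```
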
>
> The _Timeline_ is a graph for each individual performance number. If
> there are bumps, you can hopefully find them there! You can also compare
> to 7.8.3, which is available as a “baseline”.
>
> _Comparison_ will be more useful if we have more tagged revisions, or if
> we were benchmarking various options (e.g. -fllvm): here you can do
> bar-chart comparisons.
>
> Why codespeed?
> ~~~~~~~~~~~~~~
>
> For a long time I searched for a suitable software product. My criteria
> were that it should be open source, rather simple to set up and
> mostly decoupled from other tools, i.e. something that I throw numbers
> at and which then displays them nicely. While I don’t think codespeed is
> the best performance dashboard out there (I find
> http://goperfd.appspot.com/perf a bit better; I wonder how well
> codespeed scales to even larger numbers of benchmarks and I wish it were
> more git-aware), it was the easiest to get started with. And thanks to
> the loose coupling of (1) running the tests to acquire a log, (2)
> parsing the log to get numbers and (3) putting them on a server, we can
> hopefully replace it when we come across something better. I was hoping
> for the Phabricator guys to have something in their tool suite, but it
> doesn’t look like it.
>
> How does it work (currently)?
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> My office PC is underused (I work on my laptop), so it’s currently
> dedicated to this. I have a simple shell script that monitors the repo for
> new versions. It builds the newest revision and works its way back to the
> commit where everything was turned into submodules:
> https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/watch.sh
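
The "newest first, then work backwards" order can be sketched in a few lines of Python. This is not the logic of watch.sh itself, only an illustration of the scheduling idea; the function name and the way the inputs are obtained are assumptions. In practice the commit list could come from something like `git log --format=%H` over the range of interest, which prints hashes newest first.

```python
def next_commit_to_build(commits_newest_first, already_built):
    """Return the newest commit that has no results yet, or None.

    commits_newest_first: commit ids ordered from newest to oldest,
        ending at the oldest commit we still care about.
    already_built: set of commit ids that already have a log.
    """
    for commit in commits_newest_first:
        if commit not in already_built:
            return commit
    return None


if __name__ == "__main__":
    commits = ["c4", "c3", "c2", "c1"]  # hypothetical ids, newest first
    built = {"c4", "c2"}
    print(next_commit_to_build(commits, built))
```

With this order a freshly pushed commit is always measured before the backlog of older commits is continued.
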
>
> It calls a script that does the actual building:
> https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/run-speed.sh
> This produces a log file which should contain all the required numbers
> somewhere.
>
> A second script extracts these numbers (with the help of nofib-analyze)
> and converts them into codespeed-compatible JSON files:
> https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/log2json.pl
>
> Finally, a simple invocation to curl uploads them to codespeed:
> https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/upload.sh
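
For readers who have not seen what codespeed expects, here is a rough Python sketch of the last two steps. It is not a port of log2json.pl or upload.sh. The field names follow codespeed's sample upload scripts as far as I remember them, and the form field "json" is what its /result/add/json/ endpoint takes, so please check both against the codespeed version in use. The project, executable and environment values and the benchmark name are placeholders.

```python
import json
from urllib.parse import urlencode


def to_codespeed_results(commit, numbers, *, project="GHC", branch="master",
                         executable="ghc", environment="benchmark-host"):
    """Turn {benchmark name: value} for one commit into codespeed records."""
    return [
        {
            "commitid": commit,
            "project": project,          # placeholder value
            "branch": branch,
            "executable": executable,    # placeholder value
            "environment": environment,  # placeholder value
            "benchmark": benchmark,
            "result_value": value,
        }
        for benchmark, value in sorted(numbers.items())
    ]


def encode_upload_body(results):
    """Form-encode the records as the body of a POST request."""
    return urlencode({"json": json.dumps(results)})


if __name__ == "__main__":
    # "build/time" is a hypothetical benchmark name.
    results = to_codespeed_results("abc123", {"build/time": 1234.5})
    print(encode_upload_body(results))
```

The sketch stops at building the request body on purpose, so that it can be tried without a running server.
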
>
> So if you want additional benchmarks to be tracked, make sure they are
> present in the logs and adjust log2json.pl. codespeed will automatically
> pick up new benchmarks in these logs. Reimplementations in Haskell are
> also welcome :-)
>
> The testsuite is run with VERBOSE=4, so the performance numbers are also
> shown for failing test cases. So once a test case goes over the limit,
> you can grep through previous logs to try to find the real culprit. I
> uploaded the logs (so far) to https://github.com/nomeata/ghc-speed-logs
> (but this is not automated yet, ping me if you need an update on this).
>
> What next?
> ~~~~~~~~~~
>
> Clearly, the current setup is only good enough to evaluate the system.
> Eventually, I might want to use my office PC again, and the free hosting
> on openshift is not very powerful.
>
> So if we want to keep this setup and make it “official”, we need to find
> a permanent solution.¹ This involves:
>
>  * A dedicated machine to run the benchmarks. This probably shouldn’t be
>    a VM, if we want to keep the noise in the runtime down.
>  * A machine to run the codespeed server. Can be a VM, or even run on
>    any of the systems that we have right now. It just needs a database
>    (postgresql preferably) and a webserver supporting WSGI (i.e. any
>    of them).
>  * Maybe a better place to store the logs for public consumption.
>
> Also, there are ways to improve the system:
>
>  * As I said, I don’t think codespeed is the best. If we find something
>    better, we can replace it. Since we have all the logs, we can easily
>    fill the new system with the data, or even run both at the same time.
>  * We might want to have more numbers. I am already putting
>    lines-of-code and disk space usage numbers into the logs, but do not
>    parse them yet.
>  * In particular, we might want to put in each performance test case as
>    a benchmark of its own, to more easily find commits that degrade (or
>    improve!) performance. I’m not sure how well the web page will handle
>    that.
>  * We might want to replace my rather simple watch.sh script by
>    something more serious. In particular, I imagine that our builder
>    setup could manage this, with a dedicated builder doing the
>    benchmark runs and the builder server scheduling a build for each
>    commit.
>
>
> That’s it for now. Enjoy clicking around!
>
> Greetings,
> Joachim
>
> ¹ I guess that could be considered beta-reduction :-)
>
>
>
> --
> Joachim Breitner
>   e-Mail: mail at joachim-breitner.de
>   Homepage: http://www.joachim-breitner.de
>   Jabber-ID: nomeata at joachim-breitner.de