A small but useful tool for performance characterisation
Ben Gamari
ben at well-typed.com
Sun Jan 5 00:37:00 UTC 2020
Hi everyone,
I have recently been doing a fair amount of performance characterisation
and have long wanted a convenient means of collecting GHC runtime
statistics for later analysis. For this I quickly developed a small
wrapper utility [1].
To see what it does, let's consider an example. Say we made a change to
GHC which we believe might affect the runtime performance of Program.hs.
We could quickly check this by running,
$ ghc-before/_build/stage1/bin/ghc -O Program.hs
$ ghc_perf.py -o before.json ./Program
$ ghc-after/_build/stage1/bin/ghc -O Program.hs
$ ghc_perf.py -o after.json ./Program
This will produce two files, before.json and after.json, which contain
the various runtime statistics emitted by +RTS -s --machine-readable.
These files are in the same format as is used by my nofib branch [2] and
therefore can be compared using `nofib-compare` from that branch.
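As a rough illustration of the kind of comparison involved (this is not how `nofib-compare` is implemented), the sketch below computes the relative change of each metric between two runs. It assumes each file holds a flat JSON object mapping metric names to numbers; the actual layout written by ghc_perf.py may differ, and the helper names `load_metrics` and `relative_changes` are hypothetical.

```python
import json


def load_metrics(path):
    """Load metrics from a JSON file.

    Assumption: the file is a flat object of the form {"metric": number, ...}.
    The real ghc_perf.py output may be structured differently.
    """
    with open(path) as f:
        return {name: float(value) for name, value in json.load(f).items()}


def relative_changes(before, after):
    """Return {metric: (after - before) / before} for metrics found in both runs.

    Metrics whose "before" value is zero are skipped, since a relative
    change is undefined for them.
    """
    changes = {}
    for name in sorted(before.keys() & after.keys()):
        if before[name] != 0:
            changes[name] = (after[name] - before[name]) / before[name]
    return changes


# Example use (hypothetical file names taken from the commands above):
#   changes = relative_changes(load_metrics("before.json"),
#                              load_metrics("after.json"))
#   for name, change in changes.items():
#       print(f"{name:30s} {change:+.2%}")
```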
In addition to being able to collect runtime metrics, ghc_perf is also
able to collect performance counters (on Linux only) using perf. For
instance,
$ ghc_perf.py -o program.json \
-e instructions,cycles,cache-misses ./Program
will produce program.json containing not only RTS statistics but also
event counts from the perf instructions, cycles, and cache-misses
events. Alternatively, passing simply `ghc_perf.py --perf` enables a
reasonable default set of events (namely instructions, cycles,
cache-misses, branches, and branch-misses).
Finally, ghc_perf can also handle repeated runs. For instance,
$ ghc_perf.py -o program.json -r 5 --summarize \
-e instructions,cycles,cache-misses ./Program
will run Program 5 times, emit all of the collected samples to
program.json, and produce a (very basic) statistical summary of what it
collected on stdout.
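To give a sense of what such a basic summary might involve, here is a minimal sketch that computes the mean and sample standard deviation of each metric across runs. It assumes the samples are available as a list of flat metric-to-number mappings, one per run; the function name `summarize` is hypothetical and the statistics ghc_perf.py actually reports may differ.

```python
import statistics


def summarize(samples):
    """Summarise repeated runs.

    samples: a list of {metric: number} dicts, one per run (assumed layout).
    Returns {metric: (mean, sample standard deviation)}. A metric observed
    only once gets a standard deviation of 0.0.
    """
    names = set().union(*samples) if samples else set()
    summary = {}
    for name in sorted(names):
        values = [s[name] for s in samples if name in s]
        mean = statistics.mean(values)
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean, stdev)
    return summary


# Example use:
#   for name, (mean, stdev) in summarize(samples).items():
#       print(f"{name:30s} {mean:.4g} +/- {stdev:.2g}")
```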
Note that there are a few possible TODOs that I've been considering:
* I chose JSON as the output format to accommodate structured data (e.g.
capturing experimental parameters in a structured way). However, in
practice this choice has led to significantly more inconvenience
than I would like, especially given that so far I've only used the
format to capture basic key/value pairs. Perhaps reverting to CSV
would be preferable.
* It might be nice to also add support for cachegrind.
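For the CSV idea in the first item, a conversion could look roughly like the sketch below: one row per run and one column per metric. It assumes the samples are flat metric-to-number mappings, and the function name `samples_to_csv` is hypothetical; it is not part of ghc_perf.py.

```python
import csv
import io


def samples_to_csv(samples):
    """Render a list of {metric: number} dicts as CSV text.

    Columns are the sorted union of all metric names; a metric missing
    from a given run is left as an empty cell.
    """
    fields = sorted(set().union(*samples)) if samples else []
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(samples)
    return out.getvalue()
```

The flat layout is easy to load into a spreadsheet or data frame, at the cost of losing any nested experimental parameters that JSON could have carried.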
Anyways, I hope that others find this as useful as I have.
Cheers,
- Ben
[1] https://gitlab.haskell.org/bgamari/ghc-utils/blob/master/ghc_perf.py
[2] https://gitlab.haskell.org/ghc/nofib/merge_requests/24