[GHC] #15936: Rethink Choice of Baseline Commit for Performance Tests

Fri Nov 23 11:32:47 UTC 2018

#15936: Rethink Choice of Baseline Commit for Performance Tests
-------------------------------------+-------------------------------------
        Reporter:  davide            |                Owner:  davide
            Type:  task              |               Status:  new
        Priority:  normal            |            Milestone:  8.6.3
       Component:  Test Suite        |              Version:  8.6.2
      Resolution:                    |             Keywords:  performance
                                     |  tests git notes
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Description changed by davide:

Old description:

> = Intro =
>
> Currently we always use the previous commit when running performance
> tests. This works well in CI where we fully test each commit in sequence
> (and hence always have test results for the previous commit). Remember,
> test results are stored in git notes and are not by default shared
> between repositories (i.e. your local repo will only have performance
> results run locally on your machine). This is by design: we want to avoid
> comparing results form different machines.
>
> Unfortunately This is not so effective when testing locally. The
> programmer may have only run a subset of performance tests on the
> previous commit, and often have not run the tests at all (this is notably
> true after performing a rebase: the previous commit has changed). We need
> to rethink how we pick a baseline commit.
>
> = Goals =
>
> * In all cases, do something sensible.
> * Giving a warning if conditions are not idea. Provide clear and simple
> instructions on how to get to the ideal case.
> * Give control over the baseline commit to the programmer via command
> line options.
> * In general, performance tests should just work! No extra knowledge
> needed by the programmer.
> * If tests pass without warning now, then they should pass without
> warning later.
>
> = Proposed Solution =
>
> * Choose baseline commit
>   * Provide command line arguments to set the baseline commit
>   * If not baseline commit is provided use the "Ideal baseline commit".
> * If local test results for the baseline are available use those results.
> * Else if results from CI are available (without fetching), then use
> those results.
> * Else Warn user that there are no results available (tests trivially
> pass), then suggest fetching CI results (quick and easy, give full
> command, mention you may have to wait for CI to finish if the commit is
> recent) or running locally (more accurate, mention exact commit to
> checkout), or manually select a baseline commit.
>
> == Ideal baseline commit ==
>
> * If there are no new changes: 0 ahead and 0 or more behind master.
>   * Ideal baseline commit is the previous commit.
> * Else 1 or more ahead and 0 or more behind master.
>   * Ideal baseline commit is `merge-base master HEAD`
>   * Assume that the intention is to create a patch where all new commits
> will ultimately be squashed and placed on top of master. We only want to
> consider performance changes caused by the new commits, so we use the
> merge base instead of master HEAD (though these may be the same commit).
>
> = Open issues =
>
> If commits between HEAD and the baseline commit
> [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceChanges
> allow changes] in performance numbers, what should the behaviour be? What
> happens if this is merged? The commits will be squashed: this could
> affect the commit message and hence the allowed changes which is parsed
> from the commit messages.
>
> = Use cases =
>
> TODO This section is a WIP
>
> We generally will assume that the master branch corresponds to ghc's
> master branch (though may be slightly out of date).
>
> ||= Case =||= Details =||= Baseline =||
> || 0 commits ahead of master || May be on master or branched off master
> with no new commits || previous commit ||
> || CI pre-merge ||
> || CI post-merge

New description:

 = Intro =

 Currently we always use the previous commit when running performance
 tests. This works well in CI where we fully test each commit in sequence
 (and hence always have test results for the previous commit). Remember,
 test results are stored in git notes and are not by default shared between
 repositories (i.e. your local repo will only have performance results run
 locally on your machine). This is by design: we want to avoid comparing
 results form different machines.

 Unfortunately This is not so effective when testing locally. The
 programmer may have only run a subset of performance tests on the previous
 commit, and often have not run the tests at all (this is notably true
 after performing a rebase: the previous commit has changed). We need to
 rethink how we pick a baseline commit.

 = Goals =

 * In all cases, do something sensible.
 * Giving a warning if conditions are not idea. Provide clear and simple
 instructions on how to get to the ideal case.
 * Give control over the baseline commit to the programmer via command line
 options.
 * Give control over the baseline of local or ci to the programmer via
 command line options.
 * In general, performance tests should just work! No extra knowledge
 needed by the programmer.
 * If tests pass without warning now, then they should pass without warning
 later.

 = Proposed Solution =

 * Choose baseline commit
   * Provide command line arguments to set the baseline commit
   * If not baseline commit is provided use the "Ideal baseline commit".
 * If local test results for the baseline are available use those results.
 * Else if results from CI are available (without fetching), then use those
 results.
 * Else Warn user that there are no results available (tests trivially
 pass), then suggest fetching CI results (quick and easy, give full
 command, mention you may have to wait for CI to finish if the commit is
 recent) or running locally (more accurate, mention exact commit to
 checkout), or manually select a baseline commit.

 == Ideal baseline commit ==

 * If there are no new changes: 0 ahead and 0 or more behind master.
   * Ideal baseline commit is the previous commit.
 * Else 1 or more ahead and 0 or more behind master.
   * Ideal baseline commit is `merge-base master HEAD`
   * Assume that the intention is to create a patch where all new commits
 will ultimately be squashed and placed on top of master. We only want to
 consider performance changes caused by the new commits, so we use the
 merge base instead of master HEAD (though these may be the same commit).

 An easy way to implement this is:
 {{{
 mergeBase = merge-base master HEAD
 baseline = if mergeBase == HEAD
              then HEAD^
              else mergeBase
 }}}

 = Open issues =

 If commits between HEAD and the baseline commit
 [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceChanges
 allow changes] in performance numbers, what should the behaviour be? What
 happens if this is merged? The commits will be squashed: this could affect
 the commit message and hence the allowed changes which is parsed from the
 commit messages.

 TODO assume programmer only adds such annotations as tests fail, and
 doesn't want to renter them in full. We still test against baseline.
 Combine all allowed changes, prefering latest when there is overlap (I
 think thats right... think about adding a commit that decreases a metric,
 then you add another commit that increases it (compared to baseline), then
 overall this is an increase and we can ignore the intermediate decrease,
 thanks to the commits ultimately being squashed). Warn about what to put
 in squashed commit.

 We must figure out what commit messages will be used in GitLab on merge.

 = Use cases =

 * We do not distinguish between full/partial performance results being
 available for the baseline commit: that would require checking out the
 baseline commit and extracting the full list of tests.
 *

 == Locally validate a commit from master ==

 {{{
 git checkout master~5
 ./validate
 }}}

 Baseline Commit: HEAD^ == master~6

 || BaselinelocalResults || BaselineCIResults || Infos || Warnings ||

 ||||= Case =||||||= Behaviour =||
 ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline
 local/CI =||= Infos =||= Warnings =||
 || Yes/Partial || - || Local || If HEAD tests is not subset or eq to
 Baseline tests: "If relevant tests exist on baseline, checkout baseline
 and running those tests OR fetch notes and use --baseline-ci || Warnings
 ||
 || No || Yes || CI || "Using CI numbers, suggest running locally for more
 accurate results" || Warnings ||
 || No || No || - ||  || "No baseline results, tests trivially pass" +
 suggest fetch notes or locally run tests on baseline ||

 == Locally validate a commit from master ==

 {{{
 git checkout master~5
 ./validate
 }}}

 Baseline Commit: HEAD^ == master~6

 || BaselinelocalResults || BaselineCIResults || Infos || Warnings ||

 ||||= Case =||||||= Behaviour =||
 ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline
 local/CI =||= Infos =||= Warnings =||
 || Yes/Partial || - || Local || If HEAD tests is not subset or eq to
 Baseline tests: "If relevant tests exist on baseline, checkout baseline
 and running those tests OR fetch notes and use --baseline-ci || Warnings
 ||
 || No || Yes || CI || "Using CI numbers, suggest running locally for more
 accurate results" || Warnings ||
 || No || No || - ||  || "No baseline results, tests trivially pass" +
 suggest fetch notes or locally run tests on baseline ||

 = From the perspective of the CI =

 ?? From CI, "local" is actually "CI". SO replace "is CI results available"
 with "no" and replace "Is local results available" with "is CI results
 available"

 = When to automatically fetch CI results? =

 If baseline commit doesn't have local nor CI results, and is old enough
 such that we expect CI to have been run (WARNING we would need to know the
 merge time, not the time that the commit was created, which could be long
 before it was merged? Or will GitLab bump the commit time on merge?)

--

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15936#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler