nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2
itkovian at gmail.com
Thu Feb 7 10:57:36 CET 2013
On 07 Feb 2013, at 10:44, Simon Marlow <marlowsd at gmail.com> wrote:
> On 06/02/13 22:26, Andy Georges wrote:
>> Quantifying performance changes with effect size confidence intervals - Tomas Kalibera and Richard Jones, 2012 (tech report)
> This is a good one - it was actually a talk by Richard Jones that highlighted to me the problems with averaging over benchmarks (aside from the problem with GM, which he didn't mention).
The paper has a guide for practitioners that improves on what I did in part of my PhD. I think it would be fairly easy to wrap that around Criterion for comparing runs. I should note that a number of people I know who are involved in performance measurement think it is a bit too detailed, but if you can implement this in your testing framework, it could be a cool feature that other people start using too.
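To give a flavour of the kind of comparison the paper advocates -- reporting an effect-size confidence interval for the speedup between two sets of runs instead of a single number -- here is a minimal sketch using a percentile bootstrap on the ratio of means. This is *not* the Kalibera-Jones procedure itself (which models the full hierarchy of repetition levels), and the timings are made up for illustration:

```python
# Minimal sketch: a percentile-bootstrap confidence interval for the
# speedup (ratio of mean run times) between two sets of measurements.
# This is a simplification, not the Kalibera-Jones method; the timing
# data below is hypothetical.
import random

def bootstrap_speedup_ci(old, new, n_boot=10000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        o = [rng.choice(old) for _ in old]   # resample with replacement
        n = [rng.choice(new) for _ in new]
        ratios.append((sum(o) / len(o)) / (sum(n) / len(n)))
    ratios.sort()
    lo = ratios[int(n_boot * alpha / 2)]
    hi = ratios[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical timings (seconds) before and after a compiler change:
old_times = [10.2, 10.5, 9.9, 10.1, 10.4]
new_times = [8.1, 8.4, 7.9, 8.2, 8.0]
lo, hi = bootstrap_speedup_ci(old_times, new_times)
print(f"speedup 95% CI: [{lo:.2f}, {hi:.2f}]")
```

The point is that the result is an interval, so a reader can see at a glance whether a reported "speedup" is distinguishable from noise.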
> This paper mentions Criterion, incidentally.
Yes :-) I mentioned it several times when we discussed performance measuring in the Evaluate workshops. Since I changed jobs, I am no longer very actively involved here, but some people seem to have picked things up, I guess.
>> • J.E. Smith. Characterizing computer performance with a single number. CACM 31(10), 1988.
> And I wish I'd read this a long time ago :) Thanks. No more geometric means for me!
You are very welcome.
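For the archive, Smith's point can be made concrete with a tiny (hypothetical) example: two benchmarks timed on two machines, where the geometric means coincide even though one machine finishes the whole suite about five times faster in total:

```python
# Hypothetical benchmark times (seconds) on two machines, illustrating
# Smith's (1988) argument that a geometric mean can hide real
# differences in total running time.
import math

def geomean(xs):
    return math.prod(xs) ** (1 / len(xs))

machine_a = [1, 100]   # machine A: very fast on one benchmark, slow on the other
machine_b = [10, 10]   # machine B: even times on both

print("total  :", sum(machine_a), "vs", sum(machine_b))        # 101 vs 20
print("geomean:", geomean(machine_a), "vs", geomean(machine_b))  # 10.0 vs 10.0
# The geometric means are equal, yet machine B runs the suite
# roughly five times faster overall.
```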