nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2
itkovian at gmail.com
Thu Feb 7 10:57:36 CET 2013
On 07 Feb 2013, at 10:44, Simon Marlow <marlowsd at gmail.com> wrote:
> On 06/02/13 22:26, Andy Georges wrote:
>> Quantifying performance changes with effect size confidence intervals - Tomas Kalibera and Richard Jones, 2012 (tech report)
> This is a good one - it was actually a talk by Richard Jones that highlighted to me the problems with averaging over benchmarks (aside from the problem with GM, which he didn't mention).
The paper has a guide for practitioners that improves on what I did in part of my PhD. I think it would be fairly easy to wrap that around Criterion for comparing runs. I should note that a number of people I know who are involved in performance measurement think it is a bit too detailed, but if you can implement this in your testing framework, it could be a cool feature that other people start using too.
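To give a flavour of the kind of comparison the paper advocates -- reporting an effect-size confidence interval for the speedup between two sets of runs instead of a single number -- here is a minimal sketch using a percentile bootstrap on the ratio of means. This is *not* the Kalibera-Jones procedure itself (which models the full hierarchy of repetition levels), and the timings are made up for illustration:

```python
# Minimal sketch: a percentile-bootstrap confidence interval for the
# speedup (ratio of mean run times) between two sets of measurements.
# This is a simplification, not the Kalibera-Jones method; the timing
# data below is hypothetical.
import random

def bootstrap_speedup_ci(old, new, n_boot=10000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        o = [rng.choice(old) for _ in old]   # resample with replacement
        n = [rng.choice(new) for _ in new]
        ratios.append((sum(o) / len(o)) / (sum(n) / len(n)))
    ratios.sort()
    lo = ratios[int(n_boot * alpha / 2)]
    hi = ratios[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical timings (seconds) before and after a compiler change:
old_times = [10.2, 10.5, 9.9, 10.1, 10.4]
new_times = [8.1, 8.4, 7.9, 8.2, 8.0]
lo, hi = bootstrap_speedup_ci(old_times, new_times)
print(f"speedup 95% CI: [{lo:.2f}, {hi:.2f}]")
```

The point is that the result is an interval, so a reader can see at a glance whether a reported "speedup" is distinguishable from noise.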
> This paper mentions Criterion, incidentally.
Yes :-) I mentioned it several times when we discussed performance measuring in the Evaluate workshops. Since I changed jobs, I am no longer very actively involved here, but some people seem to have picked things up, I guess.
>> • J.E. Smith. Characterizing computer performance with a single number. CACM 31(10), 1988.
> And I wish I'd read this a long time ago :) Thanks. No more geometric means for me!
You are very welcome.
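For the archive, Smith's point can be made concrete with a tiny (hypothetical) example: two benchmarks timed on two machines, where the geometric means coincide even though one machine finishes the whole suite about five times faster in total:

```python
# Hypothetical benchmark times (seconds) on two machines, illustrating
# Smith's (1988) argument that a geometric mean can hide real
# differences in total running time.
import math

def geomean(xs):
    return math.prod(xs) ** (1 / len(xs))

machine_a = [1, 100]   # machine A: very fast on one benchmark, slow on the other
machine_b = [10, 10]   # machine B: even times on both

print("total  :", sum(machine_a), "vs", sum(machine_b))        # 101 vs 20
print("geomean:", geomean(machine_a), "vs", geomean(machine_b))  # 10.0 vs 10.0
# The geometric means are equal, yet machine B runs the suite
# roughly five times faster overall.
```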