<html><head>
<meta content="text/html; charset=ISO-8859-15" http-equiv="Content-Type">
</head><body text="#000000" bgcolor="#FFFFFF"><span>Joachim Breitner
wrote:</span><br>
<blockquote
cite="mid:%3C084b09cd2fb905d07dbbacc8a3a099c26dd38ec2.camel@joachim-breitner.de%3E"
type="cite">
<pre wrap="">This runs on a dedicated physical machine, and still the run-time
numbers were varying too widely and gave us many false warnings (and
probably reported many false improvements which we of course were happy
to believe). I have since switched to measuring only dynamic
instruction counts with valgrind. This means that we cannot detect
improvements or regressions due to certain low-level stuff, but we gain
the ability to reliably measure *something* that we expect to change
when we improve (or accidentally worsen) the high-level
transformations.
</pre>
</blockquote>
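(Aside: for anyone wanting to reproduce that kind of measurement, the
valgrind side can be as simple as the sketch below; the binary name is
just a placeholder.)<br>
<pre>
# The total dynamic instruction count is reported as the "I refs" line.
valgrind --tool=cachegrind ./Main

# Per-function breakdown of the counts, if needed.
cg_annotate cachegrind.out.&lt;pid&gt;
</pre>
<br>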
<span>While this matches my experience with the default settings, I had
good results by tuning the number of measurements nofib does.<br>
With a high number of NoFibRuns (30+), frequency scaling disabled,
background tasks stopped and the machine left alone<br>
until it was done, I got the noise down to about +/-0.2% between
subsequent runs.<br>
<br>
This doesn't eliminate alignment bias and the like, but at least it
gives fairly reproducible results.</span><br>
<br>
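Concretely, the setup I mean looks roughly like this (Linux-specific,
and the no_turbo knob assumes an Intel CPU with intel_pstate):<br>
<pre>
# Pin the clock: performance governor, turbo boost off.
sudo cpupower frequency-set --governor performance
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# Run each nofib benchmark 30 times to average out the remaining noise.
make clean &amp;&amp; make boot &amp;&amp; make NoFibRuns=30
</pre>
<br>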
<span>Sven Panne wrote:</span><br>
<blockquote
cite="mid:CANBN=mvwgDi6rBb6rNH2Onhkpt0m=6ZJR40ngdQiu+oFss4zUg@mail.gmail.com"
type="cite">
<div>4% is far from being "big", look e.g. at <a
href="https://dendibakh.github.io/blog/2018/01/18/Code_alignment_issues"
target="_blank">https://dendibakh.github.io/blog/2018/01/18/Code_alignment_issues</a>,
where changing just the alignment of the code led to a 10% difference.
:-/ The code itself or its layout wasn't changed at all. The "Producing
Wrong Data Without Doing Anything Obviously Wrong!" paper gives more
funny examples.<br></div>
<div><br></div>
<div>I'm not saying that code layout has no impact, quite the
opposite. The main
point is: Do we really have a benchmarking machinery in place which can
tell you if you've improved the real run time or made it worse? I doubt
that, at least at the scale of a few percent. To reach just that simple
yes/no conclusion, you would need quite heavy machinery involving
randomized linking order, varying environments (in the sense of "number
and contents of environment variables"), various CPU models etc. If you
do not do that, modern HW will leave you with a lot of "WTF?!" moments
and wrong conclusions.</div>
</blockquote>
You raise good points. While the example in the blog seems a bit
contrived, with the whole loop fitting into a single cache line, the
principle is a real concern.<br>
I've hit alignment issues and WTF moments plenty of times in the past
when looking at microbenchmarks.<br>
<br>
However, on the scale of nofib I haven't really seen this happen so
far. It's still good to be aware that a whole suite can give<br>
misleading results, though.<br>
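One way to probe a suite for such bias is to rerun it under
deliberately varied conditions, along the lines Sven describes, and
check whether the conclusions survive. A rough sketch; the PAD variable
is hypothetical and only serves to change the size of the process
environment (the trick from the "Producing Wrong Data" paper):<br>
<pre>
# If the comparison flips depending on environment size, the suite is
# measuring layout effects rather than the optimisation under test.
for n in 0 256 1024 4096; do
  PAD=$(head -c $n /dev/zero | tr '\0' x) make NoFibRuns=30
done
</pre>
<br>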
I wonder if this effect is limited by GHC's tendency to use 8-byte
alignment for all code (at least with tables-next-to-code)?<br>
If only the 16-byte fetch windows, 32-byte DSB windows and 64-byte
cache lines are relevant, then 8-byte alignment reduces the number of
possible placements by a lot after all.<br>
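This is easy enough to check on an actual binary; a quick sketch using
gawk, with the binary name again a placeholder:<br>
<pre>
# Offsets of code symbols within a 64-byte cache line. With GHC's
# 8-byte code alignment only the offsets 0, 8, ..., 56 should occur.
nm ./Main | gawk '/ [Tt] / { print $3, strtonum("0x" $1) % 64 }'
</pre>
<br>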
<br>
In the particular example I've hit, however, it's pretty obvious that
alignment is not the issue (and I verified that anyway).<br>
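For completeness, one way to check this kind of thing is to compare
front-end statistics of the two versions with perf (the event names are
Intel-specific, and the binary names are placeholders):<br>
<pre>
# A large shift in DSB vs. legacy-decoder uops between the two builds
# would point at code placement rather than the transformation itself.
perf stat -e idq.dsb_uops,idq.mite_uops ./Main-before
perf stat -e idq.dsb_uops,idq.mite_uops ./Main-after
</pre>
<br>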
In the end, how big the impact of a better layout would be in general
is hard to quantify. Hence my question whether anyone has<br>
pointers to good literature that looks into this.<br>
<br>
Cheers<br>
Andreas<br>
<br>
<br>
</body></html>