[Haskell-cafe] Problem with benchmarking FFI calls with Criterion

Janek S. fremenzone at poczta.onet.pl
Tue Nov 27 14:56:02 CET 2012


Dnia wtorek, 27 listopada 2012, Gregory Collins napisał:
> Did you pass the option to criterion asking it to do a GC between
> trials? 
Yes.


> You might be measuring a GC pause. 
>
> On Tue, Nov 27, 2012 at 2:41 PM, Janek S. <fremenzone at poczta.onet.pl> wrote:
> > Dnia wtorek, 27 listopada 2012, Jake McArthur napisał:
> >> I once had a problem like this. It turned out that my laptop was
> >> stepping the cpu clock rate down whenever it got warm. Disabling that
> >> feature in my BIOS fixed it. Your problem might be similar.
> >
> > I just check - I disabled frequency scaling and results are the same.
> > Actually I doubt that 39us of benchmarking would cause CPU overheating
> > with such repeatibility. Besides, this wouldn't explain why the first
> > benchmark actually got faster.
> >
> > Janek
> >
> >> On Nov 27, 2012 7:23 AM, "Janek S." <fremenzone at poczta.onet.pl> wrote:
> >> > I tested the same code on my second machine - Debian Squeeze (kernel
> >> > 2.6.32) with GHC 7.4.1 - and
> >> > the results are extremely surprising. At first I was unable to
> >> > reproduce the problem and got
> >> > consistent runtimes of about 107us:
> >> >
> >> > benchmarking FFI/C binding
> >> > mean: 107.3837 us, lb 107.2013 us, ub 107.5862 us, ci 0.950
> >> > std dev: 983.6046 ns, lb 822.6750 ns, ub 1.292724 us, ci 0.950
> >> >
> >> > benchmarking FFI/C binding
> >> > mean: 108.1152 us, lb 107.9457 us, ub 108.3052 us, ci 0.950
> >> > std dev: 916.2469 ns, lb 793.1004 ns, ub 1.122127 us, ci 0.950
> >> >
> >> > I started experimenting with the vector size and after bumping its
> >> > size to 32K elements I started
> >> > getting this:
> >> >
> >> > benchmarking FFI/C binding
> >> > mean: 38.50100 us, lb 36.71525 us, ub 46.87665 us, ci 0.950
> >> > std dev: 16.93131 us, lb 1.033678 us, ub 40.23900 us, ci 0.950
> >> > found 6 outliers among 100 samples (6.0%)
> >> >   3 (3.0%) low mild
> >> >   3 (3.0%) high severe
> >> > variance introduced by outliers: 98.921%
> >> > variance is severely inflated by outliers
> >> >
> >> > benchmarking FFI/C binding
> >> > mean: 209.9733 us, lb 209.5316 us, ub 210.4680 us, ci 0.950
> >> > std dev: 2.401398 us, lb 2.052981 us, ub 2.889688 us, ci 0.950
> >> >
> >> > First result is always about 39us (2,5 faster, despite longer signal!)
> >> > while the remaining
> >> > benchmarks take almost two times longer.
> >> >
> >> > Janek
> >> >
> >> > Dnia niedziela, 25 listopada 2012, Janek S. napisał:
> >> > > Well, it seems that this only happens on my machine. I will try to
> >> > > test this code on different computer and see if I can reproduce it.
> >> > >
> >> > > I don't think using existing vector is a good idea - it would make
> >> > > the
> >> >
> >> > code
> >> >
> >> > > impure.
> >> > >
> >> > > Janek
> >> > >
> >> > > Dnia sobota, 24 listopada 2012, Branimir Maksimovic napisał:
> >> > > > I don't see such behavior neither.ubuntu 12.10, ghc 7.4.2.
> >> > > > Perhaps this has to do with how malloc allocates /cachebehavior.
> >> > > > If you try not to allocate array rather use existing one perhaps
> >> > > > there would
> >> >
> >> > be
> >> >
> >> > > > no inconsistency?It looks to me that's about CPU cache
> >> > > > performance. Branimir
> >> > > >
> >> > > > > I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6.
> >> > > > >
> >> > > > > Janek
> >> > > > >
> >> > > > > Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał:
> >> > > > > > Running the sample code on GHC 7.4.2, I don't see the "one
> >> > > > > > fast, rest slow" behavior.  What version of GHC are you
> >> > > > > > running?
> >> > > > > >
> >> > > > > > Edward
> >> > > > > >
> >> > > > > > Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
> >> > > > > > > > What happens if you do the benchmark without
> >> > > > > > > > unsafePerformIO involved?
> >> > > > > > >
> >> > > > > > > I removed unsafePerformIO, changed copy to have type Vector
> >> >
> >> > Double
> >> >
> >> > > > > > > -> IO (Vector Double) and modified benchmarks like this:
> >> > > > > > >
> >> > > > > > > bench "C binding" $ whnfIO (copy signal)
> >> > > > > > >
> >> > > > > > > I see no difference - one benchmark runs fast, remaining
> >> > > > > > > ones run slow.
> >> > > > > > >
> >> > > > > > > Janek
> >> > > > > > >
> >> > > > > > > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15
> >> > > > > > > > -0500
> >> >
> >> > 2012:
> >> > > > > > > > > I am using Criterion library to benchmark C code called
> >> > > > > > > > > via
> >> >
> >> > FFI
> >> >
> >> > > > > > > > > bindings and I've ran into a problem that looks like a
> >> > > > > > > > > bug.
> >> > > > > > > > >
> >> > > > > > > > > The first benchmark that uses FFI runs correctly, but
> >> > > > > > > > > subsequent benchmarks run much longer. I created demo
> >> > > > > > > > > code (about 50 lines, available at github:
> >> > > > > > > > > https://gist.github.com/4135698 ) in which C function
> >> >
> >> > copies a
> >> >
> >> > > > > > > > > vector of doubles. I benchmark that function a couple of
> >> >
> >> > times.
> >> >
> >> > > > > > > > > First run results in avarage time of about 17us,
> >> > > > > > > > > subsequent runs take about 45us. In my real code
> >> > > > > > > > > additional time was
> >> >
> >> > about
> >> >
> >> > > > > > > > > 15us and it seemed to be a constant factor, not relative
> >> > > > > > > > > to "correct" run time. The surprising thing is that if
> >> > > > > > > > > my C function only allocates memory and does no copying:
> >> > > > > > > > >
> >> > > > > > > > > double* c_copy( double* inArr, int arrLen ) {
> >> > > > > > > > >   double* outArr = malloc( arrLen * sizeof( double ) );
> >> > > > > > > > >
> >> > > > > > > > >   return outArr;
> >> > > > > > > > > }
> >> > > > > > > > >
> >> > > > > > > > > then all is well - all runs take similar amount of time.
> >> > > > > > > > > I
> >> >
> >> > also
> >> >
> >> > > > > > > > > noticed that sometimes in my demo code all runs take
> >> > > > > > > > > about 45us, but this does not seem to happen in my real
> >> > > > > > > > > code -
> >> >
> >> > first
> >> >
> >> > > > > > > > > run is always shorter.
> >> > > > > > > > >
> >> > > > > > > > > Does anyone have an idea what is going on?
> >> > > > > > > > >
> >> > > > > > > > > Janek
> >> > > > >
> >> > > > > _______________________________________________
> >> > > > > Haskell-Cafe mailing list
> >> > > > > Haskell-Cafe at haskell.org
> >> > > > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >> > >
> >> > > _______________________________________________
> >> > > Haskell-Cafe mailing list
> >> > > Haskell-Cafe at haskell.org
> >> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >> >
> >> > _______________________________________________
> >> > Haskell-Cafe mailing list
> >> > Haskell-Cafe at haskell.org
> >> > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskell-Cafe at haskell.org
> > http://www.haskell.org/mailman/listinfo/haskell-cafe





More information about the Haskell-Cafe mailing list