[Haskell-cafe] How to correctly benchmark code with Criterion?

Thu Oct 18 22:43:08 CEST 2012

On 18 October 2012 13:15, Janek S. <fremenzone at poczta.onet.pl> wrote:
>> Something like this might work, not sure what the canonical way is.
>> (...)
>
> This is basically the same as the answer I was given on SO. My concerns about this solution are:
> - rnf requires its parameter to belong to NFData type class. This is not the case for some data
> structures like Repa arrays.

For unboxed arrays of primitive types WHNF = NF.  That is, once the
array is constructed all its elements will be in WHNF.
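
A small sketch of that difference, using UArray from the array package as
a stand-in (this is not Repa code, and the names boxed, unboxed and
survivesWhnf are made up for the example). Forcing the boxed array to WHNF
should leave the bad element as a thunk, while building the unboxed one
has to write every element and so hits the error:

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.Array (Array)
import Data.Array.Unboxed (UArray, listArray)

-- Boxed array: elements are stored as thunks, so the error stays hidden.
boxed :: Array Int Int
boxed = listArray (0, 2) [1, error "forced", 3]

-- Unboxed array: elements are stored as raw Ints, so each one must be
-- evaluated while the array is being built.
unboxed :: UArray Int Int
unboxed = listArray (0, 2) [1, error "forced", 3]

-- Hypothetical helper: True if evaluating x to WHNF raises no ErrorCall.
survivesWhnf :: a -> IO Bool
survivesWhnf x = do
  r <- try (evaluate x)
  return $ case r of
    Left e  -> const False (e :: ErrorCall)
    Right _ -> True

main :: IO ()
main = do
  survivesWhnf boxed >>= print    -- WHNF does not touch the elements
  survivesWhnf unboxed >>= print  -- WHNF forces every element
```

With GHC I would expect this to print True and then False.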

> - evaluate only evaluates its argument to WHNF - is this enough? If I have a tuple containing two
> lists won't this only evaluate the tuple constructor and leave the lists as thunks? This is
> actually the case in my code.

That is why you use "rnf" from the NFData type class. You use
"evaluate" to kick-start rnf which then goes ahead and evaluates
everything (assuming the NFData instance has been defined correctly.)
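
To make the tuple-of-lists case concrete, here is a minimal sketch. The
value pair and the helper succeeds are invented for the example; only
evaluate, try and rnf are library functions. An error is hidden inside the
second list so you can see how deep each kind of evaluation goes:

```haskell
import Control.DeepSeq (rnf)
import Control.Exception (ErrorCall, evaluate, try)

-- A tuple of two lists; the second list hides an error in one element.
pair :: ([Int], [Int])
pair = ([1, 2, 3], [4, error "forced"])

-- Hypothetical helper: True if the action finishes without an ErrorCall.
succeeds :: IO a -> IO Bool
succeeds act = do
  r <- try act
  return $ case r of
    Left e  -> const False (e :: ErrorCall)
    Right _ -> True

main :: IO ()
main = do
  -- evaluate alone stops at the tuple constructor (WHNF).
  succeeds (evaluate pair) >>= print
  -- evaluate kick-starts rnf, which walks both lists and hits the error.
  succeeds (evaluate (rnf pair)) >>= print
```

This should print True for the WHNF case and False for the rnf case.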

>
> As I said previously, it seems that Criterion somehow evaluates the data so that time needed for
> its creation is not included in the benchmark. I modified my dataBuild function to look like this:
>
> dataBuild gen = unsafePerformIO $ do
>     let x = (take 6 $ randoms gen, take 2048 $ randoms gen)
>     threadDelay 1000000
>     return x
>
> When I ran the benchmark, criterion estimated the time needed to complete it at over 100 seconds
> (which means that threadDelay worked and was used as a basis for estimation), but the benchmark
> was finished much faster and there was no difference in the final result compared to the normal
> dataBuild function. This suggests that once data was created and used for estimation, the
> dataBuild function was not used again. The main question is: is this observation correct? In this
> question on SO:
> http://stackoverflow.com/questions/6637968/how-to-use-criterion-to-measure-performance-of-haskell-programs
> one of the answers says that there is no automatic memoization, while it looks like in fact the
> values of dataBuild are memoized. I have a feeling that I am misunderstanding something.

If you bind an expression to a variable and then reuse that variable,
the expression is only evaluated once. That is, in "let x = expr in
..." the expression is only evaluated once. However, if you have "f y
= let x = expr in ..." then the expression is evaluated once per
function call.
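
A rough illustration of the two cases, using trace to show when
evaluation happens. The names shared and perCall are made up. Note that I
made the let-bound expression depend on the argument: if it did not, GHC
with optimisation on might float it out of the function and share it after
all.

```haskell
import Debug.Trace (trace)

-- Bound once at top level: evaluated at most once, then reused.
shared :: Int
shared = trace "evaluating shared" (sum [1 .. 1000])

-- x is bound inside the function and depends on y, so it is evaluated
-- once per call (but only once within a call, although it is used twice).
perCall :: Int -> Int
perCall y = let x = trace "evaluating x" (sum [1 .. y]) in x + x

main :: IO ()
main = do
  print (shared + shared)          -- "evaluating shared" shows up once
  print (perCall 10 + perCall 20)  -- "evaluating x" shows up twice
```

The trace messages go to stderr; the two printed results are 1001000 and
530.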

>
>> I don't know if you have already read them,
>> but Tibell's slides on High Performance Haskell are pretty good:
>>
>> http://www.slideshare.net/tibbe/highperformance-haskell
>>
>> There is a section at the end where he runs several tests using Criterion.
> I skimmed the slides and slide 59 seems to show that my concerns regarding WHNF might be true.

It's usually safe if you benchmark a function. However, you most
likely want the result to be in normal form.  The "nf" function does this for
you. So, if your benchmark function has type "f :: X -> ([Double],
Double)", your benchmark will be:

  bench "f" (nf f input)

The first run will evaluate the input (and discard the runtime) and
all subsequent runs will evaluate the result to normal form. For repa
you can use deepSeqArray [1] if your array is not unboxed:

  bench "f'" (whnf (deepSeqArray . f) input)
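
For the plain-list case, a complete benchmark file could look roughly
like the sketch below. The body of f and the input are invented stand-ins
for your real code; force comes from Control.DeepSeq and bench, nf and
defaultMain from Criterion.Main. Forcing the input up front is optional
given what I said above, but it makes it explicit that building the data
is not part of what gets timed.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Criterion.Main (bench, defaultMain, nf)

-- Hypothetical function under test, with the type from the example above.
f :: [Double] -> ([Double], Double)
f xs = (map (* 2) xs, sum xs)

main :: IO ()
main = do
  -- Build the input and force it to normal form before any timing starts.
  input <- evaluate (force (map fromIntegral [1 .. 2048 :: Int])) :: IO [Double]
  -- nf evaluates the whole result tuple, including both components.
  defaultMain [bench "f" (nf f input)]
```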

[1]: http://hackage.haskell.org/packages/archive/repa/3.2.2.2/doc/html/Data-Array-Repa.html#v:deepSeqArray