[Haskell-cafe] Re: (flawed?) benchmark : sort

Tue Mar 11 04:11:41 EDT 2008

Are you really sure that your results are correct? I obviously did all my
tests with -O2 on.
Please rerun your tests on the new "framework". Also note that this one
contains tests for three cases:
- sorted
- revsorted
- randomized
From previous results I can guess that with randomized input Yhc's code will
be ~3 times slower then the fastest quicksort out there.
But I'm not going to run O(n^2) algorithm to compare with O(n log n) - and
this is the case for (rev?)sorted inputs.

Christopher Skrzętnicki

On Tue, Mar 11, 2008 at 5:14 AM, Richard A. O'Keefe <ok at cs.otago.ac.nz>
wrote:

>
> On 11 Mar 2008, at 12:27 pm, Krzysztof Skrzętnicki wrote:
>
> > I've written little framework to work on. See sortbench.hs and
> > sortbench.py attachments.
> > Furthermore, I checked Yhc's implementation of sort: it is indeed
> > very fast:
>
> I took his earlier code and plugged yhc's sort into it.
> Compiling with ghc -O2 using GHC 6.8.2, I found the yhc code (basically
> variant of natural merge) to be considerably slower than some of the
> alternatives.
> (...)
> When I run Krzystztof's test harness (which I have currently brought
> up to
> 25 different sorting functions) with a list of the form [1..N] instead
> of
> a random list, suddenly all the variants of merge sort come out ahead of
> all the variants of quick sort.  In fact his best version of quicksort,
> qsort_iv, comes out fully 1155 times slower than the YHC algorithm on a
> list of 10,000 ordered integers.  That can be improved by spending a bit
> of effort on choosing a good pivot, but then the quicksort algorithms
> are
> no longer so competitive for randomised inputs.

This paper looks interesting, I'm going to implement those tests.

>
> The classic "Engineering a Quicksort" paper by Bentley and McIlroy from
> Software : Practice & Experience recommends a whole bunch of
> distribution
> shapes (one run, two runs, sawtooth, organ pipes, random, ...) that
> should
> be benchmarked before drawing too many conclusions.

This is the right point. A further work will be to add different input
generators to run them too.

> It is also wise to try more than one data type.  How do the different
> algorithms compare on random samples from a Scrabble dictionary?  (Why
> that particular question?  Because I mean to try it.)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20080311/a1018bcf/attachment.htm