[Haskell-cafe] Re: (flawed?) benchmark : sort

Tue Mar 11 00:14:32 EDT 2008

On 11 Mar 2008, at 12:27 pm, Krzysztof Skrzętnicki wrote:

> I've written little framework to work on. See sortbench.hs and  
> sortbench.py attachments.
> Furthermore, I checked Yhc's implementation of sort: it is indeed  
> very fast:

I took his earlier code and plugged yhc's sort into it.
Compiling with ghc -O2 using GHC 6.8.2, I found the yhc code (basically
variant of natural merge) to be considerably slower than some of the
alternatives.

There is a pretty obvious way to speed up the YHC code which you would
expect to provide nearly a factor of two speedup, and with the random
integer data, it does.

However, there is one simple but extremely important point which must be
considered in evaluating a sorting routine:  the library 'sort' function
is, or should be, a *general-purpose* sort.  It should be useful with  
any
data type which is an instance of Ord or for which you can write a `cmp`
function, and it should work at least as well with already-sorted input
as with randomised input.  quicksort (whose original reason for  
existence
was to sort on a machine whose memory would disgrace today's  
wristwatches)
is well known for doing deceptively well on randomised integer  
sequences.

When I run Krzystztof's test harness (which I have currently brought  
up to
25 different sorting functions) with a list of the form [1..N] instead  
of
a random list, suddenly all the variants of merge sort come out ahead of
all the variants of quick sort.  In fact his best version of quicksort,
qsort_iv, comes out fully 1155 times slower than the YHC algorithm on a
list of 10,000 ordered integers.  That can be improved by spending a bit
of effort on choosing a good pivot, but then the quicksort algorithms  
are
no longer so competitive for randomised inputs.

The classic "Engineering a Quicksort" paper by Bentley and McIlroy from
Software : Practice & Experience recommends a whole bunch of  
distribution
shapes (one run, two runs, sawtooth, organ pipes, random, ...) that  
should
be benchmarked before drawing too many conclusions.

It is also wise to try more than one data type.  How do the different
algorithms compare on random samples from a Scrabble dictionary?  (Why
that particular question?  Because I mean to try it.)

Right now, I remain happy with merge sort, because it is never  
mysteriously
several thousand times slower than expected.