List.sort

Sun, 12 May 2002 21:22:02 +0100

> I am curious as to why the List.sort implementation in GHC is a
> quicksort algorithm rather than an algorithm that guarantees n log n
> time in the worst case? 

I'm not going to defend quicksort here, but your question reminded
me of another one, asked a while ago on comp.lang.functional (actually,
the topic seems to come up regularly, as a search for quicksort in 
c.l.f shows;-). This particular poster suggested that choosing a random 
pivot instead of the first element might make the infamous showcase 
quicksort a lot less elegant. As your measurements target precisely this 
weakness of typical functional quicksort implementations (see the c.l.f 
discussions for other problems), I thought I'd have a go at the challenge.

It turns out that one doesn't even need the source code for sort to 
make the change!-) Could you perhaps include one of the following
randomised-input sort variants in your measurements? 

The idea is to take a list of randoms, the list of inputs, and presort
the input list by the random indices, so that selecting the first element
as pivot in the following main sort actually takes a random element
(the presort uses randomised input by construction). 

That's two well-behaved quicksorts instead of a single, often misbehaving
one (well-behaved only on average, of course). Should do a lot better,
on average, though it will still lose against other sorts, not to mention
sorts tailored to what you know about your input list.

Just for fun;-)
Claus

> module RSort (rsortIO,randomiseIO,rsort,randomise) where

> import List
> import Random

> getRandoms n = 
>   newStdGen >>= return . randomRs (1,n)

> randomiseIO l = do
>   rs <- getRandoms $ length l 
>   return $ map snd $ sortBy compareIdx $ zip rs l
>   where
>     compareIdx (i,_) (j,_) = i `compare` j

> rsortIO l = randomiseIO l >>= return . sort

> randomise l = do
>   map snd $ sortBy compareIdx $ zip rs l
>   where
>     n  = length l
>     rs = take n $ randomRs (1,n) $ mkStdGen 100
>     compareIdx (i,_) (j,_) = i `compare` j

> rsort l = sort $ randomise l