GHC Data.List.sort performance question

Mon Jan 14 16:01:49 EST 2008

Hello,

By a rather indirect route, I discovered that I obtain an almost
factor of two improvement in performance in Data.List.sort if I make
one small change in the implementation of the function merge which
supports mergesort and hence sortBy and sort.  Admittedly, the
improvement was only noticeable to me when sorting for example one
million random Int.  The current code in libraries/base/Data/List.hs
for merge is

\begin{code}

merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp xs [] = xs
merge cmp [] ys = ys
merge cmp (x:xs) (y:ys)
 = case x `cmp` y of
        GT -> y : merge cmp (x:xs)   ys
        _  -> x : merge cmp    xs (y:ys)

\end{code}

and all that I did was

\begin{code}

merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp [] ys = ys
merge cmp xs [] = xs
merge cmp (x:xs) (y:ys)
 = case x `cmp` y of
        GT -> y : merge cmp (x:xs)   ys
        _  -> x : merge cmp    xs (y:ys)

\end{code}

that is, I swapped the order of the first two lines.  By the way, the
second version is much less likely to overflow the stack than the
first version.

Can some confirm this?  If you confirm it, can someone explain to me
why one obtains this performance improvement?  I currently just do not
grasp the point.

Thanks,
- Marcus

-- 
  Marcus D. Gabriel, Ph.D.                    Email:    mdg at wanadoo.fr
  213 ter, rue de Mulhouse                      Tel: +33.3.89.69.05.06
  F68300 Saint Louis  FRANCE               Portable: +33.6.34.56.07.75