GHC Data.List.sort performance question

Mon Jan 14 18:34:11 EST 2008

mdg:
> Hello,
> 
> By a rather indirect route, I discovered that I obtain an almost
> factor of two improvement in performance in Data.List.sort if I make
> one small change in the implementation of the function merge which
> supports mergesort and hence sortBy and sort.  Admittedly, the
> improvement was only noticeable to me when sorting for example one
> million random Int.  The current code in libraries/base/Data/List.hs
> for merge is
> 
> \begin{code}
> 
> merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
> merge cmp xs [] = xs
> merge cmp [] ys = ys
> merge cmp (x:xs) (y:ys)
> = case x `cmp` y of
>        GT -> y : merge cmp (x:xs)   ys
>        _  -> x : merge cmp    xs (y:ys)
> 
> \end{code}
> 
> and all that I did was
> 
> \begin{code}
> 
> merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
> merge cmp [] ys = ys
> merge cmp xs [] = xs
> merge cmp (x:xs) (y:ys)
> = case x `cmp` y of
>        GT -> y : merge cmp (x:xs)   ys
>        _  -> x : merge cmp    xs (y:ys)
> 
> \end{code}
> 
> that is, I swapped the order of the first two lines.  By the way, the
> second version is much less likely to overflow the stack than the
> first version.
> 
> Can some confirm this?  If you confirm it, can someone explain to me
> why one obtains this performance improvement?  I currently just do not
> grasp the point.
> 
> Thanks,
> - Marcus

Ian, you looked at sort most recently. What to check the generated code
(or rerun your benchmarks?)

-- Don