GHC Data.List.sort performance question
Marcus D. Gabriel
mdg at wanadoo.fr
Mon Jan 14 16:01:49 EST 2008
Hello,
By a rather indirect route, I discovered that I obtain an almost
factor of two improvement in performance in Data.List.sort if I make
one small change in the implementation of the function merge which
supports mergesort and hence sortBy and sort. Admittedly, the
improvement was only noticeable to me when sorting for example one
million random Int. The current code in libraries/base/Data/List.hs
for merge is
\begin{code}
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp xs [] = xs
merge cmp [] ys = ys
merge cmp (x:xs) (y:ys)
= case x `cmp` y of
GT -> y : merge cmp (x:xs) ys
_ -> x : merge cmp xs (y:ys)
\end{code}
and all that I did was
\begin{code}
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp [] ys = ys
merge cmp xs [] = xs
merge cmp (x:xs) (y:ys)
= case x `cmp` y of
GT -> y : merge cmp (x:xs) ys
_ -> x : merge cmp xs (y:ys)
\end{code}
that is, I swapped the order of the first two lines. By the way, the
second version is much less likely to overflow the stack than the
first version.
Can some confirm this? If you confirm it, can someone explain to me
why one obtains this performance improvement? I currently just do not
grasp the point.
Thanks,
- Marcus
--
Marcus D. Gabriel, Ph.D. Email: mdg at wanadoo.fr
213 ter, rue de Mulhouse Tel: +33.3.89.69.05.06
F68300 Saint Louis FRANCE Portable: +33.6.34.56.07.75
More information about the Glasgow-haskell-users
mailing list