GHC Data.List.sort performance question
Bernie Pope
bjpop at csse.unimelb.edu.au
Tue Jan 15 20:40:10 EST 2008
Hi all,
I was looking into this issue as well, by coincidence.
I haven't figured out what is wrong, but I believe Hugs (September
2006) exhibits similar behaviour, so it is not just GHC.
I also noticed that in the third equation of merge_pairs, swapping
the order of the arguments to merge also avoids the stack overflow:
Original third equation:
merge_pairs cmp (xs:ys:xss)
= merge cmp xs ys : merge_pairs cmp xss
Swapped version:
merge_pairs cmp (xs:ys:xss)
= merge cmp ys xs : merge_pairs cmp xss
^^^^^
And for reference, here is how I was testing it:
main = do
args <- getArgs
let size = (read $ head args) :: Int
main' size
main' size = do
gen <- getStdGen
let rs = (randoms gen) :: [Int]
let list = take size rs
print $ length list
print $ sort list
I'm keen to find out what is going on here.
Cheers,
Bernie.
On 16/01/2008, at 2:43 AM, Simon Peyton-Jones wrote:
> Weird. I see no difference in the compiled code (GHC HEAD).
>
> Simon
>
> | -----Original Message-----
> | From: glasgow-haskell-users-bounces at haskell.org [mailto:glasgow-
> haskell-users-bounces at haskell.org] On Behalf Of
> | Marcus D. Gabriel
> | Sent: 14 January 2008 21:02
> | To: glasgow-haskell-users at haskell.org
> | Subject: GHC Data.List.sort performance question
> |
> | Hello,
> |
> | By a rather indirect route, I discovered that I obtain an almost
> | factor of two improvement in performance in Data.List.sort if I make
> | one small change in the implementation of the function merge which
> | supports mergesort and hence sortBy and sort. Admittedly, the
> | improvement was only noticeable to me when sorting for example one
> | million random Int. The current code in libraries/base/Data/List.hs
> | for merge is
> |
> | \begin{code}
> |
> | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
> | merge cmp xs [] = xs
> | merge cmp [] ys = ys
> | merge cmp (x:xs) (y:ys)
> | = case x `cmp` y of
> | GT -> y : merge cmp (x:xs) ys
> | _ -> x : merge cmp xs (y:ys)
> |
> | \end{code}
> |
> | and all that I did was
> |
> | \begin{code}
> |
> | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
> | merge cmp [] ys = ys
> | merge cmp xs [] = xs
> | merge cmp (x:xs) (y:ys)
> | = case x `cmp` y of
> | GT -> y : merge cmp (x:xs) ys
> | _ -> x : merge cmp xs (y:ys)
> |
> | \end{code}
> |
> | that is, I swapped the order of the first two lines. By the way,
> the
> | second version is much less likely to overflow the stack than the
> | first version.
> |
> | Can some confirm this? If you confirm it, can someone explain to me
> | why one obtains this performance improvement? I currently just
> do not
> | grasp the point.
> |
> | Thanks,
> | - Marcus
> |
> | --
> | Marcus D. Gabriel, Ph.D. Email:
> mdg at wanadoo.fr
> | 213 ter, rue de Mulhouse Tel:
> +33.3.89.69.05.06
> | F68300 Saint Louis FRANCE Portable:
> +33.6.34.56.07.75
> |
> |
> | _______________________________________________
> | Glasgow-haskell-users mailing list
> | Glasgow-haskell-users at haskell.org
> | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
More information about the Glasgow-haskell-users
mailing list