GHC Data.List.sort performance question

Tue Jan 15 20:40:10 EST 2008

Hi all,

I was looking into this issue as well, by coincidence.

I haven't figured out what is wrong, but I believe Hugs (September  
2006) exhibits similar behaviour, so it is not just GHC.

I also noticed that in the third equation of merge_pairs, swapping  
the order of the arguments to merge also avoids the stack overflow:

Original third equation:

   merge_pairs cmp (xs:ys:xss)
       = merge cmp xs ys : merge_pairs cmp xss

Swapped version:

    merge_pairs cmp (xs:ys:xss)
        = merge cmp ys xs : merge_pairs cmp xss
                                ^^^^^

And for reference, here is how I was testing it:

    main = do
       args <- getArgs
       let size = (read $ head args) :: Int
       main' size

    main' size = do
       gen <- getStdGen
       let rs = (randoms gen) :: [Int]
       let list = take size rs
       print $ length list
       print $ sort list

I'm keen to find out what is going on here.

Cheers,
Bernie.

On 16/01/2008, at 2:43 AM, Simon Peyton-Jones wrote:

> Weird.  I see no difference in the compiled code (GHC HEAD).
>
> Simon
>
> | -----Original Message-----
> | From: glasgow-haskell-users-bounces at haskell.org [mailto:glasgow- 
> haskell-users-bounces at haskell.org] On Behalf Of
> | Marcus D. Gabriel
> | Sent: 14 January 2008 21:02
> | To: glasgow-haskell-users at haskell.org
> | Subject: GHC Data.List.sort performance question
> |
> | Hello,
> |
> | By a rather indirect route, I discovered that I obtain an almost
> | factor of two improvement in performance in Data.List.sort if I make
> | one small change in the implementation of the function merge which
> | supports mergesort and hence sortBy and sort.  Admittedly, the
> | improvement was only noticeable to me when sorting for example one
> | million random Int.  The current code in libraries/base/Data/List.hs
> | for merge is
> |
> | \begin{code}
> |
> | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
> | merge cmp xs [] = xs
> | merge cmp [] ys = ys
> | merge cmp (x:xs) (y:ys)
> |  = case x `cmp` y of
> |         GT -> y : merge cmp (x:xs)   ys
> |         _  -> x : merge cmp    xs (y:ys)
> |
> | \end{code}
> |
> | and all that I did was
> |
> | \begin{code}
> |
> | merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
> | merge cmp [] ys = ys
> | merge cmp xs [] = xs
> | merge cmp (x:xs) (y:ys)
> |  = case x `cmp` y of
> |         GT -> y : merge cmp (x:xs)   ys
> |         _  -> x : merge cmp    xs (y:ys)
> |
> | \end{code}
> |
> | that is, I swapped the order of the first two lines.  By the way,  
> the
> | second version is much less likely to overflow the stack than the
> | first version.
> |
> | Can some confirm this?  If you confirm it, can someone explain to me
> | why one obtains this performance improvement?  I currently just  
> do not
> | grasp the point.
> |
> | Thanks,
> | - Marcus
> |
> | --
> |   Marcus D. Gabriel, Ph.D.                    Email:     
> mdg at wanadoo.fr
> |   213 ter, rue de Mulhouse                      Tel:  
> +33.3.89.69.05.06
> |   F68300 Saint Louis  FRANCE               Portable:  
> +33.6.34.56.07.75
> |
> |
> | _______________________________________________
> | Glasgow-haskell-users mailing list
> | Glasgow-haskell-users at haskell.org
> | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users