[Git][ghc/ghc][wip/three-way-merge-sort] Add three way merge to Data.List.sort(By)

Jade (@Jade) gitlab at gitlab.haskell.org
Tue Jan 9 20:38:56 UTC 2024



Jade pushed to branch wip/three-way-merge-sort at Glasgow Haskell Compiler / GHC


Commits:
3532659b by Jade at 2024-01-09T21:41:56+01:00
Add three way merge to Data.List.sort(By)

This patch aims to implement an improved version of Data.List.sortBy and therefore Data.List.sort which increases performance by up to 20% on larger lists, but also ~10% on lists of size 100 and smaller.
This approach implements a three-way merge which also aims to reuse comparisons.

Fixes #24280

-------------------------
Metric Decrease:
    MultiLayerModulesTH_Make
    T10421
    T13719
    T15164
    T18698a
    T18698b
    T1969
    T9872a
    T9961
    WWRec
-------------------------

- - - - -


1 changed file:

- libraries/base/src/Data/OldList.hs


Changes:

=====================================
libraries/base/src/Data/OldList.hs
=====================================
@@ -1636,37 +1636,75 @@ and possibly to bear similarities to a 1982 paper by Richard O'Keefe:
 
 Benchmarks show it to be often 2x the speed of the previous implementation.
 Fixes ticket https://gitlab.haskell.org/ghc/ghc/issues/2143
+
+Further improved using a three-way merge, with an additional performance increase of ~20%
+https://gitlab.haskell.org/ghc/ghc/issues/24280
 -}
 
 sort = sortBy compare
-sortBy cmp = mergeAll . sequences
+
+sortBy _ []  = []
+sortBy _ [x] = [x]
+sortBy cmp ns
+  | [x, y]    <- ns = merge [x] [y]
+  | [x, y, z] <- ns = merge' [x] [y] [z]
+  | otherwise       = mergeAll (sequences ns)
   where
+    x `gt` y = x `cmp` y == GT
+
     sequences (a:b:xs)
-      | a `cmp` b == GT = descending b [a]  xs
-      | otherwise       = ascending  b (a:) xs
+      | a `gt` b  = descending b [a]  xs
+      | otherwise = ascending  b (a:) xs
     sequences xs = [xs]
 
     descending a as (b:bs)
-      | a `cmp` b == GT = descending b (a:as) bs
-    descending a as bs  = (a:as): sequences bs
+      | a `gt` b       = descending b (a:as) bs
+    descending a as bs = (a:as): sequences bs
 
     ascending a as (b:bs)
-      | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs
-    ascending a as bs   = let !x = as [a]
-                          in x : sequences bs
+      | not (a `gt` b) = ascending b (\ys -> as (a:ys)) bs
+    ascending a as bs  = let !x = as [a]
+                         in x : sequences bs
 
     mergeAll [x] = x
     mergeAll xs  = mergeAll (mergePairs xs)
 
-    mergePairs (a:b:xs) = let !x = merge a b
-                          in x : mergePairs xs
-    mergePairs xs       = xs
+    mergePairs [a, b, c]  = [merge' a b c]
+    mergePairs (a:b:c:xs) = let !x = merge' a b c
+                            in x : mergePairs xs
+    mergePairs [a,b]      = [merge a b]
+    mergePairs xs         = xs
+
 
     merge as@(a:as') bs@(b:bs')
-      | a `cmp` b == GT = b:merge as  bs'
-      | otherwise       = a:merge as' bs
-    merge [] bs         = bs
-    merge as []         = as
+      | a `gt` b   = b : merge as  bs'
+      | otherwise  = a : merge as' bs
+    merge [] bs   = bs
+    merge as []   = as
+
+    merge' as@(a:as') bs@(b:bs') cs@(c:cs')
+      | a_gt_b, b_gt_c = c : merge'gt as (b:|bs') cs' -- a > b > c
+      | a_gt_b         = b : merge'   as bs' cs       -- a > b <= c
+      | a_gt_c         = c : merge'le (a:|as') bs cs' -- c < a <= b
+      | otherwise      = a : merge'   as' bs cs       -- c >= a <= b
+      where a_gt_b = a `gt` b
+            a_gt_c = a `gt` c
+            b_gt_c = b `gt` c
+    merge' [] bs cs = merge bs cs
+    merge' as [] cs = merge as cs
+    merge' as bs [] = merge as bs
+
+    merge'gt as bs@(b:|bs') cs@(c:cs')
+      | b_gt_c    = c : merge'gt as bs cs'  -- a > b > c
+      | otherwise = b : merge'   as bs' cs  -- a > b <= c
+      where b_gt_c = b `gt` c
+    merge'gt as (b:|bs) [] = b : merge as bs
+
+    merge'le as@(a:|as') bs cs@(c:cs')
+      | a_gt_c    = c : merge'le as bs cs'  -- c < a <= b
+      | otherwise = a : merge'   as' bs cs  -- c >= a <= b
+      where a_gt_c = a `gt` c
+    merge'le (a:|as) bs [] = a : merge as bs
 
 {-
 sortBy cmp l = mergesort cmp l



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/3532659b00d66736056a75dbffb50b4a76a7c3aa

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/3532659b00d66736056a75dbffb50b4a76a7c3aa
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240109/885fb950/attachment-0001.html>


More information about the ghc-commits mailing list