<div dir="ltr">Thank you! This identifies a space leak in base which went unnoticed for 7 years.<div><br></div><div>Your implementation can be improved further. Instead of splitting into pairs, you could instead split into lists of sorted sublists by replacing the pairs function with the following</div><div><br></div><div><div> pair = foldr f []</div><div> where</div><div> f x [] = [[x]]</div><div> f x (y:ys)</div><div> | x `cmp` head y == LT = (x:y):ys</div><div> | otherwise = [x]:y:ys</div></div><div><br></div><div>This should give you the same performance improvements for sorting random lists, but better performance while sorting ascending lists.</div><div><br></div><div>The version in base takes it one step further by using a DList to handle the descending case efficiently as well, except there's a space leak right now because of which it is slower.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch <span dir="ltr"><<a href="mailto:greg7mdp@gmail.com" target="_blank">greg7mdp@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Motivation:<br>
----------<br>
<br>
Data.List.sort is a very important functionality in Haskell. I believe that<br>
the proposed implementation is:<br>
<br>
- significantly faster than the current implementation on unsorted lists,<br>
typically 14% to 27% faster<br>
- more laziness-friendly, i.e.:<br>
take 3 $ sort l<br>
will require significantly less comparisons than the current<br>
implementation<br>
<br>
Proposed Implementation<br>
-----------------------<br>
<br>
sort :: (Ord a) => [a] -> [a]<br>
sort = sortBy compare<br>
<br>
sortBy cmp [] = []<br>
sortBy cmp xs = head $ until (null.tail) reduce (pair xs)<br>
where<br>
pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t<br>
| otherwise = [x, y] : pair t<br>
pair [x] = [[x]]<br>
pair [] = []<br>
<br>
reduce (v:w:x:y:t) = merge v' x' : reduce t<br>
where v' = merge v w<br>
x' = merge x y<br>
<br>
reduce (x:y:t) = merge x y : reduce t<br>
reduce xs = xs<br>
<br>
merge xs [] = xs<br>
merge [] ys = ys<br>
merge xs@(x:xs') ys@(y:ys')<br>
| x `cmp` y == GT = y : merge xs ys'<br>
| otherwise = x : merge xs' ys<br>
<br>
<br>
Effect and Interactions<br>
-----------------------<br>
<br>
I have a stack project with a criterion test for this new implementation,<br>
available at <a href="https://github.com/greg7mdp/ghc-sort" rel="noreferrer" target="_blank">https://github.com/greg7mdp/<wbr>ghc-sort</a>.<br>
I ran the tests on an Ubuntu 14.0.2 VM and GHC 8.0.2, and had the following<br>
results:<br>
<br>
- sorting of random lists of integers is 27% faster<br>
- sorting of random lists of strings is 14% faster<br>
- sorting of already sorted lists is significantly slower, but still much<br>
faster than sorting random lists<br>
- proposed version is more laziness friendly. For example this version of<br>
sortBy requires 11 comparisons to find<br>
the smallest element of a 15 element list, while the default<br>
Data.List.sortBy requires 15 comparisons.<br>
(see<br>
<a href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs" rel="noreferrer" target="_blank">https://github.com/greg7mdp/<wbr>ghc-sort/blob/master/src/sort_<wbr>with_trace.hs</a>)<br>
<br>
<br>
Test results<br>
------------<br>
<br>
Criterion output (descending/ascending results are for already sorted<br>
lists).<br>
I barely understand what Criterion does, and I am puzzled with the various<br>
"T" output - maybe there is a bug in my bench code:<br>
<br>
vagrant@vagrant-ubuntu-trusty-<wbr>64:/vagrant$ stack exec ghc-sort<br>
benchmarking ascending ints/ghc<br>
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT<wbr>TTTTTTTtime 160.6 ms (153.4<br>
ms .. 167.8 ms)<br>
0.997 R² (0.986 R² .. 1.000 R²)<br>
mean 161.7 ms (158.3 ms .. 165.9 ms)<br>
std dev 5.210 ms (3.193 ms .. 7.006 ms)<br>
variance introduced by outliers: 12% (moderately inflated)<br>
<br>
benchmarking ascending ints/greg<br>
TTTTTTTTTTTTTTTTtime 473.8 ms (398.6 ms .. 554.9 ms)<br>
0.996 R² (0.987 R² .. 1.000 R²)<br>
mean 466.2 ms (449.0 ms .. 475.0 ms)<br>
std dev 14.94 ms (0.0 s .. 15.29 ms)<br>
variance introduced by outliers: 19% (moderately inflated)<br>
<br>
benchmarking descending ints/ghc<br>
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT<wbr>TTTTTTTtime 165.1 ms (148.2<br>
ms .. 178.2 ms)<br>
0.991 R² (0.957 R² .. 1.000 R²)<br>
mean 158.7 ms (154.0 ms .. 164.3 ms)<br>
std dev 7.075 ms (4.152 ms .. 9.903 ms)<br>
variance introduced by outliers: 12% (moderately inflated)<br>
<br>
benchmarking descending ints/greg<br>
TTTTTTTTTTTTTTTTtime 471.7 ms (419.8 ms .. 508.3 ms)<br>
0.999 R² (0.995 R² .. 1.000 R²)<br>
mean 476.0 ms (467.5 ms .. 480.0 ms)<br>
std dev 7.447 ms (67.99 as .. 7.865 ms)<br>
variance introduced by outliers: 19% (moderately inflated)<br>
<br>
benchmarking random ints/ghc<br>
TTTTTTTTTTTTTTTTtime 2.852 s (2.564 s .. 3.019 s)<br>
0.999 R² (0.997 R² .. 1.000 R²)<br>
mean 2.812 s (2.785 s .. 2.838 s)<br>
std dev 44.06 ms (543.9 as .. 44.97 ms)<br>
variance introduced by outliers: 19% (moderately inflated)<br>
<br>
benchmarking random ints/greg<br>
TTTTTTTTTTTTTTTTtime 2.032 s (1.993 s .. 2.076 s)<br>
1.000 R² (1.000 R² .. 1.000 R²)<br>
mean 2.028 s (2.019 s .. 2.033 s)<br>
std dev 7.832 ms (0.0 s .. 8.178 ms)<br>
variance introduced by outliers: 19% (moderately inflated)<br>
<br>
benchmarking shakespeare/ghc<br>
TTTTTTTTTTTTTTTTtime 6.504 s (6.391 s .. 6.694 s)<br>
1.000 R² (1.000 R² .. 1.000 R²)<br>
mean 6.499 s (6.468 s .. 6.518 s)<br>
std dev 28.85 ms (0.0 s .. 32.62 ms)<br>
variance introduced by outliers: 19% (moderately inflated)<br>
<br>
benchmarking shakespeare/greg<br>
TTTTTTTTTTTTTTTTtime 5.560 s (5.307 s .. 5.763 s)<br>
1.000 R² (0.999 R² .. 1.000 R²)<br>
mean 5.582 s (5.537 s .. 5.607 s)<br>
std dev 39.30 ms (0.0 s .. 43.49 ms)<br>
variance introduced by outliers: 19% (moderately inflated)<br>
<br>
<br>
Costs and Drawbacks<br>
-------------------<br>
<br>
The only cost I see is the reduced performance when sorting already sorted<br>
lists. However, since this remains quite efficient, indeed over 4 times<br>
faster than sorting unsorted lists, I think it is an acceptable tradeoff.<br>
<br>
Final note<br>
----------<br>
<br>
My Haskell is very rusty. I worked on this a couple years ago when I was<br>
learning Haskell, and meant to propose it to the Haskell community, but<br>
never got to it at the time.<br>
<br>
______________________________<wbr>_________________<br>
Libraries mailing list<br>
<a href="mailto:Libraries@haskell.org">Libraries@haskell.org</a><br>
<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-<wbr>bin/mailman/listinfo/libraries</a><br>
</blockquote></div><br></div>