Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.
Gregory Popovitch
greg7mdp at gmail.com
Mon Mar 27 21:05:47 UTC 2017
Pretty cool by the way, we now have a 31% improvement for sorting lists of
random integers vs the current Data.List.sortBy.
greg
_____
From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On Behalf Of
Siddhanathan Shanmugam
Sent: Monday, March 27, 2017 12:53 PM
To: Gregory Popovitch
Cc: Haskell Libraries
Subject: Re: Proposal: a new implementation for Data.List.sort and
Data.List.sortBy, which has better performance characteristics and is more
laziness-friendly.
We can improve things a bit further by forcing evaluation (with seq) along
the way appropriately.
gregSortBy cmp [] = []
gregSortBy cmp xs = head $ until (null.tail) reduce (pair xs)
  where
    pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t
                 | otherwise       = [x, y] : pair t
    pair [x] = [[x]]
    pair []  = []

    reduce (v:w:x:y:t) = merge v' x' `seq` merge v' x' : reduce t
      where v' = merge v w `seq` merge v w
            x' = merge x y `seq` merge x y
    reduce (x:y:t) = merge x y `seq` merge x y : reduce t
    reduce xs      = xs

    merge xs [] = xs
    merge [] ys = ys
    merge xs@(x:xs') ys@(y:ys')
      | x `cmp` y == GT = y : merge xs  ys'
      | otherwise       = x : merge xs' ys
gSortBy cmp = mergeAll . sequences
  where
    sequences (a:b:xs)
      | a `cmp` b == GT = descending b [a]  xs
      | otherwise       = ascending  b (a:) xs
    sequences xs = [xs]

    descending a as (b:bs)
      | a `cmp` b == GT = descending b (a:as) bs
    descending a as bs  = (a:as) `seq` (a:as) : sequences bs

    ascending a as (b:bs)
      | a `cmp` b /= GT = ascending b (as . (a:)) bs
    ascending a as bs   = as [a] `seq` as [a] : sequences bs

    mergeAll [x] = x
    mergeAll xs  = mergeAll (mergePairs xs)

    mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs
    mergePairs xs       = xs

    merge as@(a:as') bs@(b:bs')
      | a `cmp` b == GT = b : merge as  bs'
      | otherwise       = a : merge as' bs
    merge [] bs = bs
    merge as [] = as
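For reference, the intent behind the repeated `merge a b `seq` merge a b`
expressions above can also be written with an explicit shared binding; a
minimal sketch, using an illustrative helper name that is not part of the
patch:

-- Sketch only: force a value to WHNF before consing it onto a list, so
-- intermediate merge results are not left behind as unevaluated thunks.
-- Same idea as writing:  let m = merge a b in m `seq` (m : rest)
strictCons :: a -> [a] -> [a]
strictCons x xs = x `seq` (x : xs)

-- e.g.  reduce (x:y:t) = strictCons (merge x y) (reduce t)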
Before the change:
benchmarking random ints/ghc
time 3.687 s (3.541 s .. NaN s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 3.691 s (3.669 s .. 3.705 s)
std dev 21.45 ms (0.0 s .. 24.76 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/greg
time 2.648 s (2.482 s .. 2.822 s)
0.999 R² (0.998 R² .. 1.000 R²)
mean 2.704 s (2.670 s .. 2.736 s)
std dev 52.68 ms (0.0 s .. 54.49 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort
time 2.733 s (2.682 s .. 2.758 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.707 s (2.689 s .. 2.718 s)
std dev 16.84 ms (0.0 s .. 19.20 ms)
variance introduced by outliers: 19% (moderately inflated)
After the change:
benchmarking random ints/greg
time 2.576 s (2.548 s .. 2.628 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.590 s (2.578 s .. 2.599 s)
std dev 12.99 ms (0.0 s .. 14.89 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking random ints/gSort
time 2.538 s (2.412 s .. 2.627 s)
1.000 R² (0.999 R² .. 1.000 R²)
mean 2.543 s (2.517 s .. 2.560 s)
std dev 26.16 ms (0.0 s .. 30.21 ms)
variance introduced by outliers: 19% (moderately inflated)
On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam
<siddhanathan+eml at gmail.com> wrote:
Theoretically, we could do better. We currently only exploit monotonic runs
in merge sort, but we could also exploit bitonic runs:
dlist as = as [] `seq` as []

sequences []     = [[]]
sequences [a]    = [[a]]
sequences (a:xs) = bitonic a a (a:) xs

bitonic min max as (b:bs)
  | b `cmp` max /= LT = bitonic min b   (as . (b:)) bs
  | b `cmp` min /= GT = bitonic b   max ((b:) . as) bs
  | otherwise         = dlist as : sequences (b:bs)
bitonic _ _ as [] = [dlist as]
The constant factors here might be too high to notice the difference though.
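For example, with cmp = compare the sketch above would gather a bitonic
input such as [5,7,9,4,2] into a single sorted run; a hand trace (not
executed, assuming the definitions above):

-- Hand trace, assuming the dlist/bitonic definitions above and cmp = compare:
--
--   sequences [5,7,9,4,2]
--     = bitonic 5 5 (5:) [7,9,4,2]
--     = bitonic 5 9 (\t -> 5:7:9:t) [4,2]       -- 7 and 9 extend the back
--     = bitonic 2 9 (\t -> 2:4:5:7:9:t) []      -- 4 and 2 are consed on the front
--     = [[2,4,5,7,9]]
--
-- i.e. the whole bitonic input becomes a single sorted run.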
> However, my version is still more laziness-friendly, i.e. it requires fewer
> comparisons to get the N smallest elements of a list (see
> https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs).
>
> I wonder if this might not be a more useful trait than being able to sort
> already sorted lists super fast.
This comes down to a discussion of merge sort vs natural merge sort.
Data.List.sort is an implementation of a variant of merge sort called
natural merge sort. The algorithm is linearithmic in the worst case, but
linear in the best case (already sorted list).
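To make that concrete, here is a minimal sketch of the run-detection pass
natural merge sort starts with (illustrative code, not the actual base
implementation); on an already sorted input it returns a single run after
one linear scan, which is where the linear best case comes from:

-- Split a list into maximal non-decreasing runs. Already sorted input
-- yields one run; note that the real base version also detects descending
-- runs, which this sketch omits.
runs :: Ord a => [a] -> [[a]]
runs []     = []
runs (x:xs) = go [x] x xs
  where
    go acc _    []     = [reverse acc]
    go acc prev (y:ys)
      | prev <= y = go (y:acc) y ys
      | otherwise = reverse acc : runs (y:ys)

-- ghci> runs [1,2,3,2,4]
-- [[1,2,3],[2,4]]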
On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch <greg7mdp at gmail.com>
wrote:
Thanks again @Siddhanathan! Looks like your gSort fixes the main issue with
Data.List.sort().
I have updated the test programs in https://github.com/greg7mdp/ghc-sort
to include your new version.
Here are the results (your new version looks like a definite improvement vs
the current GHC one):
input                       GHC sort   My Orig proposal   gSort
----------------------------------------------------------------
sorted ints (ascending)          151                456     148
sorted ints (descending)         152                466     155
random ints                     2732               2006    2004
random strings                  6564               5549    5528
So replacing the current GHC version with gSort is a no-brainer, as it is
better in all regards.

However, my version is still more laziness-friendly, i.e. it requires fewer
comparisons to get the N smallest elements of a list (see
https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs).

I wonder if this might not be a more useful trait than being able to sort
already sorted lists super fast.
Thanks,
greg
________________________________
From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On Behalf Of
Siddhanathan Shanmugam
Sent: Sunday, March 26, 2017 1:05 PM
To: Gregory Popovitch
Cc: Haskell Libraries
Subject: Re: Proposal: a new implementation for Data.List.sort and
Data.List.sortBy, which has better performance characteristics and is more
laziness-friendly.
Interesting. You are right, performance for sorting random lists has
priority over performance for sorting already-sorted lists.
Ignore the numbers for my previous version. Can you compare GHC's sort, your
proposal, and gSort below?
gSort :: Ord a => [a] -> [a]
gSort = gSortBy compare

gSortBy cmp = mergeAll . sequences
  where
    sequences (a:b:xs)
      | a `cmp` b == GT = descending b [a]  xs
      | otherwise       = ascending  b (a:) xs
    sequences xs = [xs]

    descending a as (b:bs)
      | a `cmp` b == GT = descending b (a:as) bs
    descending a as bs  = (a:as) : sequences bs

    ascending a as (b:bs)
      | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs
    ascending a as bs   = as [a] `seq` as [a] : sequences bs

    mergeAll [x] = x
    mergeAll xs  = mergeAll (mergePairs xs)

    mergePairs (a:b:xs) = merge a b : mergePairs xs
    mergePairs xs       = xs

    merge as@(a:as') bs@(b:bs')
      | a `cmp` b == GT = b : merge as  bs'
      | otherwise       = a : merge as' bs
    merge [] bs = bs
    merge as [] = as
Thanks,
Sid
On Sun, Mar 26, 2017 at 9:19 AM, Gregory Popovitch <greg7mdp at gmail.com>
wrote:
Thank you @Siddhanathan! I welcome any improvement you may make, as I said
I am very far from a Haskell expert.

I just tested your change with my test project
(https://github.com/greg7mdp/ghc-sort) and here are my results (mean times
in ms):
input                       GHC sort   Orig proposal   your change
-------------------------------------------------------------------
sorted ints (ascending)          153             467           139
sorted ints (descending)         152             472           599
random ints                     2824            2077          2126
random strings                  6564            5613          5983
Your change is a definite improvement for sorted integers in ascending
order, but is worse for the other cases.

Is there a real need to optimize the sort for already sorted lists? Of
course it should not be a degenerate case and take longer than sorting
random numbers, but this is not the case here. Sorting already sorted
lists is, even with my version, over 4 times faster than sorting random
lists. This sounds perfectly acceptable to me, and I feel that trying to
optimize this specific case further, if it comes at the detriment of the
general case, is not desirable.
Thanks,
greg
________________________________
From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On Behalf Of
Siddhanathan Shanmugam
Sent: Sunday, March 26, 2017 11:41 AM
To: Gregory Popovitch
Cc: Haskell Libraries
Subject: Re: Proposal: a new implementation for Data.List.sort and
Data.List.sortBy, which has better performance characteristics and is more
laziness-friendly.
Thank you! This identifies a space leak in base which went unnoticed for 7
years.

Your implementation can be improved further. Instead of splitting into
pairs, you could instead split into lists of sorted sublists, by replacing
the pair function with the following:
pair = foldr f []
  where
    f x []     = [[x]]
    f x (y:ys)
      | x `cmp` head y == LT = (x:y) : ys
      | otherwise            = [x] : y : ys
This should give you the same performance improvements for sorting random
lists, but better performance while sorting ascending lists.
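For illustration, here is that splitting as a standalone snippet
specialised to compare (the name pairRuns is illustrative, not from the
message above); on an ascending input it produces one long run instead of
n/2 pairs:

pairRuns :: Ord a => [a] -> [[a]]
pairRuns = foldr f []
  where
    f x []     = [[x]]
    f x (y:ys)
      | x < head y = (x:y) : ys    -- x extends the run in front of it
      | otherwise  = [x] : y : ys  -- otherwise start a new singleton run

-- ghci> pairRuns [1,2,3,2,5,6]
-- [[1,2,3],[2,5,6]]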
The version in base takes it one step further by using a DList to handle
the descending case efficiently as well, except there's a space leak right
now because of which it is slower.
On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch <greg7mdp at gmail.com>
wrote:
Motivation:
----------

Data.List.sort is a very important functionality in Haskell. I believe
that the proposed implementation is:

- significantly faster than the current implementation on unsorted lists,
  typically 14% to 27% faster

- more laziness-friendly, i.e.:

      take 3 $ sort l

  will require significantly fewer comparisons than the current
  implementation (a small sketch of how to observe this is included below)
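One way to observe this (an illustrative snippet, not part of the proposal
itself) is to sort with a comparator that logs every comparison it
performs and watch how many lines are printed for each implementation:

import Data.List   (sortBy)
import Debug.Trace (trace)

-- Illustrative only: a comparator that announces each comparison it makes,
-- so that e.g. `take 3 (sortBy tracedCompare xs)` in GHCi shows how many
-- comparisons laziness actually forces.
tracedCompare :: (Ord a, Show a) => a -> a -> Ordering
tracedCompare x y =
  trace ("comparing " ++ show x ++ " and " ++ show y) (compare x y)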
Proposed Implementation
-----------------------
sort :: (Ord a) => [a] -> [a]
sort = sortBy compare

sortBy cmp [] = []
sortBy cmp xs = head $ until (null.tail) reduce (pair xs)
  where
    pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t
                 | otherwise       = [x, y] : pair t
    pair [x] = [[x]]
    pair []  = []

    reduce (v:w:x:y:t) = merge v' x' : reduce t
      where v' = merge v w
            x' = merge x y
    reduce (x:y:t) = merge x y : reduce t
    reduce xs      = xs

    merge xs [] = xs
    merge [] ys = ys
    merge xs@(x:xs') ys@(y:ys')
      | x `cmp` y == GT = y : merge xs  ys'
      | otherwise       = x : merge xs' ys
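As a quick sanity check (an illustrative snippet assuming the definitions
above are in scope, not part of the proposal), the proposed sort can be
compared against Data.List.sort on a few inputs:

import qualified Data.List as L

-- Illustrative check only: the proposed sort must agree with Data.List.sort
-- on every input; a QuickCheck property over arbitrary lists would be the
-- natural next step.
agreesWithBase :: Bool
agreesWithBase = all check inputs
  where
    check xs = sort xs == L.sort xs
    inputs   = [ []
               , [1]
               , [1..100]
               , reverse [1..100]
               , map (\i -> (i * 7919) `mod` 101) [1..200 :: Int]
               ]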
Effect and Interactions
-----------------------
I have a stack project with a criterion test for this new implementation,
available at https://github.com/greg7mdp/ghc-sort. I ran the tests on an
Ubuntu 14.0.2 VM and GHC 8.0.2, and had the following results:
- sorting of random lists of integers is 27% faster
- sorting of random lists of strings is 14% faster
- sorting of already sorted lists is significantly slower, but still much
  faster than sorting random lists
- the proposed version is more laziness-friendly. For example, this version
  of sortBy requires 11 comparisons to find the smallest element of a
  15-element list, while the default Data.List.sortBy requires 15
  comparisons (see
  https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs)
Test results
------------

Criterion output (descending/ascending results are for already sorted
lists). I barely understand what Criterion does, and I am puzzled by the
various "T" output - maybe there is a bug in my bench code:

vagrant at vagrant-ubuntu-trusty-64:/vagrant$ stack exec ghc-sort
benchmarking ascending ints/ghc
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime  160.6 ms   (153.4 ms .. 167.8 ms)
                     0.997 R²   (0.986 R² .. 1.000 R²)
mean                 161.7 ms   (158.3 ms .. 165.9 ms)
std dev              5.210 ms   (3.193 ms .. 7.006 ms)
variance introduced by outliers: 12% (moderately inflated)

benchmarking ascending ints/greg
TTTTTTTTTTTTTTTTtime  473.8 ms   (398.6 ms .. 554.9 ms)
                      0.996 R²   (0.987 R² .. 1.000 R²)
mean                  466.2 ms   (449.0 ms .. 475.0 ms)
std dev               14.94 ms   (0.0 s .. 15.29 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking descending ints/ghc
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime  165.1 ms   (148.2 ms .. 178.2 ms)
                     0.991 R²   (0.957 R² .. 1.000 R²)
mean                 158.7 ms   (154.0 ms .. 164.3 ms)
std dev              7.075 ms   (4.152 ms .. 9.903 ms)
variance introduced by outliers: 12% (moderately inflated)

benchmarking descending ints/greg
TTTTTTTTTTTTTTTTtime  471.7 ms   (419.8 ms .. 508.3 ms)
                      0.999 R²   (0.995 R² .. 1.000 R²)
mean                  476.0 ms   (467.5 ms .. 480.0 ms)
std dev               7.447 ms   (67.99 as .. 7.865 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking random ints/ghc
TTTTTTTTTTTTTTTTtime  2.852 s    (2.564 s .. 3.019 s)
                      0.999 R²   (0.997 R² .. 1.000 R²)
mean                  2.812 s    (2.785 s .. 2.838 s)
std dev               44.06 ms   (543.9 as .. 44.97 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking random ints/greg
TTTTTTTTTTTTTTTTtime  2.032 s    (1.993 s .. 2.076 s)
                      1.000 R²   (1.000 R² .. 1.000 R²)
mean                  2.028 s    (2.019 s .. 2.033 s)
std dev               7.832 ms   (0.0 s .. 8.178 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking shakespeare/ghc
TTTTTTTTTTTTTTTTtime  6.504 s    (6.391 s .. 6.694 s)
                      1.000 R²   (1.000 R² .. 1.000 R²)
mean                  6.499 s    (6.468 s .. 6.518 s)
std dev               28.85 ms   (0.0 s .. 32.62 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking shakespeare/greg
TTTTTTTTTTTTTTTTtime  5.560 s    (5.307 s .. 5.763 s)
                      1.000 R²   (0.999 R² .. 1.000 R²)
mean                  5.582 s    (5.537 s .. 5.607 s)
std dev               39.30 ms   (0.0 s .. 43.49 ms)
variance introduced by outliers: 19% (moderately inflated)
Costs and Drawbacks
-------------------

The only cost I see is the reduced performance when sorting already sorted
lists. However, since this remains quite efficient, indeed over 4 times
faster than sorting unsorted lists, I think it is an acceptable tradeoff.
Final note
----------

My Haskell is very rusty. I worked on this a couple of years ago when I
was learning Haskell, and meant to propose it to the Haskell community,
but never got to it at the time.
_______________________________________________
Libraries mailing list
Libraries at haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries