Proposal: a new implementation for Data.List.sort and Data.List.sortBy, which has better performance characteristics and is more laziness-friendly.

Mon Mar 27 16:53:10 UTC 2017

We can improve things a bit further by forcing evaluation (with seq) along
the way appropriately.

gregSortBy cmp [] = []
gregSortBy cmp xs = head $ until (null.tail) reduce (pair xs)
  where
    pair (x:y:t) | x `cmp` y == GT  = [y, x] : pair t
                 | otherwise        = [x, y] : pair t
    pair [x] = [[x]]
    pair []  = []

    reduce (v:w:x:y:t) = merge v' x' `seq` merge v' x' : reduce t
                         where v' = merge v w `seq` merge v w
                               x' = merge x y `seq` merge x y

    reduce (x:y:t) = merge x y `seq` merge x y : reduce t
    reduce xs      = xs

    merge xs []           = xs
    merge []  ys          = ys
    merge xs@(x:xs') ys@(y:ys')
         | x `cmp` y == GT  = y : merge xs  ys'
         | otherwise        = x : merge xs' ys

gSortBy cmp = mergeAll . sequences
  where
    sequences (a:b:xs)
      | a `cmp` b == GT = descending b [a]  xs
      | otherwise       = ascending  b (a:) xs
    sequences xs = [xs]

    descending a as (b:bs)
      | a `cmp` b == GT = descending b (a:as) bs
    descending a as bs  = (a:as) `seq` (a:as) : sequences bs

    ascending a as (b:bs)
      | a `cmp` b /= GT = ascending b (as . (a:)) bs
    ascending a as bs   = as [a] `seq` as [a] : sequences bs

    mergeAll [x] = x
    mergeAll xs  = mergeAll (mergePairs xs)

    mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs
    mergePairs xs       = xs

    merge as@(a:as') bs@(b:bs')
      | a `cmp` b == GT = b : merge as  bs'
      | otherwise       = a : merge as' bs
    merge [] bs         = bs
    merge as []         = as

*Before the change:*

benchmarking random ints/ghc
time                 3.687 s    (3.541 s .. NaN s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 3.691 s    (3.669 s .. 3.705 s)
std dev              21.45 ms   (0.0 s .. 24.76 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking random ints/greg
time                 2.648 s    (2.482 s .. 2.822 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 2.704 s    (2.670 s .. 2.736 s)
std dev              52.68 ms   (0.0 s .. 54.49 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking random ints/gSort
time                 2.733 s    (2.682 s .. 2.758 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.707 s    (2.689 s .. 2.718 s)
std dev              16.84 ms   (0.0 s .. 19.20 ms)
variance introduced by outliers: 19% (moderately inflated)

*After the change:*

benchmarking random ints/greg
time                 2.576 s    (2.548 s .. 2.628 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.590 s    (2.578 s .. 2.599 s)
std dev              12.99 ms   (0.0 s .. 14.89 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking random ints/gSort
time                 2.538 s    (2.412 s .. 2.627 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 2.543 s    (2.517 s .. 2.560 s)
std dev              26.16 ms   (0.0 s .. 30.21 ms)
variance introduced by outliers: 19% (moderately inflated)

On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam <
siddhanathan+eml at gmail.com> wrote:

> Theoretically, we could do better. We currently only exploit monotonic
> runs in merge sort, but we could also exploit bitonic runs:
>
>     dlist as = as [] `seq` as []
>
>     sequences [] = [[]]
>     sequences [a] = [[a]]
>     sequences (a:xs) = bitonic a a (a:) xs
>
>     bitonic min max as (b:bs)
>       | b `cmp` max /= LT = bitonic min b   (as . (b:)) bs
>       | b `cmp` min /= GT = bitonic b   max ((b:) . as) bs
>       | otherwise = dlist as : sequences (b:bs)
>     bitonic _ _ as [] = [dlist as]
>
>
> The constant factors here might be too high to notice the difference
> though.
>
>
> > However, still my version is more laziness-friendly, i.e. it requires
> fewer
> > comparisons to get the
> > N smallest elements of a list (see
> > https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs
> ).
> >
> > I wonder if this might not be a more useful trait than being able to sort
> > already sorted lists super fast.
>
> This comes down to a discussion of merge sort vs natural merge sort.
>
> Data.List.sort is an implementation of a variant of merge sort called
> natural merge sort. The algorithm is linearithmic in the worst case, but
> linear in the best case (already sorted list).
>
>
>
> On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch <greg7mdp at gmail.com>
> wrote:
>
>> Thanks again @Siddhanathan! Looks like your gSort fixes the main issue
>> with
>> Data.List.sort().
>>
>> I have updated the test programs in https://github.com/greg7mdp/ghc-sort
>> to
>> include your new version.
>>
>> Here are the results (your new version looks like a definite improvement
>> vs
>> the current GHC one):
>>
>> input                        GHC sort         My Orig proposal     gSort
>> ------------------------------------------------------------
>> ----------------
>> ---
>> sorted ints (ascending)      151               456                  148
>> sorted ints (descending)     152               466                  155
>> random ints                 2732              2006                 2004
>> random strings              6564              5549                 5528
>>
>>
>> So replacing the current GHC version with gSort is a no brainer, as it is
>> better in all regards.
>>
>> However, still my version is more laziness-friendly, i.e. it requires
>> fewer
>> comparisons to get the
>> N smallest elements of a list (see
>> https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs).
>>
>> I wonder if this might not be a more useful trait than being able to sort
>> already sorted lists super fast.
>>
>> Thanks,
>>
>> greg
>>
>> ________________________________
>>
>> From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On Behalf Of
>> Siddhanathan Shanmugam
>> Sent: Sunday, March 26, 2017 1:05 PM
>> To: Gregory Popovitch
>> Cc: Haskell Libraries
>> Subject: Re: Proposal: a new implementation for Data.List.sort and
>> Data.List.sortBy, which has better performance characteristics and is more
>> laziness-friendly.
>>
>>
>> Interesting. You are right, performance for sorting random lists has
>> priority over performance for sorting already-sorted lists.
>>
>> Ignore the numbers for my previous version. Can you compare GHC's sort,
>> your
>> proposal, and gSort below?
>>
>>
>> gSort :: Ord a => [a] -> [a]
>> gSort = gSortBy compare
>> gSortBy cmp = mergeAll . sequences
>>   where
>>     sequences (a:b:xs)
>>       | a `cmp` b == GT = descending b [a]  xs
>>       | otherwise       = ascending  b (a:) xs
>>     sequences xs = [xs]
>>
>>
>>     descending a as (b:bs)
>>       | a `cmp` b == GT = descending b (a:as) bs
>>     descending a as bs  = (a:as) : sequences bs
>>
>>
>>     ascending a as (b:bs)
>>       | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs
>>     ascending a as bs   = as [a] `seq` as [a] : sequences bs
>>
>>
>>     mergeAll [x] = x
>>     mergeAll xs  = mergeAll (mergePairs xs)
>>
>>
>>     mergePairs (a:b:xs) = merge a b : mergePairs xs
>>     mergePairs xs       = xs
>>
>>
>>     merge as@(a:as') bs@(b:bs')
>>       | a `cmp` b == GT = b : merge as  bs'
>>       | otherwise       = a : merge as' bs
>>     merge [] bs         = bs
>>     merge as []         = as
>>
>>
>> Thanks,
>> Sid
>>
>>
>> On Sun, Mar 26, 2017 at 9:19 AM, Gregory Popovitch <greg7mdp at gmail.com>
>> wrote:
>>
>>
>>         Thank you @Siddhanathan! I welcome any improvement you may make,
>> as
>> I said I
>>         am very far from a Haskell expert.
>>
>>         I just tested your change with my test project
>>         (https://github.com/greg7mdp/ghc-sort
>> <https://github.com/greg7mdp/ghc-sort> )
>>         and here are my results (mean times in ms):
>>
>>         input                        GHC sort          Orig proposal
>> your
>>         change
>>
>> ------------------------------------------------------------
>> ----------------
>>         ---
>>         sorted ints (ascending)      153               467
>> 139
>>         sorted ints (descending)     152               472
>> 599
>>         random ints                 2824              2077
>> 2126
>>         random strings              6564              5613
>> 5983
>>
>>         Your change is a definite improvement for sorted integers in
>> ascending
>>         order, but is worse for other cases.
>>
>>         Is there a real need to optimize the sort for already sorted list?
>> Of course
>>         it should not be a degenerate
>>         case and take longer than sorting random numbers, but this is not
>> the case
>>         here. Sorting already sorted
>>         lists is, even with my version, over 4 times faster than sorting
>> random
>>         lists. This sounds perfectly
>>         acceptable to me, and I feel that trying to optimize this specific
>> case
>>         further, if it comes at the
>>         detriment of the general case, is not desirable.
>>
>>         Thanks,
>>
>>         greg
>>
>>         ________________________________
>>
>>         From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On
>> Behalf Of
>>         Siddhanathan Shanmugam
>>         Sent: Sunday, March 26, 2017 11:41 AM
>>         To: Gregory Popovitch
>>         Cc: Haskell Libraries
>>         Subject: Re: Proposal: a new implementation for Data.List.sort and
>>         Data.List.sortBy, which has better performance characteristics and
>> is more
>>         laziness-friendly.
>>
>>
>>
>>         Thank you! This identifies a space leak in base which went
>> unnoticed
>> for 7
>>         years.
>>
>>         Your implementation can be improved further. Instead of splitting
>> into
>>         pairs, you could instead split into lists of sorted sublists by
>> replacing
>>         the pairs function with the following
>>
>>             pair = foldr f []
>>               where
>>                 f x [] = [[x]]
>>                 f x (y:ys)
>>                   | x `cmp` head y == LT = (x:y):ys
>>                   | otherwise            = [x]:y:ys
>>
>>         This should give you the same performance improvements for sorting
>> random
>>         lists, but better performance while sorting ascending lists.
>>
>>         The version in base takes it one step further by using a DList to
>> handle the
>>         descending case efficiently as well, except there's a space leak
>> right now
>>         because of which it is slower.
>>
>>         On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch
>> <greg7mdp at gmail.com>
>>         wrote:
>>
>>
>>
>>                 Motivation:
>>                 ----------
>>
>>                 Data.List.sort is a very important functionality in
>> Haskell.
>> I
>>         believe that
>>                 the proposed implementation is:
>>
>>                 - significantly faster than the current implementation on
>> unsorted
>>         lists,
>>                 typically 14% to 27% faster
>>                 - more laziness-friendly, i.e.:
>>                     take 3 $ sort l
>>                   will require significantly less comparisons than the
>> current
>>                 implementation
>>
>>                 Proposed Implementation
>>                 -----------------------
>>
>>                 sort :: (Ord a) => [a] -> [a]
>>                 sort =  sortBy compare
>>
>>                 sortBy cmp [] = []
>>                 sortBy cmp xs = head $ until (null.tail) reduce (pair xs)
>>                   where
>>                     pair (x:y:t) | x `cmp` y == GT  = [y, x] : pair t
>>                                  | otherwise        = [x, y] : pair t
>>                     pair [x] = [[x]]
>>                     pair []  = []
>>
>>                     reduce (v:w:x:y:t) = merge v' x' : reduce t
>>                                          where v' = merge v w
>>                                                x' = merge x y
>>
>>                     reduce (x:y:t) = merge x y : reduce t
>>                     reduce xs      = xs
>>
>>                     merge xs []           = xs
>>                     merge []  ys          = ys
>>                     merge xs@(x:xs') ys@(y:ys')
>>                          | x `cmp` y == GT  = y : merge xs  ys'
>>                          | otherwise        = x : merge xs' ys
>>
>>
>>                 Effect and Interactions
>>                 -----------------------
>>
>>                 I have a stack project with a criterion test for this new
>>         implementation,
>>                 available at https://github.com/greg7mdp/ghc-sort
>> <https://github.com/greg7mdp/ghc-sort>
>>
>>         <https://github.com/greg7mdp/ghc-sort
>> <https://github.com/greg7mdp/ghc-sort> > .
>>                 I ran the tests on an Ubuntu 14.0.2 VM and GHC 8.0.2, and
>> had the
>>         following
>>                 results:
>>
>>                 - sorting of random lists of integers is 27% faster
>>                 - sorting of random lists of strings is 14% faster
>>                 - sorting of already sorted lists is significantly slower,
>> but still
>>         much
>>                 faster than sorting random lists
>>                 - proposed version is more laziness friendly. For example
>> this
>>         version of
>>                 sortBy requires 11 comparisons to find
>>                   the smallest element of a 15 element list, while the
>> default
>>                 Data.List.sortBy requires 15 comparisons.
>>                   (see
>>
>>
>> https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs
>> <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs>
>>
>> <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs
>> <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs>
>> >
>> )
>>
>>
>>
>>                 Test results
>>                 ------------
>>
>>                 Criterion output (descending/ascending results are for
>> already
>>         sorted
>>                 lists).
>>                 I barely understand what Criterion does, and I am puzzled
>> with the
>>         various
>>                 "T" output - maybe there is a bug in my bench code:
>>
>>                 vagrant at vagrant-ubuntu-trusty-64:/vagrant$ stack exec
>> ghc-sort
>>                 benchmarking ascending ints/ghc
>>                 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime
>> 160.6 ms
>>         (153.4
>>                 ms .. 167.8 ms)
>>                                      0.997 R²   (0.986 R² .. 1.000 R²)
>>                 mean                 161.7 ms   (158.3 ms .. 165.9 ms)
>>                 std dev              5.210 ms   (3.193 ms .. 7.006 ms)
>>                 variance introduced by outliers: 12% (moderately inflated)
>>
>>                 benchmarking ascending ints/greg
>>                 TTTTTTTTTTTTTTTTtime                 473.8 ms   (398.6 ms
>> ..
>> 554.9
>>         ms)
>>                                      0.996 R²   (0.987 R² .. 1.000 R²)
>>                 mean                 466.2 ms   (449.0 ms .. 475.0 ms)
>>                 std dev              14.94 ms   (0.0 s .. 15.29 ms)
>>                 variance introduced by outliers: 19% (moderately inflated)
>>
>>                 benchmarking descending ints/ghc
>>                 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime
>> 165.1 ms
>>         (148.2
>>                 ms .. 178.2 ms)
>>                                      0.991 R²   (0.957 R² .. 1.000 R²)
>>                 mean                 158.7 ms   (154.0 ms .. 164.3 ms)
>>                 std dev              7.075 ms   (4.152 ms .. 9.903 ms)
>>                 variance introduced by outliers: 12% (moderately inflated)
>>
>>                 benchmarking descending ints/greg
>>                 TTTTTTTTTTTTTTTTtime                 471.7 ms   (419.8 ms
>> ..
>> 508.3
>>         ms)
>>                                      0.999 R²   (0.995 R² .. 1.000 R²)
>>                 mean                 476.0 ms   (467.5 ms .. 480.0 ms)
>>                 std dev              7.447 ms   (67.99 as .. 7.865 ms)
>>                 variance introduced by outliers: 19% (moderately inflated)
>>
>>                 benchmarking random ints/ghc
>>                 TTTTTTTTTTTTTTTTtime                 2.852 s    (2.564 s
>> ..
>> 3.019 s)
>>                                      0.999 R²   (0.997 R² .. 1.000 R²)
>>                 mean                 2.812 s    (2.785 s .. 2.838 s)
>>                 std dev              44.06 ms   (543.9 as .. 44.97 ms)
>>                 variance introduced by outliers: 19% (moderately inflated)
>>
>>                 benchmarking random ints/greg
>>                 TTTTTTTTTTTTTTTTtime                 2.032 s    (1.993 s
>> ..
>> 2.076 s)
>>                                      1.000 R²   (1.000 R² .. 1.000 R²)
>>                 mean                 2.028 s    (2.019 s .. 2.033 s)
>>                 std dev              7.832 ms   (0.0 s .. 8.178 ms)
>>                 variance introduced by outliers: 19% (moderately inflated)
>>
>>                 benchmarking shakespeare/ghc
>>                 TTTTTTTTTTTTTTTTtime                 6.504 s    (6.391 s
>> ..
>> 6.694 s)
>>                                      1.000 R²   (1.000 R² .. 1.000 R²)
>>                 mean                 6.499 s    (6.468 s .. 6.518 s)
>>                 std dev              28.85 ms   (0.0 s .. 32.62 ms)
>>                 variance introduced by outliers: 19% (moderately inflated)
>>
>>                 benchmarking shakespeare/greg
>>                 TTTTTTTTTTTTTTTTtime                 5.560 s    (5.307 s
>> ..
>> 5.763 s)
>>                                      1.000 R²   (0.999 R² .. 1.000 R²)
>>                 mean                 5.582 s    (5.537 s .. 5.607 s)
>>                 std dev              39.30 ms   (0.0 s .. 43.49 ms)
>>                 variance introduced by outliers: 19% (moderately inflated)
>>
>>
>>                 Costs and Drawbacks
>>                 -------------------
>>
>>                 The only cost I see is the reduced performance when
>> sorting
>> already
>>         sorted
>>                 lists. However, since this remains quite efficient, indeed
>> over 4
>>         times
>>                 faster than sorting unsorted lists, I think it is an
>> acceptable
>>         tradeoff.
>>
>>                 Final note
>>                 ----------
>>
>>                 My Haskell is very rusty. I worked on this a couple years
>> ago when I
>>         was
>>                 learning Haskell, and meant to propose it to the Haskell
>> community,
>>         but
>>                 never got to it at the time.
>>
>>                 _______________________________________________
>>                 Libraries mailing list
>>                 Libraries at haskell.org
>>                 http://mail.haskell.org/cgi-bi
>> n/mailman/listinfo/libraries
>> <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries>
>>
>>         <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>> <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries> >
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20170327/b4c69d63/attachment-0001.html>