fix for Data.List.sortBy

Siddhanathan Shanmugam siddhanathan+eml at gmail.com
Tue Mar 28 17:42:09 UTC 2017


On Tue, Mar 28, 2017 at 5:03 AM, Gregory Popovitch <greg7mdp at gmail.com>
wrote:

> Sid, this new version (diff below) is really fast when sorting random
> ints, but slower when sorting strings:
>

Ok, let's not use that then. Also, add an inline pragma on the function
merge.


    {-# INLINE merge #-}
    merge as@(a:as') bs@(b:bs')




>
> input                        GHC sort          Orig proposal        gSort
> -------------------------------------------------------------------------
> sorted ints (ascending)      151               460                  147
> sorted ints (descending)     151               467                  171
> random ints                 2771              2010                 1365
> random strings              6542              5524                 5991
>
>
>
> Thanks,
>
> greg
>
> ------------------------------
> *From:* siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] *On Behalf
> Of *Siddhanathan Shanmugam
> *Sent:* Tuesday, March 28, 2017 2:45 AM
> *To:* Dan Burton
> *Cc:* Haskell Libraries; Gregory Popovitch
> *Subject:* Re: fix for Data.List.sortBy
>
> Turns out we don't need seq at all. A simple refactoring of the merge
> function does the trick equally well.
>
>     mergePairs (a:b:xs) = merge id a b : mergePairs xs
>     mergePairs xs       = xs
>
>     merge f as@(a:as') bs@(b:bs')
>       | a `cmp` b == GT = merge (f.(b:)) as  bs'
>       | otherwise       = merge (f.(a:)) as' bs
>     merge f [] bs       = f bs
>     merge f as []       = f as
>
> This variant is 10% faster in my tests.
>
>
> On Mon, Mar 27, 2017 at 5:49 PM, Dan Burton <danburton.email at gmail.com>
> wrote:
>
>> Does this rely on Common Subexpression Elimination optimization in order
>> to work? Would it work more reliably if the `seq`-ed expression were
>> let-bound?
>>
>
> I don't think it relies heavily on CSE. The seq's are there to avoid a
> cascading series of thunk evaluations. Using let expressions doesn't seem
> to affect the benchmarks.
>
>
>>
>> -- Dan Burton
>>
>> On Mon, Mar 27, 2017 at 5:41 PM, David Feuer <david.feuer at gmail.com>
>> wrote:
>>
>>> The first seq is useless: constructor application is never suspended. I
>>> haven't had a chance to look at the rest yet.
>>>
>>> On Mar 27, 2017 7:59 PM, "Gregory Popovitch" <greg7mdp at gmail.com> wrote:
>>>
>>>> Sid,
>>>>
>>>> I'd be delighted to submit the patch, as long as I have permission
>>>> (which I probably don't), you feel confident about the change and maybe a
>>>> couple of other people agree.
>>>>
>>>> Here is the proposed change. Tests shows significant speed improvement
>>>> (30%) when sorting lists of random numbers, and same efficiency for sorting
>>>> already sorted lists (both ascending and descending).
>>>>
>>>> Thanks,
>>>>
>>>> greg
>>>>
>>>> ------------------------------
>>>> *From:* siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] *On
>>>> Behalf Of *Siddhanathan Shanmugam
>>>> *Sent:* Monday, March 27, 2017 6:53 PM
>>>> *To:* Gregory Popovitch
>>>> *Subject:* RE: Proposal: a new implementation for Data.List.sort and
>>>> Data.List.sortBy, which has better performance characteristics and is more
>>>> laziness-friendly.
>>>>
>>>> Since I don't see any regressions, this doesn't really need CLC
>>>> approval. The changes are also small enough that a Github PR may be
>>>> accepted (otherwise, the change goes in via Phabricator).
>>>>
>>>> Are you interested in implementing this patch? If yes, a standard
>>>> Github PR should be fine. Right now gSort is a three line change I think.
>>>> It will be changed in ghc/libraries/base/Data/OldList.hs on the
>>>> ghc/ghc repo on Github.
>>>>
>>>> I'm hoping for some more comments from other Haskellers, before pushing
>>>> for this change in base. I feel like we may be missing a potential
>>>> optimization that someone else might spot. So probably going to wait a few
>>>> days.
>>>>
>>>>
>>>> On Mar 27, 2017 11:43 AM, "Gregory Popovitch" <greg7mdp at gmail.com>
>>>> wrote:
>>>>
>>>> Hi Sid,
>>>>
>>>> Thanks, glad you looked into that. My understanding of the Haskell
>>>> execution model is really poor, so I can't say one way or the other, but I
>>>> felt that laziness ought to be considered as well, and I'm glad it was :-)
>>>>
>>>> So  in conclusion it looks like we have a winner with your latest
>>>> gSortBy version. How do we get this pushed to the GHC library?
>>>>
>>>> Thanks,
>>>>
>>>> greg
>>>>
>>>> ------------------------------
>>>> *From:* siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] *On
>>>> Behalf Of *Siddhanathan Shanmugam
>>>> *Sent:* Monday, March 27, 2017 2:12 PM
>>>> *To:* Gregory Popovitch
>>>>
>>>> *Subject:* Re: Proposal: a new implementation for Data.List.sort and
>>>> Data.List.sortBy, which has better performance characteristics and is more
>>>> laziness-friendly.
>>>>
>>>> Hi Greg,
>>>>
>>>> On Mon, Mar 27, 2017 at 10:19 AM, Gregory Popovitch <greg7mdp at gmail.com
>>>> > wrote:
>>>>
>>>>> Unfortunately, this optimization makes the sort less lazy, so doing
>>>>> something like:
>>>>>
>>>>> take 4 $ sort l
>>>>>
>>>>> requires more sorting of the list l with this change. I'm not sure it
>>>>> is a good tradeoff.
>>>>>
>>>>> This can be verified with: https://github.com/greg7mdp/gh
>>>>> c-sort/blob/master/src/sort_with_trace.hs
>>>>>
>>>>
>>>> I think you're running without optimizations turned on. It is lazy in
>>>> my case.
>>>>
>>>> Also, the difference should be negligible (if any at all). Here's an
>>>> example of the list being sorted:
>>>>
>>>> [11,4,6,8,2,5,1,7,9,55,11,3]
>>>> ...
>>>> [[4,11],[6,8],[2,5],[1,7,9,55],[3,11],[]]
>>>> ...
>>>> [[1,2,4,5,6,7,8,9,11,55],[3,11]]
>>>>  * 1 3
>>>>  * 2 3
>>>>  * 4 3
>>>>  * 4 11
>>>> [1,2,3,4]
>>>>
>>>> The number of operations saved is only in the last merge. It's only
>>>> lazy at this step.
>>>>
>>>> So we save at most one traversal of the list, which is not too
>>>> expensive since our worst case bounds is O(n log n) anyway.
>>>>
>>>> This should mean that the asymptotic performance should be identical,
>>>> regardless of the number of comparisons saved. Of course, you do get better
>>>> constants, but I would be surprised if those constants translated to
>>>> significantly better performance for a reasonable size list.
>>>>
>>>>
>>>>>
>>>>>  I do agree that it would be nice to have a more serious validation
>>>>> test suite.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> greg
>>>>>
>>>>> ------------------------------
>>>>> *From:* siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] *On
>>>>> Behalf Of *Siddhanathan Shanmugam
>>>>> *Sent:* Monday, March 27, 2017 12:53 PM
>>>>>
>>>>> *To:* Gregory Popovitch
>>>>> *Cc:* Haskell Libraries
>>>>> *Subject:* Re: Proposal: a new implementation for Data.List.sort and
>>>>> Data.List.sortBy, which has better performance characteristics and is more
>>>>> laziness-friendly.
>>>>>
>>>>> We can improve things a bit further by forcing evaluation (with seq)
>>>>> along the way appropriately.
>>>>>
>>>>>
>>>>>
>>>>> gregSortBy cmp [] = []
>>>>> gregSortBy cmp xs = head $ until (null.tail) reduce (pair xs)
>>>>>   where
>>>>>     pair (x:y:t) | x `cmp` y == GT  = [y, x] : pair t
>>>>>                  | otherwise        = [x, y] : pair t
>>>>>     pair [x] = [[x]]
>>>>>     pair []  = []
>>>>>
>>>>>     reduce (v:w:x:y:t) = merge v' x' `seq` merge v' x' : reduce t
>>>>>                          where v' = merge v w `seq` merge v w
>>>>>                                x' = merge x y `seq` merge x y
>>>>>
>>>>>     reduce (x:y:t) = merge x y `seq` merge x y : reduce t
>>>>>     reduce xs      = xs
>>>>>
>>>>>     merge xs []           = xs
>>>>>     merge []  ys          = ys
>>>>>     merge xs@(x:xs') ys@(y:ys')
>>>>>          | x `cmp` y == GT  = y : merge xs  ys'
>>>>>          | otherwise        = x : merge xs' ys
>>>>>
>>>>>
>>>>> gSortBy cmp = mergeAll . sequences
>>>>>   where
>>>>>     sequences (a:b:xs)
>>>>>       | a `cmp` b == GT = descending b [a]  xs
>>>>>       | otherwise       = ascending  b (a:) xs
>>>>>     sequences xs = [xs]
>>>>>
>>>>>
>>>>>     descending a as (b:bs)
>>>>>       | a `cmp` b == GT = descending b (a:as) bs
>>>>>     descending a as bs  = (a:as) `seq` (a:as) : sequences bs
>>>>>
>>>>>
>>>>>     ascending a as (b:bs)
>>>>>       | a `cmp` b /= GT = ascending b (as . (a:)) bs
>>>>>     ascending a as bs   = as [a] `seq` as [a] : sequences bs
>>>>>
>>>>>
>>>>>     mergeAll [x] = x
>>>>>     mergeAll xs  = mergeAll (mergePairs xs)
>>>>>
>>>>>
>>>>>     mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs
>>>>>     mergePairs xs       = xs
>>>>>
>>>>>
>>>>>     merge as@(a:as') bs@(b:bs')
>>>>>       | a `cmp` b == GT = b : merge as  bs'
>>>>>       | otherwise       = a : merge as' bs
>>>>>     merge [] bs         = bs
>>>>>     merge as []         = as
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Before the change:*
>>>>>
>>>>> benchmarking random ints/ghc
>>>>> time                 3.687 s    (3.541 s .. NaN s)
>>>>>                      1.000 R²   (1.000 R² .. 1.000 R²)
>>>>> mean                 3.691 s    (3.669 s .. 3.705 s)
>>>>> std dev              21.45 ms   (0.0 s .. 24.76 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>> benchmarking random ints/greg
>>>>> time                 2.648 s    (2.482 s .. 2.822 s)
>>>>>                      0.999 R²   (0.998 R² .. 1.000 R²)
>>>>> mean                 2.704 s    (2.670 s .. 2.736 s)
>>>>> std dev              52.68 ms   (0.0 s .. 54.49 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>> benchmarking random ints/gSort
>>>>> time                 2.733 s    (2.682 s .. 2.758 s)
>>>>>                      1.000 R²   (1.000 R² .. 1.000 R²)
>>>>> mean                 2.707 s    (2.689 s .. 2.718 s)
>>>>> std dev              16.84 ms   (0.0 s .. 19.20 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>> *After the change:*
>>>>>
>>>>> benchmarking random ints/greg
>>>>> time                 2.576 s    (2.548 s .. 2.628 s)
>>>>>                      1.000 R²   (1.000 R² .. 1.000 R²)
>>>>> mean                 2.590 s    (2.578 s .. 2.599 s)
>>>>> std dev              12.99 ms   (0.0 s .. 14.89 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>> benchmarking random ints/gSort
>>>>> time                 2.538 s    (2.412 s .. 2.627 s)
>>>>>                      1.000 R²   (0.999 R² .. 1.000 R²)
>>>>> mean                 2.543 s    (2.517 s .. 2.560 s)
>>>>> std dev              26.16 ms   (0.0 s .. 30.21 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam <
>>>>> siddhanathan+eml at gmail.com> wrote:
>>>>>
>>>>>> Theoretically, we could do better. We currently only exploit
>>>>>> monotonic runs in merge sort, but we could also exploit bitonic runs:
>>>>>>
>>>>>>     dlist as = as [] `seq` as []
>>>>>>
>>>>>>     sequences [] = [[]]
>>>>>>     sequences [a] = [[a]]
>>>>>>     sequences (a:xs) = bitonic a a (a:) xs
>>>>>>
>>>>>>     bitonic min max as (b:bs)
>>>>>>       | b `cmp` max /= LT = bitonic min b   (as . (b:)) bs
>>>>>>       | b `cmp` min /= GT = bitonic b   max ((b:) . as) bs
>>>>>>       | otherwise = dlist as : sequences (b:bs)
>>>>>>     bitonic _ _ as [] = [dlist as]
>>>>>>
>>>>>>
>>>>>> The constant factors here might be too high to notice the difference
>>>>>> though.
>>>>>>
>>>>>>
>>>>>> > However, still my version is more laziness-friendly, i.e. it
>>>>>> requires fewer
>>>>>> > comparisons to get the
>>>>>> > N smallest elements of a list (see
>>>>>> > https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_
>>>>>> with_trace.hs).
>>>>>> >
>>>>>> > I wonder if this might not be a more useful trait than being able
>>>>>> to sort
>>>>>> > already sorted lists super fast.
>>>>>>
>>>>>> This comes down to a discussion of merge sort vs natural merge sort.
>>>>>>
>>>>>> Data.List.sort is an implementation of a variant of merge sort called
>>>>>> natural merge sort. The algorithm is linearithmic in the worst case, but
>>>>>> linear in the best case (already sorted list).
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch <
>>>>>> greg7mdp at gmail.com> wrote:
>>>>>>
>>>>>>> Thanks again @Siddhanathan! Looks like your gSort fixes the main
>>>>>>> issue with
>>>>>>> Data.List.sort().
>>>>>>>
>>>>>>> I have updated the test programs in https://github.com/greg7mdp/gh
>>>>>>> c-sort to
>>>>>>> include your new version.
>>>>>>>
>>>>>>> Here are the results (your new version looks like a definite
>>>>>>> improvement vs
>>>>>>> the current GHC one):
>>>>>>>
>>>>>>> input                        GHC sort         My Orig proposal
>>>>>>>  gSort
>>>>>>> ------------------------------------------------------------
>>>>>>> ----------------
>>>>>>> ---
>>>>>>> sorted ints (ascending)      151               456
>>>>>>> 148
>>>>>>> sorted ints (descending)     152               466
>>>>>>> 155
>>>>>>> random ints                 2732              2006
>>>>>>>  2004
>>>>>>> random strings              6564              5549
>>>>>>>  5528
>>>>>>>
>>>>>>>
>>>>>>> So replacing the current GHC version with gSort is a no brainer, as
>>>>>>> it is
>>>>>>> better in all regards.
>>>>>>>
>>>>>>> However, still my version is more laziness-friendly, i.e. it
>>>>>>> requires fewer
>>>>>>> comparisons to get the
>>>>>>> N smallest elements of a list (see
>>>>>>> https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi
>>>>>>> th_trace.hs).
>>>>>>>
>>>>>>> I wonder if this might not be a more useful trait than being able to
>>>>>>> sort
>>>>>>> already sorted lists super fast.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> greg
>>>>>>>
>>>>>>> ________________________________
>>>>>>>
>>>>>>> From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On
>>>>>>> Behalf Of
>>>>>>> Siddhanathan Shanmugam
>>>>>>> Sent: Sunday, March 26, 2017 1:05 PM
>>>>>>> To: Gregory Popovitch
>>>>>>> Cc: Haskell Libraries
>>>>>>> Subject: Re: Proposal: a new implementation for Data.List.sort and
>>>>>>> Data.List.sortBy, which has better performance characteristics and
>>>>>>> is more
>>>>>>> laziness-friendly.
>>>>>>>
>>>>>>>
>>>>>>> Interesting. You are right, performance for sorting random lists has
>>>>>>> priority over performance for sorting already-sorted lists.
>>>>>>>
>>>>>>> Ignore the numbers for my previous version. Can you compare GHC's
>>>>>>> sort, your
>>>>>>> proposal, and gSort below?
>>>>>>>
>>>>>>>
>>>>>>> gSort :: Ord a => [a] -> [a]
>>>>>>> gSort = gSortBy compare
>>>>>>> gSortBy cmp = mergeAll . sequences
>>>>>>>   where
>>>>>>>     sequences (a:b:xs)
>>>>>>>       | a `cmp` b == GT = descending b [a]  xs
>>>>>>>       | otherwise       = ascending  b (a:) xs
>>>>>>>     sequences xs = [xs]
>>>>>>>
>>>>>>>
>>>>>>>     descending a as (b:bs)
>>>>>>>       | a `cmp` b == GT = descending b (a:as) bs
>>>>>>>     descending a as bs  = (a:as) : sequences bs
>>>>>>>
>>>>>>>
>>>>>>>     ascending a as (b:bs)
>>>>>>>       | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs
>>>>>>>     ascending a as bs   = as [a] `seq` as [a] : sequences bs
>>>>>>>
>>>>>>>
>>>>>>>     mergeAll [x] = x
>>>>>>>     mergeAll xs  = mergeAll (mergePairs xs)
>>>>>>>
>>>>>>>
>>>>>>>     mergePairs (a:b:xs) = merge a b : mergePairs xs
>>>>>>>     mergePairs xs       = xs
>>>>>>>
>>>>>>>
>>>>>>>     merge as@(a:as') bs@(b:bs')
>>>>>>>       | a `cmp` b == GT = b : merge as  bs'
>>>>>>>       | otherwise       = a : merge as' bs
>>>>>>>     merge [] bs         = bs
>>>>>>>     merge as []         = as
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sid
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Mar 26, 2017 at 9:19 AM, Gregory Popovitch <
>>>>>>> greg7mdp at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>         Thank you @Siddhanathan! I welcome any improvement you may
>>>>>>> make, as
>>>>>>> I said I
>>>>>>>         am very far from a Haskell expert.
>>>>>>>
>>>>>>>         I just tested your change with my test project
>>>>>>>         (https://github.com/greg7mdp/ghc-sort
>>>>>>> <https://github.com/greg7mdp/ghc-sort> )
>>>>>>>         and here are my results (mean times in ms):
>>>>>>>
>>>>>>>         input                        GHC sort          Orig proposal
>>>>>>> your
>>>>>>>         change
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>> ----------------
>>>>>>>         ---
>>>>>>>         sorted ints (ascending)      153               467
>>>>>>> 139
>>>>>>>         sorted ints (descending)     152               472
>>>>>>> 599
>>>>>>>         random ints                 2824              2077
>>>>>>> 2126
>>>>>>>         random strings              6564              5613
>>>>>>> 5983
>>>>>>>
>>>>>>>         Your change is a definite improvement for sorted integers in
>>>>>>> ascending
>>>>>>>         order, but is worse for other cases.
>>>>>>>
>>>>>>>         Is there a real need to optimize the sort for already sorted
>>>>>>> list?
>>>>>>> Of course
>>>>>>>         it should not be a degenerate
>>>>>>>         case and take longer than sorting random numbers, but this
>>>>>>> is not
>>>>>>> the case
>>>>>>>         here. Sorting already sorted
>>>>>>>         lists is, even with my version, over 4 times faster than
>>>>>>> sorting
>>>>>>> random
>>>>>>>         lists. This sounds perfectly
>>>>>>>         acceptable to me, and I feel that trying to optimize this
>>>>>>> specific
>>>>>>> case
>>>>>>>         further, if it comes at the
>>>>>>>         detriment of the general case, is not desirable.
>>>>>>>
>>>>>>>         Thanks,
>>>>>>>
>>>>>>>         greg
>>>>>>>
>>>>>>>         ________________________________
>>>>>>>
>>>>>>>         From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com]
>>>>>>> On
>>>>>>> Behalf Of
>>>>>>>         Siddhanathan Shanmugam
>>>>>>>         Sent: Sunday, March 26, 2017 11:41 AM
>>>>>>>         To: Gregory Popovitch
>>>>>>>         Cc: Haskell Libraries
>>>>>>>         Subject: Re: Proposal: a new implementation for
>>>>>>> Data.List.sort and
>>>>>>>         Data.List.sortBy, which has better performance
>>>>>>> characteristics and
>>>>>>> is more
>>>>>>>         laziness-friendly.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         Thank you! This identifies a space leak in base which went
>>>>>>> unnoticed
>>>>>>> for 7
>>>>>>>         years.
>>>>>>>
>>>>>>>         Your implementation can be improved further. Instead of
>>>>>>> splitting
>>>>>>> into
>>>>>>>         pairs, you could instead split into lists of sorted sublists
>>>>>>> by
>>>>>>> replacing
>>>>>>>         the pairs function with the following
>>>>>>>
>>>>>>>             pair = foldr f []
>>>>>>>               where
>>>>>>>                 f x [] = [[x]]
>>>>>>>                 f x (y:ys)
>>>>>>>                   | x `cmp` head y == LT = (x:y):ys
>>>>>>>                   | otherwise            = [x]:y:ys
>>>>>>>
>>>>>>>         This should give you the same performance improvements for
>>>>>>> sorting
>>>>>>> random
>>>>>>>         lists, but better performance while sorting ascending lists.
>>>>>>>
>>>>>>>         The version in base takes it one step further by using a
>>>>>>> DList to
>>>>>>> handle the
>>>>>>>         descending case efficiently as well, except there's a space
>>>>>>> leak
>>>>>>> right now
>>>>>>>         because of which it is slower.
>>>>>>>
>>>>>>>         On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch
>>>>>>> <greg7mdp at gmail.com>
>>>>>>>         wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                 Motivation:
>>>>>>>                 ----------
>>>>>>>
>>>>>>>                 Data.List.sort is a very important functionality in
>>>>>>> Haskell.
>>>>>>> I
>>>>>>>         believe that
>>>>>>>                 the proposed implementation is:
>>>>>>>
>>>>>>>                 - significantly faster than the current
>>>>>>> implementation on
>>>>>>> unsorted
>>>>>>>         lists,
>>>>>>>                 typically 14% to 27% faster
>>>>>>>                 - more laziness-friendly, i.e.:
>>>>>>>                     take 3 $ sort l
>>>>>>>                   will require significantly less comparisons than
>>>>>>> the
>>>>>>> current
>>>>>>>                 implementation
>>>>>>>
>>>>>>>                 Proposed Implementation
>>>>>>>                 -----------------------
>>>>>>>
>>>>>>>                 sort :: (Ord a) => [a] -> [a]
>>>>>>>                 sort =  sortBy compare
>>>>>>>
>>>>>>>                 sortBy cmp [] = []
>>>>>>>                 sortBy cmp xs = head $ until (null.tail) reduce
>>>>>>> (pair xs)
>>>>>>>                   where
>>>>>>>                     pair (x:y:t) | x `cmp` y == GT  = [y, x] : pair t
>>>>>>>                                  | otherwise        = [x, y] : pair t
>>>>>>>                     pair [x] = [[x]]
>>>>>>>                     pair []  = []
>>>>>>>
>>>>>>>                     reduce (v:w:x:y:t) = merge v' x' : reduce t
>>>>>>>                                          where v' = merge v w
>>>>>>>                                                x' = merge x y
>>>>>>>
>>>>>>>                     reduce (x:y:t) = merge x y : reduce t
>>>>>>>                     reduce xs      = xs
>>>>>>>
>>>>>>>                     merge xs []           = xs
>>>>>>>                     merge []  ys          = ys
>>>>>>>                     merge xs@(x:xs') ys@(y:ys')
>>>>>>>                          | x `cmp` y == GT  = y : merge xs  ys'
>>>>>>>                          | otherwise        = x : merge xs' ys
>>>>>>>
>>>>>>>
>>>>>>>                 Effect and Interactions
>>>>>>>                 -----------------------
>>>>>>>
>>>>>>>                 I have a stack project with a criterion test for
>>>>>>> this new
>>>>>>>         implementation,
>>>>>>>                 available at https://github.com/greg7mdp/ghc-sort
>>>>>>> <https://github.com/greg7mdp/ghc-sort>
>>>>>>>
>>>>>>>         <https://github.com/greg7mdp/ghc-sort
>>>>>>> <https://github.com/greg7mdp/ghc-sort> > .
>>>>>>>                 I ran the tests on an Ubuntu 14.0.2 VM and GHC
>>>>>>> 8.0.2, and
>>>>>>> had the
>>>>>>>         following
>>>>>>>                 results:
>>>>>>>
>>>>>>>                 - sorting of random lists of integers is 27% faster
>>>>>>>                 - sorting of random lists of strings is 14% faster
>>>>>>>                 - sorting of already sorted lists is significantly
>>>>>>> slower,
>>>>>>> but still
>>>>>>>         much
>>>>>>>                 faster than sorting random lists
>>>>>>>                 - proposed version is more laziness friendly. For
>>>>>>> example
>>>>>>> this
>>>>>>>         version of
>>>>>>>                 sortBy requires 11 comparisons to find
>>>>>>>                   the smallest element of a 15 element list, while
>>>>>>> the
>>>>>>> default
>>>>>>>                 Data.List.sortBy requires 15 comparisons.
>>>>>>>                   (see
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi
>>>>>>> th_trace.hs
>>>>>>> <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w
>>>>>>> ith_trace.hs>
>>>>>>>
>>>>>>> <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w
>>>>>>> ith_trace.hs
>>>>>>> <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w
>>>>>>> ith_trace.hs> >
>>>>>>> )
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                 Test results
>>>>>>>                 ------------
>>>>>>>
>>>>>>>                 Criterion output (descending/ascending results are
>>>>>>> for
>>>>>>> already
>>>>>>>         sorted
>>>>>>>                 lists).
>>>>>>>                 I barely understand what Criterion does, and I am
>>>>>>> puzzled
>>>>>>> with the
>>>>>>>         various
>>>>>>>                 "T" output - maybe there is a bug in my bench code:
>>>>>>>
>>>>>>>                 vagrant at vagrant-ubuntu-trusty-64:/vagrant$ stack
>>>>>>> exec
>>>>>>> ghc-sort
>>>>>>>                 benchmarking ascending ints/ghc
>>>>>>>                 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime
>>>>>>> 160.6 ms
>>>>>>>         (153.4
>>>>>>>                 ms .. 167.8 ms)
>>>>>>>                                      0.997 R²   (0.986 R² .. 1.000
>>>>>>> R²)
>>>>>>>                 mean                 161.7 ms   (158.3 ms .. 165.9
>>>>>>> ms)
>>>>>>>                 std dev              5.210 ms   (3.193 ms .. 7.006
>>>>>>> ms)
>>>>>>>                 variance introduced by outliers: 12% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>>                 benchmarking ascending ints/greg
>>>>>>>                 TTTTTTTTTTTTTTTTtime                 473.8 ms
>>>>>>>  (398.6 ms ..
>>>>>>> 554.9
>>>>>>>         ms)
>>>>>>>                                      0.996 R²   (0.987 R² .. 1.000
>>>>>>> R²)
>>>>>>>                 mean                 466.2 ms   (449.0 ms .. 475.0
>>>>>>> ms)
>>>>>>>                 std dev              14.94 ms   (0.0 s .. 15.29 ms)
>>>>>>>                 variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>>                 benchmarking descending ints/ghc
>>>>>>>                 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime
>>>>>>> 165.1 ms
>>>>>>>         (148.2
>>>>>>>                 ms .. 178.2 ms)
>>>>>>>                                      0.991 R²   (0.957 R² .. 1.000
>>>>>>> R²)
>>>>>>>                 mean                 158.7 ms   (154.0 ms .. 164.3
>>>>>>> ms)
>>>>>>>                 std dev              7.075 ms   (4.152 ms .. 9.903
>>>>>>> ms)
>>>>>>>                 variance introduced by outliers: 12% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>>                 benchmarking descending ints/greg
>>>>>>>                 TTTTTTTTTTTTTTTTtime                 471.7 ms
>>>>>>>  (419.8 ms ..
>>>>>>> 508.3
>>>>>>>         ms)
>>>>>>>                                      0.999 R²   (0.995 R² .. 1.000
>>>>>>> R²)
>>>>>>>                 mean                 476.0 ms   (467.5 ms .. 480.0
>>>>>>> ms)
>>>>>>>                 std dev              7.447 ms   (67.99 as .. 7.865
>>>>>>> ms)
>>>>>>>                 variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>>                 benchmarking random ints/ghc
>>>>>>>                 TTTTTTTTTTTTTTTTtime                 2.852 s
>>>>>>> (2.564 s ..
>>>>>>> 3.019 s)
>>>>>>>                                      0.999 R²   (0.997 R² .. 1.000
>>>>>>> R²)
>>>>>>>                 mean                 2.812 s    (2.785 s .. 2.838 s)
>>>>>>>                 std dev              44.06 ms   (543.9 as .. 44.97
>>>>>>> ms)
>>>>>>>                 variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>>                 benchmarking random ints/greg
>>>>>>>                 TTTTTTTTTTTTTTTTtime                 2.032 s
>>>>>>> (1.993 s ..
>>>>>>> 2.076 s)
>>>>>>>                                      1.000 R²   (1.000 R² .. 1.000
>>>>>>> R²)
>>>>>>>                 mean                 2.028 s    (2.019 s .. 2.033 s)
>>>>>>>                 std dev              7.832 ms   (0.0 s .. 8.178 ms)
>>>>>>>                 variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>>                 benchmarking shakespeare/ghc
>>>>>>>                 TTTTTTTTTTTTTTTTtime                 6.504 s
>>>>>>> (6.391 s ..
>>>>>>> 6.694 s)
>>>>>>>                                      1.000 R²   (1.000 R² .. 1.000
>>>>>>> R²)
>>>>>>>                 mean                 6.499 s    (6.468 s .. 6.518 s)
>>>>>>>                 std dev              28.85 ms   (0.0 s .. 32.62 ms)
>>>>>>>                 variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>>                 benchmarking shakespeare/greg
>>>>>>>                 TTTTTTTTTTTTTTTTtime                 5.560 s
>>>>>>> (5.307 s ..
>>>>>>> 5.763 s)
>>>>>>>                                      1.000 R²   (0.999 R² .. 1.000
>>>>>>> R²)
>>>>>>>                 mean                 5.582 s    (5.537 s .. 5.607 s)
>>>>>>>                 std dev              39.30 ms   (0.0 s .. 43.49 ms)
>>>>>>>                 variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>>
>>>>>>>                 Costs and Drawbacks
>>>>>>>                 -------------------
>>>>>>>
>>>>>>>                 The only cost I see is the reduced performance when
>>>>>>> sorting
>>>>>>> already
>>>>>>>         sorted
>>>>>>>                 lists. However, since this remains quite efficient,
>>>>>>> indeed
>>>>>>> over 4
>>>>>>>         times
>>>>>>>                 faster than sorting unsorted lists, I think it is an
>>>>>>> acceptable
>>>>>>>         tradeoff.
>>>>>>>
>>>>>>>                 Final note
>>>>>>>                 ----------
>>>>>>>
>>>>>>>                 My Haskell is very rusty. I worked on this a couple
>>>>>>> years
>>>>>>> ago when I
>>>>>>>         was
>>>>>>>                 learning Haskell, and meant to propose it to the
>>>>>>> Haskell
>>>>>>> community,
>>>>>>>         but
>>>>>>>                 never got to it at the time.
>>>>>>>
>>>>>>>                 _______________________________________________
>>>>>>>                 Libraries mailing list
>>>>>>>                 Libraries at haskell.org
>>>>>>>                 http://mail.haskell.org/cgi-bi
>>>>>>> n/mailman/listinfo/libraries
>>>>>>> <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries>
>>>>>>>
>>>>>>>         <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>>>>> <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Libraries mailing list
>>>> Libraries at haskell.org
>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>>
>>>>
>>> _______________________________________________
>>> Libraries mailing list
>>> Libraries at haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>
>>>
>>
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20170328/975de3cb/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Outlook.jpg
Type: image/jpeg
Size: 105044 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20170328/975de3cb/attachment-0001.jpg>


More information about the Libraries mailing list