fix for Data.List.sortBy

Siddhanathan Shanmugam siddhanathan+eml at gmail.com
Tue Mar 28 06:44:42 UTC 2017


Turns out we don't need seq at all. A simple refactoring of the merge
function does the trick equally well.

    mergePairs (a:b:xs) = merge id a b : mergePairs xs
    mergePairs xs       = xs

    merge f as@(a:as') bs@(b:bs')
      | a `cmp` b == GT = merge (f.(b:)) as  bs'
      | otherwise       = merge (f.(a:)) as' bs
    merge f [] bs       = f bs
    merge f as []       = f as

This variant is 10% faster in my tests.
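
For reference, here is a self-contained sketch of how this refactored merge
slots into the gSortBy skeleton quoted further down the thread (one possible
assembly, not the actual patch; `mySortBy` and `main` are made-up names for
local testing):

    import Data.List (sortBy)

    mySortBy :: (a -> a -> Ordering) -> [a] -> [a]
    mySortBy cmp = mergeAll . sequences
      where
        sequences (a:b:xs)
          | a `cmp` b == GT = descending b [a]  xs
          | otherwise       = ascending  b (a:) xs
        sequences xs = [xs]

        descending a as (b:bs)
          | a `cmp` b == GT = descending b (a:as) bs
        descending a as bs  = (a:as) : sequences bs

        ascending a as (b:bs)
          | a `cmp` b /= GT = ascending b (as . (a:)) bs
        ascending a as bs   = as [a] : sequences bs

        mergeAll [x] = x
        mergeAll xs  = mergeAll (mergePairs xs)

        -- the refactoring: a difference-list accumulator instead of the seq's
        mergePairs (a:b:xs) = merge id a b : mergePairs xs
        mergePairs xs       = xs

        merge f as@(a:as') bs@(b:bs')
          | a `cmp` b == GT = merge (f . (b:)) as  bs'
          | otherwise       = merge (f . (a:)) as' bs
        merge f [] bs       = f bs
        merge f as []       = f as

    -- quick sanity check against the library sort
    main :: IO ()
    main = print (mySortBy compare xs == sortBy compare xs)
      where xs = [11,4,6,8,2,5,1,7,9,55,11,3] :: [Int]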


On Mon, Mar 27, 2017 at 5:49 PM, Dan Burton <danburton.email at gmail.com>
wrote:

> Does this rely on Common Subexpression Elimination optimization in order
> to work? Would it work more reliably if the `seq`-ed expression were
> let-bound?
>

I don't think it relies heavily on CSE. The seq's are there to avoid a
cascading series of thunk evaluations. Using let expressions doesn't seem
to affect the benchmarks.
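
For concreteness, the let-bound form Dan describes would look roughly like
this (an illustrative sketch only; `m` is a made-up local name, and `merge`
and `cmp` are the ones from the gSortBy further down the thread):

    -- forces the merge once and shares the result by construction,
    -- with no reliance on CSE
    mergePairs (a:b:xs) = let m = merge a b in m `seq` m : mergePairs xs
    mergePairs xs       = xs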


>
> -- Dan Burton
>
> On Mon, Mar 27, 2017 at 5:41 PM, David Feuer <david.feuer at gmail.com>
> wrote:
>
>> The first seq is useless: constructor application is never suspended. I
>> haven't had a chance to look at the rest yet.
>>
>> On Mar 27, 2017 7:59 PM, "Gregory Popovitch" <greg7mdp at gmail.com> wrote:
>>
>>> Sid,
>>>
>>> I'd be delighted to submit the patch, as long as I have permission
>>> (which I probably don't), you feel confident about the change and maybe a
>>> couple of other people agree.
>>>
>>> Here is the proposed change. Tests show a significant speed improvement
>>> (30%) when sorting lists of random numbers, and the same efficiency for sorting
>>> already sorted lists (both ascending and descending).
>>>
>>>
>>> Thanks,
>>>
>>> greg
>>>
>>> ------------------------------
>>> From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On Behalf Of Siddhanathan Shanmugam
>>> Sent: Monday, March 27, 2017 6:53 PM
>>> To: Gregory Popovitch
>>> Subject: RE: Proposal: a new implementation for Data.List.sort and
>>> Data.List.sortBy, which has better performance characteristics and is more
>>> laziness-friendly.
>>>
>>> Since I don't see any regressions, this doesn't really need CLC
>>> approval. The changes are also small enough that a Github PR may be
>>> accepted (otherwise, the change goes in via Phabricator).
>>>
>>> Are you interested in implementing this patch? If yes, a standard Github
>>> PR should be fine. Right now gSort is a three line change I think. It will
>>> be changed in ghc/libraries/base/Data/OldList.hs on the ghc/ghc repo on
>>> Github.
>>>
>>> I'm hoping for some more comments from other Haskellers, before pushing
>>> for this change in base. I feel like we may be missing a potential
>>> optimization that someone else might spot. So probably going to wait a few
>>> days.
>>>
>>>
>>> On Mar 27, 2017 11:43 AM, "Gregory Popovitch" <greg7mdp at gmail.com>
>>> wrote:
>>>
>>> Hi Sid,
>>>
>>> Thanks, glad you looked into that. My understanding of the Haskell
>>> execution model is really poor, so I can't say one way or the other, but I
>>> felt that laziness ought to be considered as well, and I'm glad it was :-)
>>>
>>> So, in conclusion, it looks like we have a winner with your latest
>>> gSortBy version. How do we get this pushed to the GHC library?
>>>
>>> Thanks,
>>>
>>> greg
>>>
>>> ------------------------------
>>> From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On Behalf Of Siddhanathan Shanmugam
>>> Sent: Monday, March 27, 2017 2:12 PM
>>> To: Gregory Popovitch
>>>
>>> Subject: Re: Proposal: a new implementation for Data.List.sort and
>>> Data.List.sortBy, which has better performance characteristics and is more
>>> laziness-friendly.
>>>
>>> Hi Greg,
>>>
>>> On Mon, Mar 27, 2017 at 10:19 AM, Gregory Popovitch <greg7mdp at gmail.com>
>>> wrote:
>>>
>>>> Unfortunately, this optimization makes the sort less lazy, so doing
>>>> something like:
>>>>
>>>> take 4 $ sort l
>>>>
>>>> requires more sorting of the list l with this change. I'm not sure it
>>>> is a good tradeoff.
>>>>
>>>> This can be verified with:
>>>> https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs
>>>>
>>>
>>> I think you're running without optimizations turned on. It is lazy in my
>>> case.
>>>
>>> Also, the difference should be negligible (if any at all). Here's an
>>> example of the list being sorted:
>>>
>>> [11,4,6,8,2,5,1,7,9,55,11,3]
>>> ...
>>> [[4,11],[6,8],[2,5],[1,7,9,55],[3,11],[]]
>>> ...
>>> [[1,2,4,5,6,7,8,9,11,55],[3,11]]
>>>  * 1 3
>>>  * 2 3
>>>  * 4 3
>>>  * 4 11
>>> [1,2,3,4]
>>>
>>> The operations saved are only in the last merge; it's only lazy
>>> at this step.
>>>
>>> So we save at most one traversal of the list, which is not too expensive
>>> since our worst-case bound is O(n log n) anyway.
>>>
>>> This means that the asymptotic performance should be identical,
>>> regardless of the number of comparisons saved. Of course, you do get better
>>> constants, but I would be surprised if those constants translated to
>>> significantly better performance for a reasonably sized list.
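>>>
>>> (For readers who want to reproduce the traces above: a rough sketch, in the
>>> spirit of the sort_with_trace.hs linked earlier; `tracingCmp` is an
>>> illustrative name, not part of the proposal.)
>>>
>>>     import Debug.Trace (trace)
>>>
>>>     -- print each comparison as it happens, like the " * 1 3" lines above
>>>     tracingCmp :: (Show a, Ord a) => a -> a -> Ordering
>>>     tracingCmp x y = trace (" * " ++ show x ++ " " ++ show y) (x `compare` y)
>>>
>>>     -- e.g.  take 4 (gSortBy tracingCmp [11,4,6,8,2,5,1,7,9,55,11,3])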
>>>
>>>
>>>>
>>>>  I do agree that it would be nice to have a more serious validation
>>>> test suite.
>>>>
>>>> Thanks,
>>>>
>>>> greg
>>>>
>>>> ------------------------------
>>>> From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On Behalf Of Siddhanathan Shanmugam
>>>> Sent: Monday, March 27, 2017 12:53 PM
>>>>
>>>> To: Gregory Popovitch
>>>> Cc: Haskell Libraries
>>>> Subject: Re: Proposal: a new implementation for Data.List.sort and
>>>> Data.List.sortBy, which has better performance characteristics and is more
>>>> laziness-friendly.
>>>>
>>>> We can improve things a bit further by forcing evaluation (with seq)
>>>> along the way appropriately.
>>>>
>>>>
>>>>
>>>> gregSortBy cmp [] = []
>>>> gregSortBy cmp xs = head $ until (null.tail) reduce (pair xs)
>>>>   where
>>>>     pair (x:y:t) | x `cmp` y == GT  = [y, x] : pair t
>>>>                  | otherwise        = [x, y] : pair t
>>>>     pair [x] = [[x]]
>>>>     pair []  = []
>>>>
>>>>     reduce (v:w:x:y:t) = merge v' x' `seq` merge v' x' : reduce t
>>>>                          where v' = merge v w `seq` merge v w
>>>>                                x' = merge x y `seq` merge x y
>>>>
>>>>     reduce (x:y:t) = merge x y `seq` merge x y : reduce t
>>>>     reduce xs      = xs
>>>>
>>>>     merge xs []           = xs
>>>>     merge []  ys          = ys
>>>>     merge xs@(x:xs') ys@(y:ys')
>>>>          | x `cmp` y == GT  = y : merge xs  ys'
>>>>          | otherwise        = x : merge xs' ys
>>>>
>>>>
>>>> gSortBy cmp = mergeAll . sequences
>>>>   where
>>>>     sequences (a:b:xs)
>>>>       | a `cmp` b == GT = descending b [a]  xs
>>>>       | otherwise       = ascending  b (a:) xs
>>>>     sequences xs = [xs]
>>>>
>>>>
>>>>     descending a as (b:bs)
>>>>       | a `cmp` b == GT = descending b (a:as) bs
>>>>     descending a as bs  = (a:as) `seq` (a:as) : sequences bs
>>>>
>>>>
>>>>     ascending a as (b:bs)
>>>>       | a `cmp` b /= GT = ascending b (as . (a:)) bs
>>>>     ascending a as bs   = as [a] `seq` as [a] : sequences bs
>>>>
>>>>
>>>>     mergeAll [x] = x
>>>>     mergeAll xs  = mergeAll (mergePairs xs)
>>>>
>>>>
>>>>     mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs
>>>>     mergePairs xs       = xs
>>>>
>>>>
>>>>     merge as@(a:as') bs@(b:bs')
>>>>       | a `cmp` b == GT = b : merge as  bs'
>>>>       | otherwise       = a : merge as' bs
>>>>     merge [] bs         = bs
>>>>     merge as []         = as
>>>>
>>>>
>>>>
>>>>
>>>> Before the change:
>>>>
>>>> benchmarking random ints/ghc
>>>> time                 3.687 s    (3.541 s .. NaN s)
>>>>                      1.000 R²   (1.000 R² .. 1.000 R²)
>>>> mean                 3.691 s    (3.669 s .. 3.705 s)
>>>> std dev              21.45 ms   (0.0 s .. 24.76 ms)
>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>
>>>> benchmarking random ints/greg
>>>> time                 2.648 s    (2.482 s .. 2.822 s)
>>>>                      0.999 R²   (0.998 R² .. 1.000 R²)
>>>> mean                 2.704 s    (2.670 s .. 2.736 s)
>>>> std dev              52.68 ms   (0.0 s .. 54.49 ms)
>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>
>>>> benchmarking random ints/gSort
>>>> time                 2.733 s    (2.682 s .. 2.758 s)
>>>>                      1.000 R²   (1.000 R² .. 1.000 R²)
>>>> mean                 2.707 s    (2.689 s .. 2.718 s)
>>>> std dev              16.84 ms   (0.0 s .. 19.20 ms)
>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>
>>>> After the change:
>>>>
>>>> benchmarking random ints/greg
>>>> time                 2.576 s    (2.548 s .. 2.628 s)
>>>>                      1.000 R²   (1.000 R² .. 1.000 R²)
>>>> mean                 2.590 s    (2.578 s .. 2.599 s)
>>>> std dev              12.99 ms   (0.0 s .. 14.89 ms)
>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>
>>>> benchmarking random ints/gSort
>>>> time                 2.538 s    (2.412 s .. 2.627 s)
>>>>                      1.000 R²   (0.999 R² .. 1.000 R²)
>>>> mean                 2.543 s    (2.517 s .. 2.560 s)
>>>> std dev              26.16 ms   (0.0 s .. 30.21 ms)
>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam <
>>>> siddhanathan+eml at gmail.com> wrote:
>>>>
>>>>> Theoretically, we could do better. We currently only exploit monotonic
>>>>> runs in merge sort, but we could also exploit bitonic runs:
>>>>>
>>>>>     dlist as = as [] `seq` as []
>>>>>
>>>>>     sequences [] = [[]]
>>>>>     sequences [a] = [[a]]
>>>>>     sequences (a:xs) = bitonic a a (a:) xs
>>>>>
>>>>>     bitonic min max as (b:bs)
>>>>>       | b `cmp` max /= LT = bitonic min b   (as . (b:)) bs
>>>>>       | b `cmp` min /= GT = bitonic b   max ((b:) . as) bs
>>>>>       | otherwise = dlist as : sequences (b:bs)
>>>>>     bitonic _ _ as [] = [dlist as]
>>>>>
>>>>>
>>>>> The constant factors here might be too high to notice the difference
>>>>> though.
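>>>>>
>>>>> As a rough worked example of what this bitonic run detection would produce
>>>>> (illustrative, assuming cmp = compare; not benchmarked):
>>>>>
>>>>>     -- sequences [5,7,3,8,1]  ==>  [[1,3,5,7,8]]  -- a single bitonic run
>>>>>     -- sequences [2,4,3]      ==>  [[2,4],[3]]    -- 3 fits neither end, so it starts a new run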
>>>>>
>>>>>
>>>>> > However, still my version is more laziness-friendly, i.e. it requires fewer
>>>>> > comparisons to get the N smallest elements of a list (see
>>>>> > https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs).
>>>>> >
>>>>> > I wonder if this might not be a more useful trait than being able to sort
>>>>> > already sorted lists super fast.
>>>>>
>>>>> This comes down to a discussion of merge sort vs natural merge sort.
>>>>>
>>>>> Data.List.sort is an implementation of a variant of merge sort called
>>>>> natural merge sort. The algorithm is linearithmic in the worst case, but
>>>>> linear in the best case (already sorted list).
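>>>>>
>>>>> To see why the best case is linear, consider what the run detection in
>>>>> gSortBy above does with already sorted input (illustrative, assuming
>>>>> cmp = compare):
>>>>>
>>>>>     -- sequences [1,2,3,4,5]  ==>  [[1,2,3,4,5],[]]  -- one ascending run
>>>>>     -- sequences [5,4,3,2,1]  ==>  [[1,2,3,4,5],[]]  -- one (reversed) descending run
>>>>>     -- so mergeAll finishes after a single trivial merge pass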
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch <
>>>>> greg7mdp at gmail.com> wrote:
>>>>>
>>>>>> Thanks again @Siddhanathan! Looks like your gSort fixes the main
>>>>>> issue with
>>>>>> Data.List.sort().
>>>>>>
>>>>>> I have updated the test programs in https://github.com/greg7mdp/ghc-sort
>>>>>> to include your new version.
>>>>>>
>>>>>> Here are the results (your new version looks like a definite
>>>>>> improvement vs
>>>>>> the current GHC one):
>>>>>>
>>>>>> input                        GHC sort    My Orig proposal    gSort
>>>>>> ---------------------------------------------------------------------
>>>>>> sorted ints (ascending)       151              456            148
>>>>>> sorted ints (descending)      152              466            155
>>>>>> random ints                  2732             2006           2004
>>>>>> random strings               6564             5549           5528
>>>>>>
>>>>>>
>>>>>> So replacing the current GHC version with gSort is a no-brainer, as it is
>>>>>> better in all regards.
>>>>>>
>>>>>> However, still my version is more laziness-friendly, i.e. it requires fewer
>>>>>> comparisons to get the N smallest elements of a list (see
>>>>>> https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs).
>>>>>>
>>>>>> I wonder if this might not be a more useful trait than being able to sort
>>>>>> already sorted lists super fast.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> greg
>>>>>>
>>>>>> ________________________________
>>>>>>
>>>>>> From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On
>>>>>> Behalf Of
>>>>>> Siddhanathan Shanmugam
>>>>>> Sent: Sunday, March 26, 2017 1:05 PM
>>>>>> To: Gregory Popovitch
>>>>>> Cc: Haskell Libraries
>>>>>> Subject: Re: Proposal: a new implementation for Data.List.sort and
>>>>>> Data.List.sortBy, which has better performance characteristics and is
>>>>>> more
>>>>>> laziness-friendly.
>>>>>>
>>>>>>
>>>>>> Interesting. You are right, performance for sorting random lists has
>>>>>> priority over performance for sorting already-sorted lists.
>>>>>>
>>>>>> Ignore the numbers for my previous version. Can you compare GHC's
>>>>>> sort, your
>>>>>> proposal, and gSort below?
>>>>>>
>>>>>>
>>>>>> gSort :: Ord a => [a] -> [a]
>>>>>> gSort = gSortBy compare
>>>>>> gSortBy cmp = mergeAll . sequences
>>>>>>   where
>>>>>>     sequences (a:b:xs)
>>>>>>       | a `cmp` b == GT = descending b [a]  xs
>>>>>>       | otherwise       = ascending  b (a:) xs
>>>>>>     sequences xs = [xs]
>>>>>>
>>>>>>
>>>>>>     descending a as (b:bs)
>>>>>>       | a `cmp` b == GT = descending b (a:as) bs
>>>>>>     descending a as bs  = (a:as) : sequences bs
>>>>>>
>>>>>>
>>>>>>     ascending a as (b:bs)
>>>>>>       | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs
>>>>>>     ascending a as bs   = as [a] `seq` as [a] : sequences bs
>>>>>>
>>>>>>
>>>>>>     mergeAll [x] = x
>>>>>>     mergeAll xs  = mergeAll (mergePairs xs)
>>>>>>
>>>>>>
>>>>>>     mergePairs (a:b:xs) = merge a b : mergePairs xs
>>>>>>     mergePairs xs       = xs
>>>>>>
>>>>>>
>>>>>>     merge as@(a:as') bs@(b:bs')
>>>>>>       | a `cmp` b == GT = b : merge as  bs'
>>>>>>       | otherwise       = a : merge as' bs
>>>>>>     merge [] bs         = bs
>>>>>>     merge as []         = as
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Sid
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 26, 2017 at 9:19 AM, Gregory Popovitch <
>>>>>> greg7mdp at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>         Thank you @Siddhanathan! I welcome any improvement you may make, as I said
>>>>>>         I am very far from a Haskell expert.
>>>>>>
>>>>>>         I just tested your change with my test project
>>>>>>         (https://github.com/greg7mdp/ghc-sort)
>>>>>>         and here are my results (mean times in ms):
>>>>>>
>>>>>>         input                        GHC sort    Orig proposal    your change
>>>>>>         ----------------------------------------------------------------------
>>>>>>         sorted ints (ascending)       153             467             139
>>>>>>         sorted ints (descending)      152             472             599
>>>>>>         random ints                  2824            2077            2126
>>>>>>         random strings               6564            5613            5983
>>>>>>
>>>>>>         Your change is a definite improvement for sorted integers in ascending
>>>>>>         order, but is worse for the other cases.
>>>>>>
>>>>>>         Is there a real need to optimize the sort for already sorted lists?
>>>>>>         Of course it should not be a degenerate case and take longer than
>>>>>>         sorting random numbers, but this is not the case here. Sorting already
>>>>>>         sorted lists is, even with my version, over 4 times faster than sorting
>>>>>>         random lists. This sounds perfectly acceptable to me, and I feel that
>>>>>>         trying to optimize this specific case further, if it comes at the
>>>>>>         detriment of the general case, is not desirable.
>>>>>>
>>>>>>         Thanks,
>>>>>>
>>>>>>         greg
>>>>>>
>>>>>>         ________________________________
>>>>>>
>>>>>>         From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On Behalf Of
>>>>>>         Siddhanathan Shanmugam
>>>>>>         Sent: Sunday, March 26, 2017 11:41 AM
>>>>>>         To: Gregory Popovitch
>>>>>>         Cc: Haskell Libraries
>>>>>>         Subject: Re: Proposal: a new implementation for Data.List.sort and
>>>>>>         Data.List.sortBy, which has better performance characteristics and is more
>>>>>>         laziness-friendly.
>>>>>>
>>>>>>
>>>>>>
>>>>>>         Thank you! This identifies a space leak in base which went unnoticed
>>>>>>         for 7 years.
>>>>>>
>>>>>>         Your implementation can be improved further. Instead of splitting into
>>>>>>         pairs, you could instead split into lists of sorted sublists by replacing
>>>>>>         the pairs function with the following:
>>>>>>
>>>>>>             pair = foldr f []
>>>>>>               where
>>>>>>                 f x [] = [[x]]
>>>>>>                 f x (y:ys)
>>>>>>                   | x `cmp` head y == LT = (x:y):ys
>>>>>>                   | otherwise            = [x]:y:ys
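>>>>>>
>>>>>>         For instance (an illustrative evaluation, assuming cmp = compare):
>>>>>>
>>>>>>             -- pair [11,4,6,8,2,5,1,7,9,55,11,3]
>>>>>>             --   ==> [[11],[4,6,8],[2,5],[1,7,9,55],[11],[3]]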
>>>>>>
>>>>>>         This should give you the same performance improvements for sorting random
>>>>>>         lists, but better performance while sorting ascending lists.
>>>>>>
>>>>>>         The version in base takes it one step further by using a DList to handle
>>>>>>         the descending case efficiently as well, except there's a space leak right
>>>>>>         now because of which it is slower.
>>>>>>
>>>>>>         On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch
>>>>>> <greg7mdp at gmail.com>
>>>>>>         wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>                 Motivation:
>>>>>>                 ----------
>>>>>>
>>>>>>                 Data.List.sort is a very important functionality in Haskell. I
>>>>>>                 believe that the proposed implementation is:
>>>>>>
>>>>>>                 - significantly faster than the current implementation on unsorted
>>>>>>                   lists, typically 14% to 27% faster
>>>>>>                 - more laziness-friendly, i.e.:
>>>>>>                     take 3 $ sort l
>>>>>>                   will require significantly fewer comparisons than the current
>>>>>>                   implementation
>>>>>>
>>>>>>                 Proposed Implementation
>>>>>>                 -----------------------
>>>>>>
>>>>>>                 sort :: (Ord a) => [a] -> [a]
>>>>>>                 sort =  sortBy compare
>>>>>>
>>>>>>                 sortBy cmp [] = []
>>>>>>                 sortBy cmp xs = head $ until (null.tail) reduce (pair xs)
>>>>>>                   where
>>>>>>                     pair (x:y:t) | x `cmp` y == GT  = [y, x] : pair t
>>>>>>                                  | otherwise        = [x, y] : pair t
>>>>>>                     pair [x] = [[x]]
>>>>>>                     pair []  = []
>>>>>>
>>>>>>                     reduce (v:w:x:y:t) = merge v' x' : reduce t
>>>>>>                                          where v' = merge v w
>>>>>>                                                x' = merge x y
>>>>>>
>>>>>>                     reduce (x:y:t) = merge x y : reduce t
>>>>>>                     reduce xs      = xs
>>>>>>
>>>>>>                     merge xs []           = xs
>>>>>>                     merge []  ys          = ys
>>>>>>                     merge xs@(x:xs') ys@(y:ys')
>>>>>>                          | x `cmp` y == GT  = y : merge xs  ys'
>>>>>>                          | otherwise        = x : merge xs' ys
>>>>>>
>>>>>>
>>>>>>                 Effect and Interactions
>>>>>>                 -----------------------
>>>>>>
>>>>>>                 I have a stack project with a criterion test for this new
>>>>>>                 implementation, available at https://github.com/greg7mdp/ghc-sort .
>>>>>>                 I ran the tests on an Ubuntu 14.04.2 VM and GHC 8.0.2, and had the
>>>>>>                 following results:
>>>>>>
>>>>>>                 - sorting of random lists of integers is 27% faster
>>>>>>                 - sorting of random lists of strings is 14% faster
>>>>>>                 - sorting of already sorted lists is significantly slower, but
>>>>>>                   still much faster than sorting random lists
>>>>>>                 - the proposed version is more laziness-friendly. For example, this
>>>>>>                   version of sortBy requires 11 comparisons to find the smallest
>>>>>>                   element of a 15-element list, while the default Data.List.sortBy
>>>>>>                   requires 15 comparisons (see
>>>>>>                   https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs )
>>>>>>
>>>>>>
>>>>>>
>>>>>>                 Test results
>>>>>>                 ------------
>>>>>>
>>>>>>                 Criterion output (descending/ascending results are for already
>>>>>>                 sorted lists).
>>>>>>                 I barely understand what Criterion does, and I am puzzled by the
>>>>>>                 various "T" output - maybe there is a bug in my bench code:
>>>>>>
>>>>>>                 vagrant at vagrant-ubuntu-trusty-64:/vagrant$ stack exec ghc-sort
>>>>>>                 benchmarking ascending ints/ghc
>>>>>>                 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime  160.6 ms   (153.4 ms .. 167.8 ms)
>>>>>>                                      0.997 R²   (0.986 R² .. 1.000 R²)
>>>>>>                 mean                 161.7 ms   (158.3 ms .. 165.9 ms)
>>>>>>                 std dev              5.210 ms   (3.193 ms .. 7.006 ms)
>>>>>>                 variance introduced by outliers: 12% (moderately inflated)
>>>>>>
>>>>>>                 benchmarking ascending ints/greg
>>>>>>                 TTTTTTTTTTTTTTTTtime                 473.8 ms   (398.6 ms .. 554.9 ms)
>>>>>>                                      0.996 R²   (0.987 R² .. 1.000 R²)
>>>>>>                 mean                 466.2 ms   (449.0 ms .. 475.0 ms)
>>>>>>                 std dev              14.94 ms   (0.0 s .. 15.29 ms)
>>>>>>                 variance introduced by outliers: 19% (moderately inflated)
>>>>>>
>>>>>>                 benchmarking descending ints/ghc
>>>>>>                 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime  165.1 ms   (148.2 ms .. 178.2 ms)
>>>>>>                                      0.991 R²   (0.957 R² .. 1.000 R²)
>>>>>>                 mean                 158.7 ms   (154.0 ms .. 164.3 ms)
>>>>>>                 std dev              7.075 ms   (4.152 ms .. 9.903 ms)
>>>>>>                 variance introduced by outliers: 12% (moderately inflated)
>>>>>>
>>>>>>                 benchmarking descending ints/greg
>>>>>>                 TTTTTTTTTTTTTTTTtime                 471.7 ms   (419.8 ms .. 508.3 ms)
>>>>>>                                      0.999 R²   (0.995 R² .. 1.000 R²)
>>>>>>                 mean                 476.0 ms   (467.5 ms .. 480.0 ms)
>>>>>>                 std dev              7.447 ms   (67.99 as .. 7.865 ms)
>>>>>>                 variance introduced by outliers: 19% (moderately inflated)
>>>>>>
>>>>>>                 benchmarking random ints/ghc
>>>>>>                 TTTTTTTTTTTTTTTTtime                 2.852 s    (2.564 s .. 3.019 s)
>>>>>>                                      0.999 R²   (0.997 R² .. 1.000 R²)
>>>>>>                 mean                 2.812 s    (2.785 s .. 2.838 s)
>>>>>>                 std dev              44.06 ms   (543.9 as .. 44.97 ms)
>>>>>>                 variance introduced by outliers: 19% (moderately inflated)
>>>>>>
>>>>>>                 benchmarking random ints/greg
>>>>>>                 TTTTTTTTTTTTTTTTtime                 2.032 s    (1.993 s .. 2.076 s)
>>>>>>                                      1.000 R²   (1.000 R² .. 1.000 R²)
>>>>>>                 mean                 2.028 s    (2.019 s .. 2.033 s)
>>>>>>                 std dev              7.832 ms   (0.0 s .. 8.178 ms)
>>>>>>                 variance introduced by outliers: 19% (moderately inflated)
>>>>>>
>>>>>>                 benchmarking shakespeare/ghc
>>>>>>                 TTTTTTTTTTTTTTTTtime                 6.504 s    (6.391 s .. 6.694 s)
>>>>>>                                      1.000 R²   (1.000 R² .. 1.000 R²)
>>>>>>                 mean                 6.499 s    (6.468 s .. 6.518 s)
>>>>>>                 std dev              28.85 ms   (0.0 s .. 32.62 ms)
>>>>>>                 variance introduced by outliers: 19% (moderately inflated)
>>>>>>
>>>>>>                 benchmarking shakespeare/greg
>>>>>>                 TTTTTTTTTTTTTTTTtime                 5.560 s    (5.307 s .. 5.763 s)
>>>>>>                                      1.000 R²   (0.999 R² .. 1.000 R²)
>>>>>>                 mean                 5.582 s    (5.537 s .. 5.607 s)
>>>>>>                 std dev              39.30 ms   (0.0 s .. 43.49 ms)
>>>>>>                 variance introduced by outliers: 19% (moderately inflated)
>>>>>>
>>>>>>
>>>>>>                 Costs and Drawbacks
>>>>>>                 -------------------
>>>>>>
>>>>>>                 The only cost I see is the reduced performance when sorting already
>>>>>>                 sorted lists. However, since this remains quite efficient, indeed
>>>>>>                 over 4 times faster than sorting unsorted lists, I think it is an
>>>>>>                 acceptable tradeoff.
>>>>>>
>>>>>>                 Final note
>>>>>>                 ----------
>>>>>>
>>>>>>                 My Haskell is very rusty. I worked on this a couple of years ago when
>>>>>>                 I was learning Haskell, and meant to propose it to the Haskell
>>>>>>                 community, but never got to it at the time.
>>>>>>
>>>>>>                 _______________________________________________
>>>>>>                 Libraries mailing list
>>>>>>                 Libraries at haskell.org
>>>>>>                 http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> Libraries mailing list
>>> Libraries at haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>
>>>
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>
>>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>

