fix for Data.List.sortBy
Siddhanathan Shanmugam
siddhanathan+eml at gmail.com
Tue Mar 28 17:42:09 UTC 2017
On Tue, Mar 28, 2017 at 5:03 AM, Gregory Popovitch <greg7mdp at gmail.com>
wrote:
> Sid, this new version (diff below) is really fast when sorting random
> ints, but slower when sorting strings:
>
Ok, let's not use that then. Also, add an inline pragma on the function
merge.
{-# INLINE merge #-}
merge as@(a:as') bs@(b:bs')
>
> input GHC sort Orig proposal gSort
> -------------------------------------------------------------------------
> sorted ints (ascending) 151 460 147
> sorted ints (descending) 151 467 171
> random ints 2771 2010 1365
> random strings 6542 5524 5991
>
>
>
> Thanks,
>
> greg
>
> ------------------------------
> *From:* siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] *On Behalf
> Of *Siddhanathan Shanmugam
> *Sent:* Tuesday, March 28, 2017 2:45 AM
> *To:* Dan Burton
> *Cc:* Haskell Libraries; Gregory Popovitch
> *Subject:* Re: fix for Data.List.sortBy
>
> Turns out we don't need seq at all. A simple refactoring of the merge
> function does the trick equally well.
>
> mergePairs (a:b:xs) = merge id a b : mergePairs xs
> mergePairs xs = xs
>
> merge f as@(a:as') bs@(b:bs')
> | a `cmp` b == GT = merge (f.(b:)) as bs'
> | otherwise = merge (f.(a:)) as' bs
> merge f [] bs = f bs
> merge f as [] = f as
>
> This variant is 10% faster in my tests.
>
>
> On Mon, Mar 27, 2017 at 5:49 PM, Dan Burton <danburton.email at gmail.com>
> wrote:
>
>> Does this rely on Common Subexpression Elimination optimization in order
>> to work? Would it work more reliably if the `seq`-ed expression were
>> let-bound?
>>
>
> I don't think it relies heavily on CSE. The seq's are there to avoid a
> cascading series of thunk evaluations. Using let expressions doesn't seem
> to affect the benchmarks.
>
>
>>
>> -- Dan Burton
>>
>> On Mon, Mar 27, 2017 at 5:41 PM, David Feuer <david.feuer at gmail.com>
>> wrote:
>>
>>> The first seq is useless: constructor application is never suspended. I
>>> haven't had a chance to look at the rest yet.
>>>
>>> On Mar 27, 2017 7:59 PM, "Gregory Popovitch" <greg7mdp at gmail.com> wrote:
>>>
>>>> Sid,
>>>>
>>>> I'd be delighted to submit the patch, as long as I have permission
>>>> (which I probably don't), you feel confident about the change and maybe a
>>>> couple of other people agree.
>>>>
>>>> Here is the proposed change. Tests shows significant speed improvement
>>>> (30%) when sorting lists of random numbers, and same efficiency for sorting
>>>> already sorted lists (both ascending and descending).
>>>>
>>>> Thanks,
>>>>
>>>> greg
>>>>
>>>> ------------------------------
>>>> *From:* siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] *On
>>>> Behalf Of *Siddhanathan Shanmugam
>>>> *Sent:* Monday, March 27, 2017 6:53 PM
>>>> *To:* Gregory Popovitch
>>>> *Subject:* RE: Proposal: a new implementation for Data.List.sort and
>>>> Data.List.sortBy, which has better performance characteristics and is more
>>>> laziness-friendly.
>>>>
>>>> Since I don't see any regressions, this doesn't really need CLC
>>>> approval. The changes are also small enough that a Github PR may be
>>>> accepted (otherwise, the change goes in via Phabricator).
>>>>
>>>> Are you interested in implementing this patch? If yes, a standard
>>>> Github PR should be fine. Right now gSort is a three line change I think.
>>>> It will be changed in ghc/libraries/base/Data/OldList.hs on the
>>>> ghc/ghc repo on Github.
>>>>
>>>> I'm hoping for some more comments from other Haskellers, before pushing
>>>> for this change in base. I feel like we may be missing a potential
>>>> optimization that someone else might spot. So probably going to wait a few
>>>> days.
>>>>
>>>>
>>>> On Mar 27, 2017 11:43 AM, "Gregory Popovitch" <greg7mdp at gmail.com>
>>>> wrote:
>>>>
>>>> Hi Sid,
>>>>
>>>> Thanks, glad you looked into that. My understanding of the Haskell
>>>> execution model is really poor, so I can't say one way or the other, but I
>>>> felt that laziness ought to be considered as well, and I'm glad it was :-)
>>>>
>>>> So in conclusion it looks like we have a winner with your latest
>>>> gSortBy version. How do we get this pushed to the GHC library?
>>>>
>>>> Thanks,
>>>>
>>>> greg
>>>>
>>>> ------------------------------
>>>> *From:* siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] *On
>>>> Behalf Of *Siddhanathan Shanmugam
>>>> *Sent:* Monday, March 27, 2017 2:12 PM
>>>> *To:* Gregory Popovitch
>>>>
>>>> *Subject:* Re: Proposal: a new implementation for Data.List.sort and
>>>> Data.List.sortBy, which has better performance characteristics and is more
>>>> laziness-friendly.
>>>>
>>>> Hi Greg,
>>>>
>>>> On Mon, Mar 27, 2017 at 10:19 AM, Gregory Popovitch <greg7mdp at gmail.com
>>>> > wrote:
>>>>
>>>>> Unfortunately, this optimization makes the sort less lazy, so doing
>>>>> something like:
>>>>>
>>>>> take 4 $ sort l
>>>>>
>>>>> requires more sorting of the list l with this change. I'm not sure it
>>>>> is a good tradeoff.
>>>>>
>>>>> This can be verified with: https://github.com/greg7mdp/gh
>>>>> c-sort/blob/master/src/sort_with_trace.hs
>>>>>
>>>>
>>>> I think you're running without optimizations turned on. It is lazy in
>>>> my case.
>>>>
>>>> Also, the difference should be negligible (if any at all). Here's an
>>>> example of the list being sorted:
>>>>
>>>> [11,4,6,8,2,5,1,7,9,55,11,3]
>>>> ...
>>>> [[4,11],[6,8],[2,5],[1,7,9,55],[3,11],[]]
>>>> ...
>>>> [[1,2,4,5,6,7,8,9,11,55],[3,11]]
>>>> * 1 3
>>>> * 2 3
>>>> * 4 3
>>>> * 4 11
>>>> [1,2,3,4]
>>>>
>>>> The number of operations saved is only in the last merge. It's only
>>>> lazy at this step.
>>>>
>>>> So we save at most one traversal of the list, which is not too
>>>> expensive since our worst case bounds is O(n log n) anyway.
>>>>
>>>> This should mean that the asymptotic performance should be identical,
>>>> regardless of the number of comparisons saved. Of course, you do get better
>>>> constants, but I would be surprised if those constants translated to
>>>> significantly better performance for a reasonable size list.
>>>>
>>>>
>>>>>
>>>>> I do agree that it would be nice to have a more serious validation
>>>>> test suite.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> greg
>>>>>
>>>>> ------------------------------
>>>>> *From:* siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] *On
>>>>> Behalf Of *Siddhanathan Shanmugam
>>>>> *Sent:* Monday, March 27, 2017 12:53 PM
>>>>>
>>>>> *To:* Gregory Popovitch
>>>>> *Cc:* Haskell Libraries
>>>>> *Subject:* Re: Proposal: a new implementation for Data.List.sort and
>>>>> Data.List.sortBy, which has better performance characteristics and is more
>>>>> laziness-friendly.
>>>>>
>>>>> We can improve things a bit further by forcing evaluation (with seq)
>>>>> along the way appropriately.
>>>>>
>>>>>
>>>>>
>>>>> gregSortBy cmp [] = []
>>>>> gregSortBy cmp xs = head $ until (null.tail) reduce (pair xs)
>>>>> where
>>>>> pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t
>>>>> | otherwise = [x, y] : pair t
>>>>> pair [x] = [[x]]
>>>>> pair [] = []
>>>>>
>>>>> reduce (v:w:x:y:t) = merge v' x' `seq` merge v' x' : reduce t
>>>>> where v' = merge v w `seq` merge v w
>>>>> x' = merge x y `seq` merge x y
>>>>>
>>>>> reduce (x:y:t) = merge x y `seq` merge x y : reduce t
>>>>> reduce xs = xs
>>>>>
>>>>> merge xs [] = xs
>>>>> merge [] ys = ys
>>>>> merge xs@(x:xs') ys@(y:ys')
>>>>> | x `cmp` y == GT = y : merge xs ys'
>>>>> | otherwise = x : merge xs' ys
>>>>>
>>>>>
>>>>> gSortBy cmp = mergeAll . sequences
>>>>> where
>>>>> sequences (a:b:xs)
>>>>> | a `cmp` b == GT = descending b [a] xs
>>>>> | otherwise = ascending b (a:) xs
>>>>> sequences xs = [xs]
>>>>>
>>>>>
>>>>> descending a as (b:bs)
>>>>> | a `cmp` b == GT = descending b (a:as) bs
>>>>> descending a as bs = (a:as) `seq` (a:as) : sequences bs
>>>>>
>>>>>
>>>>> ascending a as (b:bs)
>>>>> | a `cmp` b /= GT = ascending b (as . (a:)) bs
>>>>> ascending a as bs = as [a] `seq` as [a] : sequences bs
>>>>>
>>>>>
>>>>> mergeAll [x] = x
>>>>> mergeAll xs = mergeAll (mergePairs xs)
>>>>>
>>>>>
>>>>> mergePairs (a:b:xs) = merge a b `seq` merge a b : mergePairs xs
>>>>> mergePairs xs = xs
>>>>>
>>>>>
>>>>> merge as@(a:as') bs@(b:bs')
>>>>> | a `cmp` b == GT = b : merge as bs'
>>>>> | otherwise = a : merge as' bs
>>>>> merge [] bs = bs
>>>>> merge as [] = as
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Before the change:*
>>>>>
>>>>> benchmarking random ints/ghc
>>>>> time 3.687 s (3.541 s .. NaN s)
>>>>> 1.000 R² (1.000 R² .. 1.000 R²)
>>>>> mean 3.691 s (3.669 s .. 3.705 s)
>>>>> std dev 21.45 ms (0.0 s .. 24.76 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>> benchmarking random ints/greg
>>>>> time 2.648 s (2.482 s .. 2.822 s)
>>>>> 0.999 R² (0.998 R² .. 1.000 R²)
>>>>> mean 2.704 s (2.670 s .. 2.736 s)
>>>>> std dev 52.68 ms (0.0 s .. 54.49 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>> benchmarking random ints/gSort
>>>>> time 2.733 s (2.682 s .. 2.758 s)
>>>>> 1.000 R² (1.000 R² .. 1.000 R²)
>>>>> mean 2.707 s (2.689 s .. 2.718 s)
>>>>> std dev 16.84 ms (0.0 s .. 19.20 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>> *After the change:*
>>>>>
>>>>> benchmarking random ints/greg
>>>>> time 2.576 s (2.548 s .. 2.628 s)
>>>>> 1.000 R² (1.000 R² .. 1.000 R²)
>>>>> mean 2.590 s (2.578 s .. 2.599 s)
>>>>> std dev 12.99 ms (0.0 s .. 14.89 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>> benchmarking random ints/gSort
>>>>> time 2.538 s (2.412 s .. 2.627 s)
>>>>> 1.000 R² (0.999 R² .. 1.000 R²)
>>>>> mean 2.543 s (2.517 s .. 2.560 s)
>>>>> std dev 26.16 ms (0.0 s .. 30.21 ms)
>>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam <
>>>>> siddhanathan+eml at gmail.com> wrote:
>>>>>
>>>>>> Theoretically, we could do better. We currently only exploit
>>>>>> monotonic runs in merge sort, but we could also exploit bitonic runs:
>>>>>>
>>>>>> dlist as = as [] `seq` as []
>>>>>>
>>>>>> sequences [] = [[]]
>>>>>> sequences [a] = [[a]]
>>>>>> sequences (a:xs) = bitonic a a (a:) xs
>>>>>>
>>>>>> bitonic min max as (b:bs)
>>>>>> | b `cmp` max /= LT = bitonic min b (as . (b:)) bs
>>>>>> | b `cmp` min /= GT = bitonic b max ((b:) . as) bs
>>>>>> | otherwise = dlist as : sequences (b:bs)
>>>>>> bitonic _ _ as [] = [dlist as]
>>>>>>
>>>>>>
>>>>>> The constant factors here might be too high to notice the difference
>>>>>> though.
>>>>>>
>>>>>>
>>>>>> > However, still my version is more laziness-friendly, i.e. it
>>>>>> requires fewer
>>>>>> > comparisons to get the
>>>>>> > N smallest elements of a list (see
>>>>>> > https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_
>>>>>> with_trace.hs).
>>>>>> >
>>>>>> > I wonder if this might not be a more useful trait than being able
>>>>>> to sort
>>>>>> > already sorted lists super fast.
>>>>>>
>>>>>> This comes down to a discussion of merge sort vs natural merge sort.
>>>>>>
>>>>>> Data.List.sort is an implementation of a variant of merge sort called
>>>>>> natural merge sort. The algorithm is linearithmic in the worst case, but
>>>>>> linear in the best case (already sorted list).
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch <
>>>>>> greg7mdp at gmail.com> wrote:
>>>>>>
>>>>>>> Thanks again @Siddhanathan! Looks like your gSort fixes the main
>>>>>>> issue with
>>>>>>> Data.List.sort().
>>>>>>>
>>>>>>> I have updated the test programs in https://github.com/greg7mdp/gh
>>>>>>> c-sort to
>>>>>>> include your new version.
>>>>>>>
>>>>>>> Here are the results (your new version looks like a definite
>>>>>>> improvement vs
>>>>>>> the current GHC one):
>>>>>>>
>>>>>>> input GHC sort My Orig proposal
>>>>>>> gSort
>>>>>>> ------------------------------------------------------------
>>>>>>> ----------------
>>>>>>> ---
>>>>>>> sorted ints (ascending) 151 456
>>>>>>> 148
>>>>>>> sorted ints (descending) 152 466
>>>>>>> 155
>>>>>>> random ints 2732 2006
>>>>>>> 2004
>>>>>>> random strings 6564 5549
>>>>>>> 5528
>>>>>>>
>>>>>>>
>>>>>>> So replacing the current GHC version with gSort is a no brainer, as
>>>>>>> it is
>>>>>>> better in all regards.
>>>>>>>
>>>>>>> However, still my version is more laziness-friendly, i.e. it
>>>>>>> requires fewer
>>>>>>> comparisons to get the
>>>>>>> N smallest elements of a list (see
>>>>>>> https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi
>>>>>>> th_trace.hs).
>>>>>>>
>>>>>>> I wonder if this might not be a more useful trait than being able to
>>>>>>> sort
>>>>>>> already sorted lists super fast.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> greg
>>>>>>>
>>>>>>> ________________________________
>>>>>>>
>>>>>>> From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com] On
>>>>>>> Behalf Of
>>>>>>> Siddhanathan Shanmugam
>>>>>>> Sent: Sunday, March 26, 2017 1:05 PM
>>>>>>> To: Gregory Popovitch
>>>>>>> Cc: Haskell Libraries
>>>>>>> Subject: Re: Proposal: a new implementation for Data.List.sort and
>>>>>>> Data.List.sortBy, which has better performance characteristics and
>>>>>>> is more
>>>>>>> laziness-friendly.
>>>>>>>
>>>>>>>
>>>>>>> Interesting. You are right, performance for sorting random lists has
>>>>>>> priority over performance for sorting already-sorted lists.
>>>>>>>
>>>>>>> Ignore the numbers for my previous version. Can you compare GHC's
>>>>>>> sort, your
>>>>>>> proposal, and gSort below?
>>>>>>>
>>>>>>>
>>>>>>> gSort :: Ord a => [a] -> [a]
>>>>>>> gSort = gSortBy compare
>>>>>>> gSortBy cmp = mergeAll . sequences
>>>>>>> where
>>>>>>> sequences (a:b:xs)
>>>>>>> | a `cmp` b == GT = descending b [a] xs
>>>>>>> | otherwise = ascending b (a:) xs
>>>>>>> sequences xs = [xs]
>>>>>>>
>>>>>>>
>>>>>>> descending a as (b:bs)
>>>>>>> | a `cmp` b == GT = descending b (a:as) bs
>>>>>>> descending a as bs = (a:as) : sequences bs
>>>>>>>
>>>>>>>
>>>>>>> ascending a as (b:bs)
>>>>>>> | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs
>>>>>>> ascending a as bs = as [a] `seq` as [a] : sequences bs
>>>>>>>
>>>>>>>
>>>>>>> mergeAll [x] = x
>>>>>>> mergeAll xs = mergeAll (mergePairs xs)
>>>>>>>
>>>>>>>
>>>>>>> mergePairs (a:b:xs) = merge a b : mergePairs xs
>>>>>>> mergePairs xs = xs
>>>>>>>
>>>>>>>
>>>>>>> merge as@(a:as') bs@(b:bs')
>>>>>>> | a `cmp` b == GT = b : merge as bs'
>>>>>>> | otherwise = a : merge as' bs
>>>>>>> merge [] bs = bs
>>>>>>> merge as [] = as
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sid
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Mar 26, 2017 at 9:19 AM, Gregory Popovitch <
>>>>>>> greg7mdp at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Thank you @Siddhanathan! I welcome any improvement you may
>>>>>>> make, as
>>>>>>> I said I
>>>>>>> am very far from a Haskell expert.
>>>>>>>
>>>>>>> I just tested your change with my test project
>>>>>>> (https://github.com/greg7mdp/ghc-sort
>>>>>>> <https://github.com/greg7mdp/ghc-sort> )
>>>>>>> and here are my results (mean times in ms):
>>>>>>>
>>>>>>> input GHC sort Orig proposal
>>>>>>> your
>>>>>>> change
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>> ----------------
>>>>>>> ---
>>>>>>> sorted ints (ascending) 153 467
>>>>>>> 139
>>>>>>> sorted ints (descending) 152 472
>>>>>>> 599
>>>>>>> random ints 2824 2077
>>>>>>> 2126
>>>>>>> random strings 6564 5613
>>>>>>> 5983
>>>>>>>
>>>>>>> Your change is a definite improvement for sorted integers in
>>>>>>> ascending
>>>>>>> order, but is worse for other cases.
>>>>>>>
>>>>>>> Is there a real need to optimize the sort for already sorted
>>>>>>> list?
>>>>>>> Of course
>>>>>>> it should not be a degenerate
>>>>>>> case and take longer than sorting random numbers, but this
>>>>>>> is not
>>>>>>> the case
>>>>>>> here. Sorting already sorted
>>>>>>> lists is, even with my version, over 4 times faster than
>>>>>>> sorting
>>>>>>> random
>>>>>>> lists. This sounds perfectly
>>>>>>> acceptable to me, and I feel that trying to optimize this
>>>>>>> specific
>>>>>>> case
>>>>>>> further, if it comes at the
>>>>>>> detriment of the general case, is not desirable.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> greg
>>>>>>>
>>>>>>> ________________________________
>>>>>>>
>>>>>>> From: siddhanathan at gmail.com [mailto:siddhanathan at gmail.com]
>>>>>>> On
>>>>>>> Behalf Of
>>>>>>> Siddhanathan Shanmugam
>>>>>>> Sent: Sunday, March 26, 2017 11:41 AM
>>>>>>> To: Gregory Popovitch
>>>>>>> Cc: Haskell Libraries
>>>>>>> Subject: Re: Proposal: a new implementation for
>>>>>>> Data.List.sort and
>>>>>>> Data.List.sortBy, which has better performance
>>>>>>> characteristics and
>>>>>>> is more
>>>>>>> laziness-friendly.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thank you! This identifies a space leak in base which went
>>>>>>> unnoticed
>>>>>>> for 7
>>>>>>> years.
>>>>>>>
>>>>>>> Your implementation can be improved further. Instead of
>>>>>>> splitting
>>>>>>> into
>>>>>>> pairs, you could instead split into lists of sorted sublists
>>>>>>> by
>>>>>>> replacing
>>>>>>> the pairs function with the following
>>>>>>>
>>>>>>> pair = foldr f []
>>>>>>> where
>>>>>>> f x [] = [[x]]
>>>>>>> f x (y:ys)
>>>>>>> | x `cmp` head y == LT = (x:y):ys
>>>>>>> | otherwise = [x]:y:ys
>>>>>>>
>>>>>>> This should give you the same performance improvements for
>>>>>>> sorting
>>>>>>> random
>>>>>>> lists, but better performance while sorting ascending lists.
>>>>>>>
>>>>>>> The version in base takes it one step further by using a
>>>>>>> DList to
>>>>>>> handle the
>>>>>>> descending case efficiently as well, except there's a space
>>>>>>> leak
>>>>>>> right now
>>>>>>> because of which it is slower.
>>>>>>>
>>>>>>> On Sun, Mar 26, 2017 at 7:21 AM, Gregory Popovitch
>>>>>>> <greg7mdp at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Motivation:
>>>>>>> ----------
>>>>>>>
>>>>>>> Data.List.sort is a very important functionality in
>>>>>>> Haskell.
>>>>>>> I
>>>>>>> believe that
>>>>>>> the proposed implementation is:
>>>>>>>
>>>>>>> - significantly faster than the current
>>>>>>> implementation on
>>>>>>> unsorted
>>>>>>> lists,
>>>>>>> typically 14% to 27% faster
>>>>>>> - more laziness-friendly, i.e.:
>>>>>>> take 3 $ sort l
>>>>>>> will require significantly less comparisons than
>>>>>>> the
>>>>>>> current
>>>>>>> implementation
>>>>>>>
>>>>>>> Proposed Implementation
>>>>>>> -----------------------
>>>>>>>
>>>>>>> sort :: (Ord a) => [a] -> [a]
>>>>>>> sort = sortBy compare
>>>>>>>
>>>>>>> sortBy cmp [] = []
>>>>>>> sortBy cmp xs = head $ until (null.tail) reduce
>>>>>>> (pair xs)
>>>>>>> where
>>>>>>> pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t
>>>>>>> | otherwise = [x, y] : pair t
>>>>>>> pair [x] = [[x]]
>>>>>>> pair [] = []
>>>>>>>
>>>>>>> reduce (v:w:x:y:t) = merge v' x' : reduce t
>>>>>>> where v' = merge v w
>>>>>>> x' = merge x y
>>>>>>>
>>>>>>> reduce (x:y:t) = merge x y : reduce t
>>>>>>> reduce xs = xs
>>>>>>>
>>>>>>> merge xs [] = xs
>>>>>>> merge [] ys = ys
>>>>>>> merge xs@(x:xs') ys@(y:ys')
>>>>>>> | x `cmp` y == GT = y : merge xs ys'
>>>>>>> | otherwise = x : merge xs' ys
>>>>>>>
>>>>>>>
>>>>>>> Effect and Interactions
>>>>>>> -----------------------
>>>>>>>
>>>>>>> I have a stack project with a criterion test for
>>>>>>> this new
>>>>>>> implementation,
>>>>>>> available at https://github.com/greg7mdp/ghc-sort
>>>>>>> <https://github.com/greg7mdp/ghc-sort>
>>>>>>>
>>>>>>> <https://github.com/greg7mdp/ghc-sort
>>>>>>> <https://github.com/greg7mdp/ghc-sort> > .
>>>>>>> I ran the tests on an Ubuntu 14.0.2 VM and GHC
>>>>>>> 8.0.2, and
>>>>>>> had the
>>>>>>> following
>>>>>>> results:
>>>>>>>
>>>>>>> - sorting of random lists of integers is 27% faster
>>>>>>> - sorting of random lists of strings is 14% faster
>>>>>>> - sorting of already sorted lists is significantly
>>>>>>> slower,
>>>>>>> but still
>>>>>>> much
>>>>>>> faster than sorting random lists
>>>>>>> - proposed version is more laziness friendly. For
>>>>>>> example
>>>>>>> this
>>>>>>> version of
>>>>>>> sortBy requires 11 comparisons to find
>>>>>>> the smallest element of a 15 element list, while
>>>>>>> the
>>>>>>> default
>>>>>>> Data.List.sortBy requires 15 comparisons.
>>>>>>> (see
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_wi
>>>>>>> th_trace.hs
>>>>>>> <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w
>>>>>>> ith_trace.hs>
>>>>>>>
>>>>>>> <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w
>>>>>>> ith_trace.hs
>>>>>>> <https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_w
>>>>>>> ith_trace.hs> >
>>>>>>> )
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Test results
>>>>>>> ------------
>>>>>>>
>>>>>>> Criterion output (descending/ascending results are
>>>>>>> for
>>>>>>> already
>>>>>>> sorted
>>>>>>> lists).
>>>>>>> I barely understand what Criterion does, and I am
>>>>>>> puzzled
>>>>>>> with the
>>>>>>> various
>>>>>>> "T" output - maybe there is a bug in my bench code:
>>>>>>>
>>>>>>> vagrant at vagrant-ubuntu-trusty-64:/vagrant$ stack
>>>>>>> exec
>>>>>>> ghc-sort
>>>>>>> benchmarking ascending ints/ghc
>>>>>>> TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime
>>>>>>> 160.6 ms
>>>>>>> (153.4
>>>>>>> ms .. 167.8 ms)
>>>>>>> 0.997 R² (0.986 R² .. 1.000
>>>>>>> R²)
>>>>>>> mean 161.7 ms (158.3 ms .. 165.9
>>>>>>> ms)
>>>>>>> std dev 5.210 ms (3.193 ms .. 7.006
>>>>>>> ms)
>>>>>>> variance introduced by outliers: 12% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>> benchmarking ascending ints/greg
>>>>>>> TTTTTTTTTTTTTTTTtime 473.8 ms
>>>>>>> (398.6 ms ..
>>>>>>> 554.9
>>>>>>> ms)
>>>>>>> 0.996 R² (0.987 R² .. 1.000
>>>>>>> R²)
>>>>>>> mean 466.2 ms (449.0 ms .. 475.0
>>>>>>> ms)
>>>>>>> std dev 14.94 ms (0.0 s .. 15.29 ms)
>>>>>>> variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>> benchmarking descending ints/ghc
>>>>>>> TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTtime
>>>>>>> 165.1 ms
>>>>>>> (148.2
>>>>>>> ms .. 178.2 ms)
>>>>>>> 0.991 R² (0.957 R² .. 1.000
>>>>>>> R²)
>>>>>>> mean 158.7 ms (154.0 ms .. 164.3
>>>>>>> ms)
>>>>>>> std dev 7.075 ms (4.152 ms .. 9.903
>>>>>>> ms)
>>>>>>> variance introduced by outliers: 12% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>> benchmarking descending ints/greg
>>>>>>> TTTTTTTTTTTTTTTTtime 471.7 ms
>>>>>>> (419.8 ms ..
>>>>>>> 508.3
>>>>>>> ms)
>>>>>>> 0.999 R² (0.995 R² .. 1.000
>>>>>>> R²)
>>>>>>> mean 476.0 ms (467.5 ms .. 480.0
>>>>>>> ms)
>>>>>>> std dev 7.447 ms (67.99 as .. 7.865
>>>>>>> ms)
>>>>>>> variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>> benchmarking random ints/ghc
>>>>>>> TTTTTTTTTTTTTTTTtime 2.852 s
>>>>>>> (2.564 s ..
>>>>>>> 3.019 s)
>>>>>>> 0.999 R² (0.997 R² .. 1.000
>>>>>>> R²)
>>>>>>> mean 2.812 s (2.785 s .. 2.838 s)
>>>>>>> std dev 44.06 ms (543.9 as .. 44.97
>>>>>>> ms)
>>>>>>> variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>> benchmarking random ints/greg
>>>>>>> TTTTTTTTTTTTTTTTtime 2.032 s
>>>>>>> (1.993 s ..
>>>>>>> 2.076 s)
>>>>>>> 1.000 R² (1.000 R² .. 1.000
>>>>>>> R²)
>>>>>>> mean 2.028 s (2.019 s .. 2.033 s)
>>>>>>> std dev 7.832 ms (0.0 s .. 8.178 ms)
>>>>>>> variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>> benchmarking shakespeare/ghc
>>>>>>> TTTTTTTTTTTTTTTTtime 6.504 s
>>>>>>> (6.391 s ..
>>>>>>> 6.694 s)
>>>>>>> 1.000 R² (1.000 R² .. 1.000
>>>>>>> R²)
>>>>>>> mean 6.499 s (6.468 s .. 6.518 s)
>>>>>>> std dev 28.85 ms (0.0 s .. 32.62 ms)
>>>>>>> variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>> benchmarking shakespeare/greg
>>>>>>> TTTTTTTTTTTTTTTTtime 5.560 s
>>>>>>> (5.307 s ..
>>>>>>> 5.763 s)
>>>>>>> 1.000 R² (0.999 R² .. 1.000
>>>>>>> R²)
>>>>>>> mean 5.582 s (5.537 s .. 5.607 s)
>>>>>>> std dev 39.30 ms (0.0 s .. 43.49 ms)
>>>>>>> variance introduced by outliers: 19% (moderately
>>>>>>> inflated)
>>>>>>>
>>>>>>>
>>>>>>> Costs and Drawbacks
>>>>>>> -------------------
>>>>>>>
>>>>>>> The only cost I see is the reduced performance when
>>>>>>> sorting
>>>>>>> already
>>>>>>> sorted
>>>>>>> lists. However, since this remains quite efficient,
>>>>>>> indeed
>>>>>>> over 4
>>>>>>> times
>>>>>>> faster than sorting unsorted lists, I think it is an
>>>>>>> acceptable
>>>>>>> tradeoff.
>>>>>>>
>>>>>>> Final note
>>>>>>> ----------
>>>>>>>
>>>>>>> My Haskell is very rusty. I worked on this a couple
>>>>>>> years
>>>>>>> ago when I
>>>>>>> was
>>>>>>> learning Haskell, and meant to propose it to the
>>>>>>> Haskell
>>>>>>> community,
>>>>>>> but
>>>>>>> never got to it at the time.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Libraries mailing list
>>>>>>> Libraries at haskell.org
>>>>>>> http://mail.haskell.org/cgi-bi
>>>>>>> n/mailman/listinfo/libraries
>>>>>>> <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries>
>>>>>>>
>>>>>>> <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>>>>> <http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Libraries mailing list
>>>> Libraries at haskell.org
>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>>
>>>>
>>> _______________________________________________
>>> Libraries mailing list
>>> Libraries at haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>
>>>
>>
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20170328/975de3cb/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Outlook.jpg
Type: image/jpeg
Size: 105044 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20170328/975de3cb/attachment-0001.jpg>
More information about the Libraries
mailing list