<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 11.00.10570.1001"></HEAD>
<BODY>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=070280421-27032017>Pretty cool by the way, we now have a 31% improvement
for sorting lists of random integers vs the current
Data.List.sortBy.</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=070280421-27032017></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=070280421-27032017>greg</SPAN></FONT></DIV><BR>
<DIV lang=en-us class=OutlookMessageHeader dir=ltr align=left>
<HR tabIndex=-1>
<FONT size=2 face=Tahoma><B>From:</B> siddhanathan@gmail.com
[mailto:siddhanathan@gmail.com] <B>On Behalf Of </B>Siddhanathan
Shanmugam<BR><B>Sent:</B> Monday, March 27, 2017 12:53 PM<BR><B>To:</B> Gregory
Popovitch<BR><B>Cc:</B> Haskell Libraries<BR><B>Subject:</B> Re: Proposal: a new
implementation for Data.List.sort and Data.List.sortBy, which has better
performance characteristics and is more laziness-friendly.<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV dir=ltr>
<DIV>We can improve things a bit further by forcing evaluation (with seq) along
the way appropriately.<BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV>
<DIV><SPAN style="FONT-FAMILY: monospace,monospace">gregSortBy cmp [] =
[]</SPAN><BR></DIV>
<DIV><FONT face="monospace, monospace">gregSortBy cmp xs = head $ until
(null.tail) reduce (pair xs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> where</FONT></DIV>
<DIV><FONT face="monospace, monospace"> pair (x:y:t) | x `cmp` y ==
GT = [y, x] : pair t</FONT></DIV>
<DIV><FONT face="monospace, monospace">
| otherwise = [x, y] : pair
t</FONT></DIV>
<DIV><FONT face="monospace, monospace"> pair [x] =
[[x]]</FONT></DIV>
<DIV><FONT face="monospace, monospace"> pair [] =
[]</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> reduce (v:w:x:y:t) = merge
v' x' `seq` merge v' x' : reduce t</FONT></DIV>
<DIV><FONT face="monospace, monospace">
where v' = merge v w `seq` merge
v w</FONT></DIV>
<DIV><FONT face="monospace, monospace">
x' = merge
x y `seq` merge x y</FONT></DIV>
<DIV><FONT face="monospace, monospace">
</FONT></DIV>
<DIV><FONT face="monospace, monospace"> reduce (x:y:t) = merge x y
`seq` merge x y : reduce t</FONT></DIV>
<DIV><FONT face="monospace, monospace"> reduce xs
= xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge xs []
= xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge [] ys
= ys</FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge xs@(x:xs')
ys@(y:ys') </FONT></DIV>
<DIV><FONT face="monospace, monospace"> | x
`cmp` y == GT = y : merge xs ys'</FONT></DIV>
<DIV><FONT face="monospace, monospace"> |
otherwise = x : merge xs' ys</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><SPAN style="FONT-FAMILY: monospace,monospace">gSortBy cmp = mergeAll .
sequences</SPAN><BR></DIV>
<DIV><FONT face="monospace, monospace"> where</FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences
(a:b:xs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | a `cmp` b == GT =
descending b [a] xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | otherwise
= ascending b (a:) xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences xs =
[xs]</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> descending a as
(b:bs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | a `cmp` b == GT =
descending b (a:as) bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> descending a as bs =
(a:as) `seq` (a:as) : sequences bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> ascending a as
(b:bs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | a `cmp` b /= GT =
ascending b (as . (a:)) bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> ascending a as bs =
as [a] `seq` as [a] : sequences bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> mergeAll [x] =
x</FONT></DIV>
<DIV><FONT face="monospace, monospace"> mergeAll xs =
mergeAll (mergePairs xs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> mergePairs (a:b:xs) = merge
a b `seq` merge a b : mergePairs xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> mergePairs xs
= xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge as@(a:as')
bs@(b:bs')</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | a `cmp` b == GT =
b : merge as bs'</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | otherwise
= a : merge as' bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge [] bs
= bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge as []
= as</FONT></DIV></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><B>Before the change:</B></DIV>
<DIV><BR></DIV>
<DIV>
<DIV><FONT face="monospace, monospace">benchmarking random ints/ghc</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
3.687 s (3.541 s .. NaN s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
1.000 Rē (1.000 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
3.691 s (3.669 s .. 3.705 s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
21.45 ms (0.0 s .. 24.76 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace">benchmarking random
ints/greg</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
2.648 s (2.482 s .. 2.822 s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
0.999 Rē (0.998 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
2.704 s (2.670 s .. 2.736 s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
52.68 ms (0.0 s .. 54.49 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace">benchmarking random
ints/gSort</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
2.733 s (2.682 s .. 2.758 s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
1.000 Rē (1.000 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
2.707 s (2.689 s .. 2.718 s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
16.84 ms (0.0 s .. 19.20 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV></DIV>
<DIV><BR></DIV>
<DIV><B>After the change:</B></DIV>
<DIV><BR></DIV>
<DIV>
<DIV><FONT face="monospace, monospace">benchmarking random
ints/greg</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
2.576 s (2.548 s .. 2.628 s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
1.000 Rē (1.000 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
2.590 s (2.578 s .. 2.599 s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
12.99 ms (0.0 s .. 14.89 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace">benchmarking random
ints/gSort</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
2.538 s (2.412 s .. 2.627 s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
1.000 Rē (0.999 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
2.543 s (2.517 s .. 2.560 s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
26.16 ms (0.0 s .. 30.21 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV></DIV>
<DIV class=gmail_extra><BR>
<DIV class=gmail_quote>On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam
<SPAN dir=ltr><<A href="mailto:siddhanathan+eml@gmail.com"
target=_blank>siddhanathan+eml@gmail.com</A>></SPAN> wrote:<BR>
<BLOCKQUOTE class=gmail_quote
style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex">
<DIV dir=ltr>
<DIV>
<DIV>Theoretically, we could do better. We currently only exploit monotonic
runs in merge sort, but we could also exploit bitonic runs:</DIV>
<DIV><BR></DIV>
<DIV>
<DIV><FONT face="monospace, monospace"> dlist as = as [] `seq` as
[]<BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences [] =
[[]]</FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences [a] =
[[a]]</FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences (a:xs) =
bitonic a a (a:) xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> bitonic min max as
(b:bs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | b `cmp` max /=
LT = bitonic min b (as . (b:)) bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | b `cmp` min /=
GT = bitonic b max ((b:) . as) bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | otherwise =
dlist as : sequences (b:bs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> bitonic _ _ as [] =
[dlist as]</FONT></DIV>
<DIV><BR></DIV></DIV></DIV>
<DIV><BR></DIV>
<DIV>The constant factors here might be too high to notice the difference
though.</DIV><SPAN>
<DIV><BR></DIV>
<DIV><SPAN style="FONT-SIZE: 12px"><BR></SPAN></DIV>
<DIV><SPAN style="FONT-SIZE: 12px">> However, still my version is more
laziness-friendly, i.e. it requires fewer</SPAN><BR></DIV>
<DIV><SPAN style="FONT-SIZE: 12px">> comparisons to get the</SPAN><BR
style="FONT-SIZE: 12px"><SPAN style="FONT-SIZE: 12px">> N smallest elements
of a list </SPAN><SPAN
style="FONT-SIZE: 12px">(see</SPAN><BR></DIV>> <A
style="FONT-SIZE: 12px"
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/<WBR>ghc-sort/blob/master/src/sort_<WBR>with_trace.hs</A><SPAN
style="FONT-SIZE: 12px">).</SPAN>
<DIV><SPAN style="FONT-SIZE: 12px">></SPAN></DIV>
<DIV><SPAN style="FONT-SIZE: 12px">> I wonder if this might not be a more
useful trait than being able to sort</SPAN><BR></DIV>
<DIV><SPAN style="FONT-SIZE: 12px">> already sorted lists super
fast.</SPAN><BR></DIV>
<DIV><SPAN style="FONT-SIZE: 12px"><BR></SPAN></DIV></SPAN>
<DIV><SPAN style="FONT-SIZE: 12px">This comes down to a discussion of merge
sort vs natural merge sort.</SPAN></DIV>
<DIV><BR></DIV>
<DIV>Data.List.sort is an implementation of a variant of merge sort called
natural merge sort. The algorithm is linearithmic in the worst case, but
linear in the best case (already sorted list).</DIV>
<DIV><BR></DIV>
<DIV><BR></DIV></DIV>
<DIV class=HOEnZb>
<DIV class=h5>
<DIV class=gmail_extra><BR>
<DIV class=gmail_quote>On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch
<SPAN dir=ltr><<A href="mailto:greg7mdp@gmail.com"
target=_blank>greg7mdp@gmail.com</A>></SPAN> wrote:<BR>
<BLOCKQUOTE class=gmail_quote
style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex">Thanks
again @Siddhanathan! Looks like your gSort fixes the main issue
with<BR>Data.List.sort().<BR><BR>I have updated the test programs in <A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort</A> to<BR>include
your new version.<BR><BR>Here are the results (your new version looks like a
definite improvement vs<BR>the current GHC one):<BR><BR>input
GHC
sort My Orig proposal
gSort<BR>------------------------------<WBR>------------------------------<WBR>----------------<BR>---<BR>sorted
ints (ascending) 151
456
148<BR>sorted ints (descending) 152
466
155<BR>random ints
2732
2006
2004<BR>random strings
6564
5549
5528<BR><BR><BR>So replacing the current GHC version with gSort is a
no brainer, as it is<BR>better in all regards.<BR><BR>However, still my
version is more laziness-friendly, i.e. it requires fewer<BR>comparisons to
get the<BR>N smallest elements of a list (see<BR><A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort/blob/master/src/sort_wi<WBR>th_trace.hs</A>).<BR><BR>I
wonder if this might not be a more useful trait than being able to
sort<BR>already sorted lists super
fast.<BR><SPAN><BR>Thanks,<BR><BR>greg<BR><BR>______________________________<WBR>__<BR><BR>From:
<A href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A> [mailto:<A
href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A><WBR>] On Behalf Of<BR>Siddhanathan
Shanmugam<BR></SPAN>Sent: Sunday, March 26, 2017 1:05 PM<BR><SPAN>To:
Gregory Popovitch<BR>Cc: Haskell Libraries<BR>Subject: Re: Proposal: a new
implementation for Data.List.sort and<BR>Data.List.sortBy, which has better
performance characteristics and is
more<BR>laziness-friendly.<BR><BR><BR></SPAN>
<DIV>
<DIV class=m_6283889194055629001h5>Interesting. You are right, performance
for sorting random lists has<BR>priority over performance for sorting
already-sorted lists.<BR><BR>Ignore the numbers for my previous version. Can
you compare GHC's sort, your<BR>proposal, and gSort below?<BR><BR><BR>gSort
:: Ord a => [a] -> [a]<BR>gSort = gSortBy compare<BR>gSortBy cmp =
mergeAll . sequences<BR> where<BR> sequences
(a:b:xs)<BR> | a `cmp` b == GT = descending b [a]
xs<BR> | otherwise =
ascending b (a:) xs<BR> sequences xs =
[xs]<BR><BR><BR> descending a as (b:bs)<BR>
| a `cmp` b == GT = descending b (a:as) bs<BR> descending a as
bs = (a:as) : sequences bs<BR><BR><BR> ascending a as
(b:bs)<BR> | a `cmp` b /= GT = ascending b (\ys -> as
(a:ys)) bs<BR> ascending a as bs = as [a] `seq` as
[a] : sequences bs<BR><BR><BR> mergeAll [x] = x<BR>
mergeAll xs = mergeAll (mergePairs xs)<BR><BR><BR>
mergePairs (a:b:xs) = merge a b : mergePairs xs<BR> mergePairs
xs = xs<BR><BR><BR> merge as@(a:as')
bs@(b:bs')<BR> | a `cmp` b == GT = b : merge as
bs'<BR> | otherwise = a :
merge as' bs<BR> merge [] bs =
bs<BR> merge as [] =
as<BR><BR><BR>Thanks,<BR>Sid<BR><BR><BR>On Sun, Mar 26, 2017 at 9:19 AM,
Gregory Popovitch <<A href="mailto:greg7mdp@gmail.com"
target=_blank>greg7mdp@gmail.com</A>><BR>wrote:<BR><BR><BR>
Thank you @Siddhanathan! I welcome any improvement you may
make, as<BR>I said I<BR> am very far from a
Haskell expert.<BR><BR> I just tested your change
with my test project<BR> (<A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A><BR></DIV></DIV><<A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A>> )<BR>
<DIV>
<DIV class=m_6283889194055629001h5> and here are
my results (mean times in ms):<BR><BR>
input
GHC sort Orig
proposal<BR>your<BR>
change<BR><BR>------------------------------<WBR>------------------------------<WBR>----------------<BR>
---<BR> sorted ints
(ascending) 153
467<BR>139<BR> sorted ints
(descending) 152
472<BR>599<BR> random ints
2824
2077<BR>2126<BR>
random strings
6564
5613<BR>5983<BR><BR> Your change is a definite
improvement for sorted integers in<BR>ascending<BR>
order, but is worse for other cases.<BR><BR>
Is there a real need to optimize the sort for already sorted
list?<BR>Of course<BR> it should not be a
degenerate<BR> case and take longer than sorting
random numbers, but this is not<BR>the case<BR>
here. Sorting already sorted<BR> lists is, even
with my version, over 4 times faster than sorting<BR>random<BR>
lists. This sounds perfectly<BR>
acceptable to me, and I feel that trying to optimize this
specific<BR>case<BR> further, if it comes at
the<BR> detriment of the general case, is not
desirable.<BR><BR> Thanks,<BR><BR>
greg<BR><BR>
______________________________<WBR>__<BR><BR>
From: <A href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A> [mailto:<A
href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A><WBR>] On<BR>Behalf Of<BR>
Siddhanathan Shanmugam<BR>
Sent: Sunday, March 26, 2017 11:41 AM<BR> To:
Gregory Popovitch<BR> Cc: Haskell
Libraries<BR> Subject: Re: Proposal: a new
implementation for Data.List.sort and<BR>
Data.List.sortBy, which has better performance characteristics and<BR>is
more<BR> laziness-friendly.<BR><BR><BR><BR>
Thank you! This identifies a space leak in base which
went unnoticed<BR>for 7<BR> years.<BR><BR>
Your implementation can be improved further. Instead of
splitting<BR>into<BR> pairs, you could instead
split into lists of sorted sublists by<BR>replacing<BR>
the pairs function with the following<BR><BR>
pair = foldr f []<BR>
where<BR>
f x [] = [[x]]<BR>
f x (y:ys)<BR>
| x `cmp` head y == LT = (x:y):ys<BR>
| otherwise
= [x]:y:ys<BR><BR> This should give
you the same performance improvements for sorting<BR>random<BR>
lists, but better performance while sorting ascending
lists.<BR><BR> The version in base takes it one
step further by using a DList to<BR>handle the<BR>
descending case efficiently as well, except there's a space
leak<BR>right now<BR> because of which it is
slower.<BR><BR> On Sun, Mar 26, 2017 at 7:21 AM,
Gregory Popovitch<BR><<A href="mailto:greg7mdp@gmail.com"
target=_blank>greg7mdp@gmail.com</A>><BR>
wrote:<BR><BR><BR><BR>
Motivation:<BR>
----------<BR><BR>
Data.List.sort is a very important functionality in
Haskell.<BR>I<BR> believe that<BR>
the proposed implementation
is:<BR><BR> -
significantly faster than the current implementation
on<BR>unsorted<BR> lists,<BR>
typically 14% to 27% faster<BR>
- more laziness-friendly,
i.e.:<BR>
take 3 $ sort l<BR>
will require significantly less comparisons than
the<BR>current<BR>
implementation<BR><BR>
Proposed Implementation<BR>
-----------------------<BR><BR>
sort :: (Ord a) => [a] -> [a]<BR>
sort = sortBy
compare<BR><BR>
sortBy cmp [] = []<BR>
sortBy cmp xs = head $ until (null.tail) reduce (pair xs)<BR>
where<BR>
pair (x:y:t)
| x `cmp` y == GT = [y, x] : pair t<BR>
| otherwise = [x, y] : pair
t<BR>
pair [x] = [[x]]<BR>
pair [] = []<BR><BR>
reduce (v:w:x:y:t) = merge v' x' : reduce
t<BR>
where v' = merge v w<BR>
x' = merge x y<BR><BR>
reduce
(x:y:t) = merge x y : reduce t<BR>
reduce xs = xs<BR><BR>
merge xs
[] = xs<BR>
merge [] ys
= ys<BR>
merge xs@(x:xs') ys@(y:ys')<BR>
| x `cmp` y == GT = y : merge xs ys'<BR>
| otherwise = x : merge xs'
ys<BR><BR><BR> Effect
and Interactions<BR>
-----------------------<BR><BR>
I have a stack project with a criterion test for this
new<BR> implementation,<BR>
available at <A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort</A><BR><<A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A>><BR><BR>
<<A href="https://github.com/greg7mdp/ghc-sort"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A><BR><<A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A>> >
.<BR> I ran the tests
on an Ubuntu 14.0.2 VM and GHC 8.0.2, and<BR>had the<BR>
following<BR>
results:<BR><BR> -
sorting of random lists of integers is 27% faster<BR>
- sorting of random lists of strings is
14% faster<BR> -
sorting of already sorted lists is significantly slower,<BR>but
still<BR> much<BR>
faster than sorting random lists<BR>
- proposed version is more
laziness friendly. For example<BR>this<BR>
version of<BR> sortBy
requires 11 comparisons to find<BR>
the smallest element of a 15 element list, while
the<BR>default<BR>
Data.List.sortBy requires 15 comparisons.<BR>
(see<BR><BR><BR><A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort/blob/master/src/sort_wi<WBR>th_trace.hs</A><BR><<A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort/blob/master/src/sort_w<WBR>ith_trace.hs</A>><BR><BR><<A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort/blob/master/src/sort_w<WBR>ith_trace.hs</A><BR><<A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort/blob/master/src/sort_w<WBR>ith_trace.hs</A>>
><BR>)<BR><BR><BR><BR>
Test results<BR>
------------<BR><BR>
Criterion output (descending/ascending results are
for<BR>already<BR> sorted<BR>
lists).<BR>
I barely understand what Criterion does, and I
am puzzled<BR>with the<BR> various<BR>
"T" output - maybe there is
a bug in my bench code:<BR><BR>
vagrant@vagrant-ubuntu-trusty-<WBR>64:/vagrant$ stack
exec<BR>ghc-sort<BR>
benchmarking ascending ints/ghc<BR>
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT<WBR>TTTTTTTtime<BR>160.6
ms<BR> (153.4<BR>
ms .. 167.8 ms)<BR>
0.997 Rē (0.986 Rē .. 1.000
Rē)<BR> mean
161.7 ms
(158.3 ms .. 165.9 ms)<BR>
std dev 5.210
ms (3.193 ms .. 7.006 ms)<BR>
variance introduced by outliers: 12% (moderately
inflated)<BR><BR>
benchmarking ascending ints/greg<BR>
TTTTTTTTTTTTTTTTtime
473.8 ms (398.6 ms
..<BR>554.9<BR> ms)<BR>
0.996 Rē (0.987 Rē .. 1.000
Rē)<BR> mean
466.2 ms
(449.0 ms .. 475.0 ms)<BR>
std dev 14.94
ms (0.0 s .. 15.29 ms)<BR>
variance introduced by outliers: 19% (moderately
inflated)<BR><BR>
benchmarking descending ints/ghc<BR>
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT<WBR>TTTTTTTtime<BR>165.1
ms<BR> (148.2<BR>
ms .. 178.2 ms)<BR>
0.991 Rē (0.957 Rē .. 1.000
Rē)<BR> mean
158.7 ms
(154.0 ms .. 164.3 ms)<BR>
std dev 7.075
ms (4.152 ms .. 9.903 ms)<BR>
variance introduced by outliers: 12% (moderately
inflated)<BR><BR>
benchmarking descending ints/greg<BR>
TTTTTTTTTTTTTTTTtime
471.7 ms (419.8 ms
..<BR>508.3<BR> ms)<BR>
0.999 Rē (0.995 Rē .. 1.000
Rē)<BR> mean
476.0 ms
(467.5 ms .. 480.0 ms)<BR>
std dev 7.447
ms (67.99 as .. 7.865 ms)<BR>
variance introduced by outliers: 19% (moderately
inflated)<BR><BR>
benchmarking random ints/ghc<BR>
TTTTTTTTTTTTTTTTtime
2.852 s (2.564 s ..<BR>3.019 s)<BR>
0.999 Rē (0.997
Rē .. 1.000 Rē)<BR>
mean 2.812
s (2.785 s .. 2.838 s)<BR>
std dev
44.06 ms (543.9 as .. 44.97 ms)<BR>
variance introduced by outliers: 19% (moderately
inflated)<BR><BR>
benchmarking random ints/greg<BR>
TTTTTTTTTTTTTTTTtime
2.032 s (1.993 s ..<BR>2.076 s)<BR>
1.000 Rē (1.000
Rē .. 1.000 Rē)<BR>
mean 2.028
s (2.019 s .. 2.033 s)<BR>
std dev
7.832 ms (0.0 s .. 8.178 ms)<BR>
variance introduced by outliers: 19% (moderately
inflated)<BR><BR>
benchmarking shakespeare/ghc<BR>
TTTTTTTTTTTTTTTTtime
6.504 s (6.391 s ..<BR>6.694 s)<BR>
1.000 Rē (1.000
Rē .. 1.000 Rē)<BR>
mean 6.499
s (6.468 s .. 6.518 s)<BR>
std dev
28.85 ms (0.0 s .. 32.62 ms)<BR>
variance introduced by outliers: 19% (moderately
inflated)<BR><BR>
benchmarking shakespeare/greg<BR>
TTTTTTTTTTTTTTTTtime
5.560 s (5.307 s ..<BR>5.763 s)<BR>
1.000 Rē (0.999
Rē .. 1.000 Rē)<BR>
mean 5.582
s (5.537 s .. 5.607 s)<BR>
std dev
39.30 ms (0.0 s .. 43.49 ms)<BR>
variance introduced by outliers: 19% (moderately
inflated)<BR><BR><BR>
Costs and Drawbacks<BR>
-------------------<BR><BR>
The only cost I see is the reduced performance when
sorting<BR>already<BR> sorted<BR>
lists. However, since this remains
quite efficient, indeed<BR>over 4<BR>
times<BR> faster than
sorting unsorted lists, I think it is an<BR>acceptable<BR>
tradeoff.<BR><BR>
Final note<BR>
----------<BR><BR>
My Haskell is very rusty. I worked on this a couple years<BR>ago when
I<BR> was<BR>
learning Haskell, and meant to propose it to the
Haskell<BR>community,<BR> but<BR>
never got to it at the
time.<BR><BR>
______________________________<WBR>_________________<BR>
Libraries mailing list<BR>
<A
href="mailto:Libraries@haskell.org"
target=_blank>Libraries@haskell.org</A><BR>
<A
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries"
rel=noreferrer
target=_blank>http://mail.haskell.org/cgi-bi<WBR>n/mailman/listinfo/libraries</A><BR><<A
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries"
rel=noreferrer
target=_blank>http://mail.haskell.org/cgi-b<WBR>in/mailman/listinfo/libraries</A>><BR><BR></DIV></DIV>
<<A
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries"
rel=noreferrer
target=_blank>http://mail.haskell.org/cgi-b<WBR>in/mailman/listinfo/libraries</A><BR><<A
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries"
rel=noreferrer
target=_blank>http://mail.haskell.org/cgi-b<WBR>in/mailman/listinfo/libraries</A>>
><BR><BR><BR><BR><BR><BR><BR><BR><BR></BLOCKQUOTE></DIV><BR></DIV></DIV></DIV></BLOCKQUOTE></DIV><BR></DIV></BODY></HTML>