<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 11.00.10570.1001"></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT face=Calibri><FONT
color=#000080><SPAN class=755023923-27032017>Sid,
</SPAN></FONT></FONT></SPAN></DIV>
<DIV dir=ltr align=left>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT face=Calibri><FONT
color=#000080><SPAN
class=755023923-27032017></SPAN></FONT></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT face=Calibri><FONT
color=#000080><SPAN class=755023923-27032017></SPAN>I'd be delighted to
submit<SPAN class=755023923-27032017> </SPAN>the patch, as long as I have
permission (which I probably don't), you feel confident about the change and
maybe a couple of other people agree.</FONT></FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT color=#000080
face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT face=Calibri><FONT
color=#000080>Here is the proposed change. Tests shows significant speed
improvement (30%) when sorting lists of random numbers, and same efficiency for
sorting already sorted lists (both ascending and descending).<SPAN
class=523135523-27032017> </SPAN></FONT></FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT face=Calibri><FONT
color=#000080><SPAN
class=523135523-27032017> </SPAN></FONT></FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT color=#000080
face=Calibri><IMG src="cid:523135523@27032017-0380"></FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=146023323-27032017></SPAN><SPAN
class=146023323-27032017><FONT color=#000080
face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT color=#000080
face=Calibri>Thanks,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT color=#000080
face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=146023323-27032017><FONT color=#000080
face=Calibri>greg</FONT></SPAN></DIV></DIV><BR>
<DIV lang=en-us class=OutlookMessageHeader dir=ltr align=left>
<HR tabIndex=-1>
<FONT size=2 face=Tahoma><B>From:</B> siddhanathan@gmail.com
[mailto:siddhanathan@gmail.com] <B>On Behalf Of </B>Siddhanathan
Shanmugam<BR><B>Sent:</B> Monday, March 27, 2017 6:53 PM<BR><B>To:</B> Gregory
Popovitch<BR><B>Subject:</B> RE: Proposal: a new implementation for
Data.List.sort and Data.List.sortBy, which has better performance
characteristics and is more laziness-friendly.<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV>
<DIV>Since I don't see any regressions, this doesn't really need CLC approval.
The changes are also small enough that a Github PR may be accepted (otherwise,
the change goes in via Phabricator).<BR></DIV>
<DIV><BR></DIV>
<DIV>Are you interested in implementing this patch? If yes, a standard Github PR
should be fine. Right now gSort is a three line change I think. It will be
changed in ghc/libraries/base/Data/OldList.hs on the ghc/ghc repo on
Github.</DIV>
<DIV><BR></DIV>
<DIV><SPAN style="FONT-FAMILY: sans-serif">I'm hoping for some more comments
from other Haskellers, before pushing for this change in base. I feel like we
may be missing a potential optimization that someone else might spot. So
probably going to wait a few days. </SPAN><BR></DIV><BR>
<DIV class=gmail_extra><BR>
<DIV class=gmail_quote>On Mar 27, 2017 11:43 AM, "Gregory Popovitch" <<A
href="mailto:greg7mdp@gmail.com">greg7mdp@gmail.com</A>> wrote:<BR
type="attribution">
<BLOCKQUOTE class=quote
style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex"><U></U>
<DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236217273418-27032017><FONT color=#000080
face=Calibri>Hi Sid,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236217273418-27032017><FONT color=#000080
face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236217273418-27032017><FONT color=#000080
face=Calibri>Thanks, glad you looked into that. My understanding of the
Haskell execution model is really poor, so I can't say one way or
the other, but I felt that laziness ought to be considered as well, and
I'm glad it was :-)</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236217273418-27032017><FONT color=#000080
face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236217273418-27032017><FONT color=#000080
face=Calibri>So in conclusion it looks like we have a winner with
your latest gSortBy version. How do we get this pushed to the GHC
library?</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236217273418-27032017><FONT color=#000080
face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236217273418-27032017><FONT color=#000080
face=Calibri>Thanks,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236217273418-27032017><FONT color=#000080
face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236217273418-27032017><FONT color=#000080
face=Calibri>greg</FONT></SPAN></DIV><BR>
<DIV lang=en-us class=m_-3409671060788413236OutlookMessageHeader dir=ltr
align=left>
<HR>
<FONT size=2 face=Tahoma>
<DIV class=quoted-text><B>From:</B> <A href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A> [mailto:<A
href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A><WBR>] <B>On Behalf Of
</B>Siddhanathan Shanmugam<BR></DIV><B>Sent:</B> Monday, March 27, 2017 2:12
PM<BR><B>To:</B> Gregory Popovitch
<DIV class=elided-text><BR><B>Subject:</B> Re: Proposal: a new implementation
for Data.List.sort and Data.List.sortBy, which has better performance
characteristics and is more laziness-friendly.<BR></DIV></FONT><BR></DIV>
<DIV class=elided-text>
<DIV></DIV>
<DIV dir=ltr>Hi Greg,<BR>
<DIV class=gmail_extra><BR>
<DIV class=gmail_quote>On Mon, Mar 27, 2017 at 10:19 AM, Gregory Popovitch
<SPAN dir=ltr><<A href="mailto:greg7mdp@gmail.com"
target=_blank>greg7mdp@gmail.com</A>></SPAN> wrote:<BR>
<BLOCKQUOTE class=gmail_quote
style="PADDING-LEFT: 1ex; BORDER-LEFT: rgb(204,204,204) 1px solid; MARGIN: 0px 0px 0px 0.8ex"><U></U>
<DIV>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017>Unfortunately,
this optimization makes the sort less lazy, so doing something
like:</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017>take 4 $
sort l</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017>requires
more sorting of the list l with this change. I'm not sure it is a good
tradeoff.</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017>This
can be verified with: <A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort/blob/master/src/sort_wi<WBR>th_trace.hs</A></SPAN></FONT></DIV></DIV></BLOCKQUOTE>
<DIV><BR></DIV>
<DIV>I think you're running without optimizations turned on. It is lazy in my
case.</DIV>
<DIV><BR></DIV>
<DIV>Also, the difference should be negligible (if any at all). Here's an
example of the list being sorted:</DIV>
<DIV><BR></DIV>
<DIV>
<DIV>[11,4,6,8,2,5,1,7,9,55,11,3]</DIV></DIV>
<DIV>...</DIV>
<DIV>[[4,11],[6,8],[2,5],[1,7,9,55]<WBR>,[3,11],[]]<BR></DIV>
<DIV>...</DIV>
<DIV>
<DIV>[[1,2,4,5,6,7,8,9,11,55],[3,<WBR>11]]</DIV>
<DIV> * 1 3</DIV>
<DIV> * 2 3</DIV>
<DIV> * 4 3</DIV>
<DIV> * 4 11</DIV>
<DIV>[1,2,3,4]</DIV></DIV>
<DIV><BR></DIV>
<DIV>The number of operations saved is only in the last merge. It's only lazy
at this step.</DIV>
<DIV><BR></DIV>
<DIV>So we save at most one traversal of the list, which is not too expensive
since our worst case bounds is O(n log n) anyway.</DIV>
<DIV><BR></DIV>
<DIV>This should mean that the asymptotic performance should be identical,
regardless of the number of comparisons saved. Of course, you do get better
constants, but I would be surprised if those constants translated to
significantly better performance for a reasonable size list.</DIV>
<DIV> </DIV>
<BLOCKQUOTE class=gmail_quote
style="PADDING-LEFT: 1ex; BORDER-LEFT: rgb(204,204,204) 1px solid; MARGIN: 0px 0px 0px 0.8ex">
<DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017><FONT
color=#000080 face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017> <FONT
color=#000080 face=Calibri>I do agree that it would be nice to have a more
serious validation test suite.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017><FONT
color=#000080 face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017><FONT
color=#000080 face=Calibri>Thanks,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017><FONT
color=#000080 face=Calibri></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017><FONT
color=#000080 face=Calibri>greg</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN
class=m_-3409671060788413236gmail-m_-5012754740519492618136311517-27032017></SPAN> </DIV>
<DIV dir=ltr align=left>
<HR>
</DIV>
<DIV dir=ltr align=left><FONT size=2 face=Tahoma><SPAN
class=m_-3409671060788413236gmail-><B>From:</B> <A
href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A> [mailto:<A
href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A><WBR>] <B>On Behalf Of
</B>Siddhanathan Shanmugam<BR></SPAN><B>Sent:</B> Monday, March 27, 2017
12:53 PM
<DIV>
<DIV class=m_-3409671060788413236gmail-h5><BR><B>To:</B> Gregory
Popovitch<BR><B>Cc:</B> Haskell Libraries<BR><B>Subject:</B> Re: Proposal: a
new implementation for Data.List.sort and Data.List.sortBy, which has better
performance characteristics and is more
laziness-friendly.<BR></DIV></DIV></FONT><BR></DIV>
<DIV>
<DIV class=m_-3409671060788413236gmail-h5>
<DIV></DIV>
<DIV dir=ltr>
<DIV>We can improve things a bit further by forcing evaluation (with seq)
along the way appropriately.<BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV>
<DIV><SPAN style="FONT-FAMILY: monospace,monospace">gregSortBy cmp [] =
[]</SPAN><BR></DIV>
<DIV><FONT face="monospace, monospace">gregSortBy cmp xs = head $ until
(null.tail) reduce (pair xs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> where</FONT></DIV>
<DIV><FONT face="monospace, monospace"> pair (x:y:t) | x `cmp`
y == GT = [y, x] : pair t</FONT></DIV>
<DIV><FONT face="monospace, monospace">
| otherwise = [x, y] :
pair t</FONT></DIV>
<DIV><FONT face="monospace, monospace"> pair [x] =
[[x]]</FONT></DIV>
<DIV><FONT face="monospace, monospace"> pair [] =
[]</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> reduce (v:w:x:y:t) =
merge v' x' `seq` merge v' x' : reduce t</FONT></DIV>
<DIV><FONT face="monospace, monospace">
where v' = merge v w
`seq` merge v w</FONT></DIV>
<DIV><FONT face="monospace, monospace">
x' = merge x y `seq` merge x y</FONT></DIV>
<DIV><FONT face="monospace, monospace">
</FONT></DIV>
<DIV><FONT face="monospace, monospace"> reduce (x:y:t) = merge
x y `seq` merge x y : reduce t</FONT></DIV>
<DIV><FONT face="monospace, monospace"> reduce xs
= xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge xs []
= xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge [] ys
= ys</FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge xs@(x:xs')
ys@(y:ys') </FONT></DIV>
<DIV><FONT face="monospace, monospace"> | x
`cmp` y == GT = y : merge xs ys'</FONT></DIV>
<DIV><FONT face="monospace, monospace"> |
otherwise = x : merge xs' ys</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><SPAN style="FONT-FAMILY: monospace,monospace">gSortBy cmp = mergeAll .
sequences</SPAN><BR></DIV>
<DIV><FONT face="monospace, monospace"> where</FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences
(a:b:xs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | a `cmp` b ==
GT = descending b [a] xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | otherwise
= ascending b (a:) xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences xs =
[xs]</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> descending a as
(b:bs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | a `cmp` b ==
GT = descending b (a:as) bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> descending a as bs
= (a:as) `seq` (a:as) : sequences bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> ascending a as
(b:bs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | a `cmp` b /=
GT = ascending b (as . (a:)) bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> ascending a as bs
= as [a] `seq` as [a] : sequences bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> mergeAll [x] =
x</FONT></DIV>
<DIV><FONT face="monospace, monospace"> mergeAll xs =
mergeAll (mergePairs xs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> mergePairs (a:b:xs) =
merge a b `seq` merge a b : mergePairs xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> mergePairs xs
= xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge as@(a:as')
bs@(b:bs')</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | a `cmp` b ==
GT = b : merge as bs'</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | otherwise
= a : merge as' bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge [] bs
= bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> merge as []
= as</FONT></DIV></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><B>Before the change:</B></DIV>
<DIV><BR></DIV>
<DIV>
<DIV><FONT face="monospace, monospace">benchmarking random
ints/ghc</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
3.687 s (3.541 s .. NaN
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
1.000 Rē (1.000 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
3.691 s (3.669 s .. 3.705
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
21.45 ms (0.0 s .. 24.76 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace">benchmarking random
ints/greg</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
2.648 s (2.482 s .. 2.822
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
0.999 Rē (0.998 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
2.704 s (2.670 s .. 2.736
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
52.68 ms (0.0 s .. 54.49 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace">benchmarking random
ints/gSort</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
2.733 s (2.682 s .. 2.758
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
1.000 Rē (1.000 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
2.707 s (2.689 s .. 2.718
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
16.84 ms (0.0 s .. 19.20 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV></DIV>
<DIV><BR></DIV>
<DIV><B>After the change:</B></DIV>
<DIV><BR></DIV>
<DIV>
<DIV><FONT face="monospace, monospace">benchmarking random
ints/greg</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
2.576 s (2.548 s .. 2.628
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
1.000 Rē (1.000 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
2.590 s (2.578 s .. 2.599
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
12.99 ms (0.0 s .. 14.89 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace">benchmarking random
ints/gSort</FONT></DIV>
<DIV><FONT face="monospace, monospace">time
2.538 s (2.412 s .. 2.627
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">
1.000 Rē (0.999 Rē .. 1.000
Rē)</FONT></DIV>
<DIV><FONT face="monospace, monospace">mean
2.543 s (2.517 s .. 2.560
s)</FONT></DIV>
<DIV><FONT face="monospace, monospace">std dev
26.16 ms (0.0 s .. 30.21 ms)</FONT></DIV>
<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19%
(moderately inflated)</FONT></DIV></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV>
<DIV><BR></DIV></DIV>
<DIV class=gmail_extra><BR>
<DIV class=gmail_quote>On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan
Shanmugam <SPAN dir=ltr><<A href="mailto:siddhanathan+eml@gmail.com"
target=_blank>siddhanathan+eml@gmail.com</A>></SPAN> wrote:<BR>
<BLOCKQUOTE class=gmail_quote
style="PADDING-LEFT: 1ex; BORDER-LEFT: rgb(204,204,204) 1px solid; MARGIN: 0px 0px 0px 0.8ex">
<DIV dir=ltr>
<DIV>
<DIV>Theoretically, we could do better. We currently only exploit
monotonic runs in merge sort, but we could also exploit bitonic
runs:</DIV>
<DIV><BR></DIV>
<DIV>
<DIV><FONT face="monospace, monospace"> dlist as = as []
`seq` as []<BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences [] =
[[]]</FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences [a] =
[[a]]</FONT></DIV>
<DIV><FONT face="monospace, monospace"> sequences (a:xs) =
bitonic a a (a:) xs</FONT></DIV>
<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>
<DIV><FONT face="monospace, monospace"> bitonic min max as
(b:bs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | b `cmp` max
/= LT = bitonic min b (as . (b:)) bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | b `cmp` min
/= GT = bitonic b max ((b:) . as) bs</FONT></DIV>
<DIV><FONT face="monospace, monospace"> | otherwise =
dlist as : sequences (b:bs)</FONT></DIV>
<DIV><FONT face="monospace, monospace"> bitonic _ _ as [] =
[dlist as]</FONT></DIV>
<DIV><BR></DIV></DIV></DIV>
<DIV><BR></DIV>
<DIV>The constant factors here might be too high to notice the difference
though.</DIV><SPAN>
<DIV><BR></DIV>
<DIV><SPAN style="FONT-SIZE: 12px"><BR></SPAN></DIV>
<DIV><SPAN style="FONT-SIZE: 12px">> However, still my version is more
laziness-friendly, i.e. it requires fewer</SPAN><BR></DIV>
<DIV><SPAN style="FONT-SIZE: 12px">> comparisons to get the</SPAN><BR
style="FONT-SIZE: 12px"><SPAN style="FONT-SIZE: 12px">> N smallest
elements of a list </SPAN><SPAN
style="FONT-SIZE: 12px">(see</SPAN><BR></DIV>> <A
style="FONT-SIZE: 12px"
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/<WBR>ghc-sort/blob/master/src/sort_<WBR>with_trace.hs</A><SPAN
style="FONT-SIZE: 12px">).</SPAN>
<DIV><SPAN style="FONT-SIZE: 12px">></SPAN></DIV>
<DIV><SPAN style="FONT-SIZE: 12px">> I wonder if this might not be a
more useful trait than being able to sort</SPAN><BR></DIV>
<DIV><SPAN style="FONT-SIZE: 12px">> already sorted lists super
fast.</SPAN><BR></DIV>
<DIV><SPAN style="FONT-SIZE: 12px"><BR></SPAN></DIV></SPAN>
<DIV><SPAN style="FONT-SIZE: 12px">This comes down to a discussion of
merge sort vs natural merge sort.</SPAN></DIV>
<DIV><BR></DIV>
<DIV>Data.List.sort is an implementation of a variant of merge sort called
natural merge sort. The algorithm is linearithmic in the worst case, but
linear in the best case (already sorted list).</DIV>
<DIV><BR></DIV>
<DIV><BR></DIV></DIV>
<DIV class=m_-3409671060788413236gmail-m_-5012754740519492618HOEnZb>
<DIV class=m_-3409671060788413236gmail-m_-5012754740519492618h5>
<DIV class=gmail_extra><BR>
<DIV class=gmail_quote>On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch
<SPAN dir=ltr><<A href="mailto:greg7mdp@gmail.com"
target=_blank>greg7mdp@gmail.com</A>></SPAN> wrote:<BR>
<BLOCKQUOTE class=gmail_quote
style="PADDING-LEFT: 1ex; BORDER-LEFT: rgb(204,204,204) 1px solid; MARGIN: 0px 0px 0px 0.8ex">Thanks
again @Siddhanathan! Looks like your gSort fixes the main issue
with<BR>Data.List.sort().<BR><BR>I have updated the test programs in <A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort</A>
to<BR>include your new version.<BR><BR>Here are the results (your new
version looks like a definite improvement vs<BR>the current GHC
one):<BR><BR>input
GHC sort
My Orig proposal
gSort<BR>------------------------------<WBR>------------------------------<WBR>----------------<BR>---<BR>sorted
ints (ascending) 151
456
148<BR>sorted ints (descending)
152
466
155<BR>random ints
2732
2006
2004<BR>random strings
6564 5549
5528<BR><BR><BR>So replacing the current GHC version with gSort is
a no brainer, as it is<BR>better in all regards.<BR><BR>However, still
my version is more laziness-friendly, i.e. it requires
fewer<BR>comparisons to get the<BR>N smallest elements of a list
(see<BR><A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort/blob/master/src/sort_wi<WBR>th_trace.hs</A>).<BR><BR>I
wonder if this might not be a more useful trait than being able to
sort<BR>already sorted lists super
fast.<BR><SPAN><BR>Thanks,<BR><BR>greg<BR><BR>______________________________<WBR>__<BR><BR>From:
<A href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A> [mailto:<A
href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A><WBR>] On Behalf
Of<BR>Siddhanathan Shanmugam<BR></SPAN>Sent: Sunday, March 26, 2017 1:05
PM<BR><SPAN>To: Gregory Popovitch<BR>Cc: Haskell Libraries<BR>Subject:
Re: Proposal: a new implementation for Data.List.sort
and<BR>Data.List.sortBy, which has better performance characteristics
and is more<BR>laziness-friendly.<BR><BR><BR></SPAN>
<DIV>
<DIV
class=m_-3409671060788413236gmail-m_-5012754740519492618m_6283889194055629001h5>Interesting.
You are right, performance for sorting random lists has<BR>priority over
performance for sorting already-sorted lists.<BR><BR>Ignore the numbers
for my previous version. Can you compare GHC's sort, your<BR>proposal,
and gSort below?<BR><BR><BR>gSort :: Ord a => [a] -> [a]<BR>gSort
= gSortBy compare<BR>gSortBy cmp = mergeAll . sequences<BR>
where<BR> sequences (a:b:xs)<BR> | a
`cmp` b == GT = descending b [a] xs<BR> |
otherwise = ascending b (a:)
xs<BR> sequences xs = [xs]<BR><BR><BR>
descending a as (b:bs)<BR> | a `cmp` b == GT =
descending b (a:as) bs<BR> descending a as bs =
(a:as) : sequences bs<BR><BR><BR> ascending a as
(b:bs)<BR> | a `cmp` b /= GT = ascending b (\ys
-> as (a:ys)) bs<BR> ascending a as bs = as
[a] `seq` as [a] : sequences bs<BR><BR><BR> mergeAll [x] =
x<BR> mergeAll xs = mergeAll (mergePairs
xs)<BR><BR><BR> mergePairs (a:b:xs) = merge a b :
mergePairs xs<BR> mergePairs xs =
xs<BR><BR><BR> merge as@(a:as') bs@(b:bs')<BR>
| a `cmp` b == GT = b : merge as bs'<BR>
| otherwise = a : merge as'
bs<BR> merge [] bs =
bs<BR> merge as [] =
as<BR><BR><BR>Thanks,<BR>Sid<BR><BR><BR>On Sun, Mar 26, 2017 at 9:19 AM,
Gregory Popovitch <<A href="mailto:greg7mdp@gmail.com"
target=_blank>greg7mdp@gmail.com</A>><BR>wrote:<BR><BR><BR>
Thank you @Siddhanathan! I welcome any improvement
you may make, as<BR>I said I<BR> am very far
from a Haskell expert.<BR><BR> I just tested
your change with my test project<BR> (<A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A><BR></DIV></DIV><<A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A>> )<BR>
<DIV>
<DIV
class=m_-3409671060788413236gmail-m_-5012754740519492618m_6283889194055629001h5>
and here are my results (mean times in
ms):<BR><BR> input
GHC sort
Orig proposal<BR>your<BR>
change<BR><BR>------------------------------<WBR>------------------------------<WBR>----------------<BR>
---<BR> sorted ints
(ascending) 153
467<BR>139<BR> sorted
ints (descending) 152
472<BR>599<BR>
random ints
2824
2077<BR>2126<BR> random strings
6564
5613<BR>5983<BR><BR>
Your change is a definite improvement for sorted integers
in<BR>ascending<BR> order, but is worse for
other cases.<BR><BR> Is there a real need to
optimize the sort for already sorted list?<BR>Of course<BR>
it should not be a degenerate<BR>
case and take longer than sorting random numbers, but this is
not<BR>the case<BR> here. Sorting already
sorted<BR> lists is, even with my version,
over 4 times faster than sorting<BR>random<BR>
lists. This sounds perfectly<BR>
acceptable to me, and I feel that trying to optimize this
specific<BR>case<BR> further, if it comes at
the<BR> detriment of the general case, is not
desirable.<BR><BR> Thanks,<BR><BR>
greg<BR><BR>
______________________________<WBR>__<BR><BR>
From: <A href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A> [mailto:<A
href="mailto:siddhanathan@gmail.com"
target=_blank>siddhanathan@gmail.com</A><WBR>] On<BR>Behalf Of<BR>
Siddhanathan Shanmugam<BR>
Sent: Sunday, March 26, 2017 11:41 AM<BR>
To: Gregory Popovitch<BR> Cc: Haskell
Libraries<BR> Subject: Re: Proposal: a new
implementation for Data.List.sort and<BR>
Data.List.sortBy, which has better performance characteristics and<BR>is
more<BR>
laziness-friendly.<BR><BR><BR><BR> Thank you!
This identifies a space leak in base which went unnoticed<BR>for
7<BR> years.<BR><BR>
Your implementation can be improved further. Instead of
splitting<BR>into<BR> pairs, you could
instead split into lists of sorted sublists by<BR>replacing<BR>
the pairs function with the following<BR><BR>
pair = foldr f []<BR>
where<BR>
f x [] = [[x]]<BR>
f x (y:ys)<BR>
| x `cmp` head y == LT =
(x:y):ys<BR>
| otherwise =
[x]:y:ys<BR><BR> This should give you the
same performance improvements for sorting<BR>random<BR>
lists, but better performance while sorting ascending
lists.<BR><BR> The version in base takes it
one step further by using a DList to<BR>handle the<BR>
descending case efficiently as well, except there's a
space leak<BR>right now<BR> because of which
it is slower.<BR><BR> On Sun, Mar 26, 2017 at
7:21 AM, Gregory Popovitch<BR><<A href="mailto:greg7mdp@gmail.com"
target=_blank>greg7mdp@gmail.com</A>><BR>
wrote:<BR><BR><BR><BR>
Motivation:<BR>
----------<BR><BR>
Data.List.sort is a very important functionality in
Haskell.<BR>I<BR> believe that<BR>
the proposed
implementation is:<BR><BR>
- significantly faster than the current implementation
on<BR>unsorted<BR> lists,<BR>
typically 14% to 27%
faster<BR> - more
laziness-friendly, i.e.:<BR>
take 3 $ sort l<BR>
will require significantly
less comparisons than the<BR>current<BR>
implementation<BR><BR>
Proposed Implementation<BR>
-----------------------<BR><BR>
sort :: (Ord a) => [a] -> [a]<BR>
sort = sortBy
compare<BR><BR>
sortBy cmp [] = []<BR>
sortBy cmp xs = head $ until (null.tail) reduce (pair
xs)<BR>
where<BR>
pair (x:y:t) | x `cmp` y == GT = [y, x] : pair t<BR>
| otherwise
= [x, y] : pair t<BR>
pair [x] = [[x]]<BR>
pair [] =
[]<BR><BR>
reduce (v:w:x:y:t) = merge v' x' : reduce t<BR>
where v' =
merge v w<BR>
x' = merge x y<BR><BR>
reduce
(x:y:t) = merge x y : reduce t<BR>
reduce xs =
xs<BR><BR>
merge xs [] =
xs<BR>
merge [] ys =
ys<BR>
merge xs@(x:xs') ys@(y:ys')<BR>
| x `cmp` y ==
GT = y : merge xs ys'<BR>
| otherwise
= x : merge xs' ys<BR><BR><BR>
Effect and Interactions<BR>
-----------------------<BR><BR>
I have a stack project with a criterion test for this
new<BR> implementation,<BR>
available at <A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort</A><BR><<A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A>><BR><BR>
<<A href="https://github.com/greg7mdp/ghc-sort"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A><BR><<A
href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A>> >
.<BR> I ran the
tests on an Ubuntu 14.0.2 VM and GHC 8.0.2, and<BR>had the<BR>
following<BR>
results:<BR><BR>
- sorting of random lists of integers is 27%
faster<BR> -
sorting of random lists of strings is 14% faster<BR>
- sorting of already sorted lists is
significantly slower,<BR>but still<BR>
much<BR> faster
than sorting random lists<BR>
- proposed version is more laziness friendly. For
example<BR>this<BR> version of<BR>
sortBy requires 11
comparisons to find<BR>
the smallest element of a 15 element list, while
the<BR>default<BR>
Data.List.sortBy requires 15 comparisons.<BR>
(see<BR><BR><BR><A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort/blob/master/src/sort_wi<WBR>th_trace.hs</A><BR><<A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort/blob/master/src/sort_w<WBR>ith_trace.hs</A>><BR><BR><<A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort/blob/master/src/sort_w<WBR>ith_trace.hs</A><BR><<A
href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs"
rel=noreferrer
target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort/blob/master/src/sort_w<WBR>ith_trace.hs</A>>
><BR>)<BR><BR><BR><BR>
Test results<BR>
------------<BR><BR>
Criterion output (descending/ascending results are
for<BR>already<BR> sorted<BR>
lists).<BR>
I barely understand what
Criterion does, and I am puzzled<BR>with the<BR>
various<BR>
"T" output - maybe there is a bug in my bench code:<BR><BR>
vagrant@vagrant-ubuntu-trusty-<WBR>64:/vagrant$ stack
exec<BR>ghc-sort<BR>
benchmarking ascending ints/ghc<BR>
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT<WBR>TTTTTTTtime<BR>160.6 ms<BR>
(153.4<BR>
ms .. 167.8 ms)<BR>
0.997 Rē (0.986 Rē .. 1.000
Rē)<BR>
mean 161.7
ms (158.3 ms .. 165.9 ms)<BR>
std dev
5.210 ms (3.193 ms .. 7.006 ms)<BR>
variance introduced by
outliers: 12% (moderately inflated)<BR><BR>
benchmarking ascending ints/greg<BR>
TTTTTTTTTTTTTTTTtime
473.8 ms (398.6 ms ..<BR>554.9<BR>
ms)<BR>
0.996 Rē (0.987 Rē .. 1.000 Rē)<BR>
mean
466.2 ms
(449.0 ms .. 475.0 ms)<BR>
std dev
14.94 ms (0.0 s .. 15.29 ms)<BR>
variance introduced by outliers: 19%
(moderately inflated)<BR><BR>
benchmarking descending ints/ghc<BR>
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT<WBR>TTTTTTTtime<BR>165.1 ms<BR>
(148.2<BR>
ms .. 178.2 ms)<BR>
0.991 Rē (0.957 Rē .. 1.000
Rē)<BR>
mean 158.7
ms (154.0 ms .. 164.3 ms)<BR>
std dev
7.075 ms (4.152 ms .. 9.903 ms)<BR>
variance introduced by
outliers: 12% (moderately inflated)<BR><BR>
benchmarking descending ints/greg<BR>
TTTTTTTTTTTTTTTTtime
471.7 ms (419.8 ms ..<BR>508.3<BR>
ms)<BR>
0.999 Rē (0.995 Rē .. 1.000 Rē)<BR>
mean
476.0 ms
(467.5 ms .. 480.0 ms)<BR>
std dev
7.447 ms (67.99 as .. 7.865 ms)<BR>
variance introduced by
outliers: 19% (moderately inflated)<BR><BR>
benchmarking random ints/ghc<BR>
TTTTTTTTTTTTTTTTtime
2.852 s (2.564 s ..<BR>3.019 s)<BR>
0.999 Rē
(0.997 Rē .. 1.000 Rē)<BR>
mean
2.812 s (2.785 s .. 2.838 s)<BR>
std dev
44.06 ms (543.9 as ..
44.97 ms)<BR>
variance introduced by outliers: 19% (moderately inflated)<BR><BR>
benchmarking random
ints/greg<BR>
TTTTTTTTTTTTTTTTtime
2.032 s (1.993 s ..<BR>2.076 s)<BR>
1.000 Rē
(1.000 Rē .. 1.000 Rē)<BR>
mean
2.028 s (2.019 s .. 2.033 s)<BR>
std dev
7.832 ms (0.0 s .. 8.178
ms)<BR> variance
introduced by outliers: 19% (moderately inflated)<BR><BR>
benchmarking
shakespeare/ghc<BR>
TTTTTTTTTTTTTTTTtime
6.504 s (6.391 s ..<BR>6.694
s)<BR>
1.000 Rē (1.000 Rē .. 1.000 Rē)<BR>
mean
6.499 s (6.468 s
.. 6.518 s)<BR>
std dev 28.85 ms
(0.0 s .. 32.62 ms)<BR>
variance introduced by outliers: 19% (moderately
inflated)<BR><BR>
benchmarking shakespeare/greg<BR>
TTTTTTTTTTTTTTTTtime
5.560 s (5.307 s
..<BR>5.763 s)<BR>
1.000 Rē (0.999 Rē .. 1.000 Rē)<BR>
mean
5.582 s
(5.537 s .. 5.607 s)<BR>
std dev 39.30
ms (0.0 s .. 43.49 ms)<BR>
variance introduced by outliers: 19% (moderately
inflated)<BR><BR><BR>
Costs and Drawbacks<BR>
-------------------<BR><BR>
The only cost I see is the reduced
performance when sorting<BR>already<BR>
sorted<BR> lists.
However, since this remains quite efficient, indeed<BR>over 4<BR>
times<BR>
faster than sorting unsorted lists, I think it is
an<BR>acceptable<BR> tradeoff.<BR><BR>
Final note<BR>
----------<BR><BR>
My Haskell is very rusty. I worked on this a couple years<BR>ago
when I<BR> was<BR>
learning Haskell, and meant to propose it to
the Haskell<BR>community,<BR> but<BR>
never got to it at the
time.<BR><BR>
______________________________<WBR>_________________<BR>
Libraries mailing
list<BR> <A
href="mailto:Libraries@haskell.org"
target=_blank>Libraries@haskell.org</A><BR>
<A
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries"
rel=noreferrer
target=_blank>http://mail.haskell.org/cgi-bi<WBR>n/mailman/listinfo/libraries</A><BR><<A
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries"
rel=noreferrer
target=_blank>http://mail.haskell.org/cgi-b<WBR>in/mailman/listinfo/libraries</A>><BR><BR></DIV></DIV>
<<A
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries"
rel=noreferrer
target=_blank>http://mail.haskell.org/cgi-b<WBR>in/mailman/listinfo/libraries</A><BR><<A
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries"
rel=noreferrer
target=_blank>http://mail.haskell.org/cgi-b<WBR>in/mailman/listinfo/libraries</A>>
><BR><BR><BR><BR><BR><BR><BR><BR><BR></BLOCKQUOTE></DIV><BR></DIV></DIV></DIV></BLOCKQUOTE></DIV><BR></DIV></DIV></DIV></DIV></BLOCKQUOTE></DIV><BR></DIV></DIV></DIV></DIV></BLOCKQUOTE></DIV><BR></DIV></DIV></BODY></HTML>