<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>

<META name=GENERATOR content="MSHTML 11.00.10570.1001"></HEAD>

<BODY>

<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN 

class=070280421-27032017>Pretty cool by the way, we now have a 31% improvement 

for sorting lists of random integers vs the current 

Data.List.sortBy.</SPAN></FONT></DIV>

<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN 

class=070280421-27032017></SPAN></FONT> </DIV>

<DIV dir=ltr align=left><FONT color=#000080 face=Calibri><SPAN 

class=070280421-27032017>greg</SPAN></FONT></DIV><BR>

<DIV lang=en-us class=OutlookMessageHeader dir=ltr align=left>

<HR tabIndex=-1>

<FONT size=2 face=Tahoma><B>From:</B> siddhanathan@gmail.com 

[mailto:siddhanathan@gmail.com] <B>On Behalf Of </B>Siddhanathan 

Shanmugam<BR><B>Sent:</B> Monday, March 27, 2017 12:53 PM<BR><B>To:</B> Gregory 

Popovitch<BR><B>Cc:</B> Haskell Libraries<BR><B>Subject:</B> Re: Proposal: a new 

implementation for Data.List.sort and Data.List.sortBy, which has better 

performance characteristics and is more laziness-friendly.<BR></FONT><BR></DIV>

<DIV></DIV>

<DIV dir=ltr>

<DIV>We can improve things a bit further by forcing evaluation (with seq) along 

the way appropriately.<BR></DIV>

<DIV><BR></DIV>

<DIV><BR></DIV>

<DIV><BR></DIV>

<DIV>

<DIV><SPAN style="FONT-FAMILY: monospace,monospace">gregSortBy cmp [] = 

[]</SPAN><BR></DIV>

<DIV><FONT face="monospace, monospace">gregSortBy cmp xs = head $ until 

(null.tail) reduce (pair xs)</FONT></DIV>

<DIV><FONT face="monospace, monospace">  where</FONT></DIV>

<DIV><FONT face="monospace, monospace">    pair (x:y:t) | x `cmp` y == 

GT  = [y, x] : pair t</FONT></DIV>

<DIV><FONT face="monospace, monospace">            

     | otherwise        = [x, y] : pair 

t</FONT></DIV>

<DIV><FONT face="monospace, monospace">    pair [x] = 

[[x]]</FONT></DIV>

<DIV><FONT face="monospace, monospace">    pair []  = 

[]</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">    reduce (v:w:x:y:t) = merge 

v' x' `seq` merge v' x' : reduce t</FONT></DIV>

<DIV><FONT face="monospace, monospace">            

             where v' = merge v w `seq` merge 

v w</FONT></DIV>

<DIV><FONT face="monospace, monospace">            

                   x' = merge 

x y `seq` merge x y</FONT></DIV>

<DIV><FONT face="monospace, monospace">            

             </FONT></DIV>

<DIV><FONT face="monospace, monospace">    reduce (x:y:t) = merge x y 

`seq` merge x y : reduce t</FONT></DIV>

<DIV><FONT face="monospace, monospace">    reduce xs     

 = xs</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">    merge xs []     

      = xs</FONT></DIV>

<DIV><FONT face="monospace, monospace">    merge []  ys   

       = ys</FONT></DIV>

<DIV><FONT face="monospace, monospace">    merge xs@(x:xs') 

ys@(y:ys') </FONT></DIV>

<DIV><FONT face="monospace, monospace">         | x 

`cmp` y == GT  = y : merge xs  ys'</FONT></DIV>

<DIV><FONT face="monospace, monospace">         | 

otherwise        = x : merge xs' ys</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><SPAN style="FONT-FAMILY: monospace,monospace">gSortBy cmp = mergeAll . 

sequences</SPAN><BR></DIV>

<DIV><FONT face="monospace, monospace">  where</FONT></DIV>

<DIV><FONT face="monospace, monospace">    sequences 

(a:b:xs)</FONT></DIV>

<DIV><FONT face="monospace, monospace">      | a `cmp` b == GT = 

descending b [a]  xs</FONT></DIV>

<DIV><FONT face="monospace, monospace">      | otherwise   

    = ascending  b (a:) xs</FONT></DIV>

<DIV><FONT face="monospace, monospace">    sequences xs = 

[xs]</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">    descending a as 

(b:bs)</FONT></DIV>

<DIV><FONT face="monospace, monospace">      | a `cmp` b == GT = 

descending b (a:as) bs</FONT></DIV>

<DIV><FONT face="monospace, monospace">    descending a as bs  = 

(a:as) `seq` (a:as) : sequences bs</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">    ascending a as 

(b:bs)</FONT></DIV>

<DIV><FONT face="monospace, monospace">      | a `cmp` b /= GT = 

ascending b (as . (a:)) bs</FONT></DIV>

<DIV><FONT face="monospace, monospace">    ascending a as bs   = 

as [a] `seq` as [a] : sequences bs</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">    mergeAll [x] = 

x</FONT></DIV>

<DIV><FONT face="monospace, monospace">    mergeAll xs  = 

mergeAll (mergePairs xs)</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">    mergePairs (a:b:xs) = merge 

a b `seq` merge a b : mergePairs xs</FONT></DIV>

<DIV><FONT face="monospace, monospace">    mergePairs xs     

  = xs</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">    merge as@(a:as') 

bs@(b:bs')</FONT></DIV>

<DIV><FONT face="monospace, monospace">      | a `cmp` b == GT = 

b : merge as  bs'</FONT></DIV>

<DIV><FONT face="monospace, monospace">      | otherwise   

    = a : merge as' bs</FONT></DIV>

<DIV><FONT face="monospace, monospace">    merge [] bs     

    = bs</FONT></DIV>

<DIV><FONT face="monospace, monospace">    merge as []     

    = as</FONT></DIV></DIV>

<DIV><BR></DIV>

<DIV><BR></DIV>

<DIV><BR></DIV>

<DIV><BR></DIV>

<DIV><B>Before the change:</B></DIV>

<DIV><BR></DIV>

<DIV>

<DIV><FONT face="monospace, monospace">benchmarking random ints/ghc</FONT></DIV>

<DIV><FONT face="monospace, monospace">time           

      3.687 s    (3.541 s .. NaN s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">            

         1.000 R²   (1.000 R² .. 1.000 

R²)</FONT></DIV>

<DIV><FONT face="monospace, monospace">mean           

      3.691 s    (3.669 s .. 3.705 s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">std dev         

     21.45 ms   (0.0 s .. 24.76 ms)</FONT></DIV>

<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19% 

(moderately inflated)</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">benchmarking random 

ints/greg</FONT></DIV>

<DIV><FONT face="monospace, monospace">time           

      2.648 s    (2.482 s .. 2.822 s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">            

         0.999 R²   (0.998 R² .. 1.000 

R²)</FONT></DIV>

<DIV><FONT face="monospace, monospace">mean           

      2.704 s    (2.670 s .. 2.736 s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">std dev         

     52.68 ms   (0.0 s .. 54.49 ms)</FONT></DIV>

<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19% 

(moderately inflated)</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">benchmarking random 

ints/gSort</FONT></DIV>

<DIV><FONT face="monospace, monospace">time           

      2.733 s    (2.682 s .. 2.758 s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">            

         1.000 R²   (1.000 R² .. 1.000 

R²)</FONT></DIV>

<DIV><FONT face="monospace, monospace">mean           

      2.707 s    (2.689 s .. 2.718 s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">std dev         

     16.84 ms   (0.0 s .. 19.20 ms)</FONT></DIV>

<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19% 

(moderately inflated)</FONT></DIV></DIV>

<DIV><BR></DIV>

<DIV><B>After the change:</B></DIV>

<DIV><BR></DIV>

<DIV>

<DIV><FONT face="monospace, monospace">benchmarking random 

ints/greg</FONT></DIV>

<DIV><FONT face="monospace, monospace">time           

      2.576 s    (2.548 s .. 2.628 s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">            

         1.000 R²   (1.000 R² .. 1.000 

R²)</FONT></DIV>

<DIV><FONT face="monospace, monospace">mean           

      2.590 s    (2.578 s .. 2.599 s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">std dev         

     12.99 ms   (0.0 s .. 14.89 ms)</FONT></DIV>

<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19% 

(moderately inflated)</FONT></DIV>

<DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

<DIV><FONT face="monospace, monospace">benchmarking random 

ints/gSort</FONT></DIV>

<DIV><FONT face="monospace, monospace">time           

      2.538 s    (2.412 s .. 2.627 s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">            

         1.000 R²   (0.999 R² .. 1.000 

R²)</FONT></DIV>

<DIV><FONT face="monospace, monospace">mean           

      2.543 s    (2.517 s .. 2.560 s)</FONT></DIV>

<DIV><FONT face="monospace, monospace">std dev         

     26.16 ms   (0.0 s .. 30.21 ms)</FONT></DIV>

<DIV><FONT face="monospace, monospace">variance introduced by outliers: 19% 

(moderately inflated)</FONT></DIV></DIV>

<DIV><BR></DIV>

<DIV><BR></DIV>

<DIV><BR></DIV>

<DIV><BR></DIV></DIV>

<DIV class=gmail_extra><BR>

<DIV class=gmail_quote>On Sun, Mar 26, 2017 at 1:54 PM, Siddhanathan Shanmugam 

<SPAN dir=ltr><<A href="mailto:siddhanathan+eml@gmail.com" 

target=_blank>siddhanathan+eml@gmail.com</A>></SPAN> wrote:<BR>

<BLOCKQUOTE class=gmail_quote 

style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex">

  <DIV dir=ltr>

  <DIV>

  <DIV>Theoretically, we could do better. We currently only exploit monotonic 

  runs in merge sort, but we could also exploit bitonic runs:</DIV>

  <DIV><BR></DIV>

  <DIV>

  <DIV><FONT face="monospace, monospace">    dlist as = as [] `seq` as 

  []<BR></FONT></DIV>

  <DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

  <DIV><FONT face="monospace, monospace">    sequences [] = 

  [[]]</FONT></DIV>

  <DIV><FONT face="monospace, monospace">    sequences [a] = 

  [[a]]</FONT></DIV>

  <DIV><FONT face="monospace, monospace">    sequences (a:xs) = 

  bitonic a a (a:) xs</FONT></DIV>

  <DIV><FONT face="monospace, monospace"><BR></FONT></DIV>

  <DIV><FONT face="monospace, monospace">    bitonic min max as 

  (b:bs)</FONT></DIV>

  <DIV><FONT face="monospace, monospace">      | b `cmp` max /= 

  LT = bitonic min b   (as . (b:)) bs</FONT></DIV>

  <DIV><FONT face="monospace, monospace">      | b `cmp` min /= 

  GT = bitonic b   max ((b:) . as) bs</FONT></DIV>

  <DIV><FONT face="monospace, monospace">      | otherwise = 

  dlist as : sequences (b:bs)</FONT></DIV>

  <DIV><FONT face="monospace, monospace">    bitonic _ _ as [] = 

  [dlist as]</FONT></DIV>

  <DIV><BR></DIV></DIV></DIV>

  <DIV><BR></DIV>

  <DIV>The constant factors here might be too high to notice the difference 

  though.</DIV><SPAN>

  <DIV><BR></DIV>

  <DIV><SPAN style="FONT-SIZE: 12px"><BR></SPAN></DIV>

  <DIV><SPAN style="FONT-SIZE: 12px">> However, still my version is more 

  laziness-friendly, i.e. it requires fewer</SPAN><BR></DIV>

  <DIV><SPAN style="FONT-SIZE: 12px">> comparisons to get the</SPAN><BR 

  style="FONT-SIZE: 12px"><SPAN style="FONT-SIZE: 12px">> N smallest elements 

  of a list </SPAN><SPAN 

  style="FONT-SIZE: 12px">(see</SPAN><BR></DIV>> <A 

  style="FONT-SIZE: 12px" 

  href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs" 

  rel=noreferrer 

  target=_blank>https://github.com/greg7mdp/<WBR>ghc-sort/blob/master/src/sort_<WBR>with_trace.hs</A><SPAN 

  style="FONT-SIZE: 12px">).</SPAN>

  <DIV><SPAN style="FONT-SIZE: 12px">></SPAN></DIV>

  <DIV><SPAN style="FONT-SIZE: 12px">> I wonder if this might not be a more 

  useful trait than being able to sort</SPAN><BR></DIV>

  <DIV><SPAN style="FONT-SIZE: 12px">> already sorted lists super 

  fast.</SPAN><BR></DIV>

  <DIV><SPAN style="FONT-SIZE: 12px"><BR></SPAN></DIV></SPAN>

  <DIV><SPAN style="FONT-SIZE: 12px">This comes down to a discussion of merge 

  sort vs natural merge sort.</SPAN></DIV>

  <DIV><BR></DIV>

  <DIV>Data.List.sort is an implementation of a variant of merge sort called 

  natural merge sort. The algorithm is linearithmic in the worst case, but 

  linear in the best case (already sorted list).</DIV>

  <DIV><BR></DIV>

  <DIV><BR></DIV></DIV>

  <DIV class=HOEnZb>

  <DIV class=h5>

  <DIV class=gmail_extra><BR>

  <DIV class=gmail_quote>On Sun, Mar 26, 2017 at 10:47 AM, Gregory Popovitch 

  <SPAN dir=ltr><<A href="mailto:greg7mdp@gmail.com" 

  target=_blank>greg7mdp@gmail.com</A>></SPAN> wrote:<BR>

  <BLOCKQUOTE class=gmail_quote 

  style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex">Thanks 

    again @Siddhanathan! Looks like your gSort fixes the main issue 

    with<BR>Data.List.sort().<BR><BR>I have updated the test programs in <A 

    href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort</A> to<BR>include 

    your new version.<BR><BR>Here are the results (your new version looks like a 

    definite improvement vs<BR>the current GHC one):<BR><BR>input    

                        GHC 

    sort         My Orig proposal    

     gSort<BR>------------------------------<WBR>------------------------------<WBR>----------------<BR>---<BR>sorted 

    ints (ascending)      151          

         456              

        148<BR>sorted ints (descending)     152  

                 466      

                155<BR>random ints    

                 2732      

            2006            

         2004<BR>random strings          

        6564              

    5549                

     5528<BR><BR><BR>So replacing the current GHC version with gSort is a 

    no brainer, as it is<BR>better in all regards.<BR><BR>However, still my 

    version is more laziness-friendly, i.e. it requires fewer<BR>comparisons to 

    get the<BR>N smallest elements of a list (see<BR><A 

    href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs" 

    rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort/blob/master/src/sort_wi<WBR>th_trace.hs</A>).<BR><BR>I 

    wonder if this might not be a more useful trait than being able to 

    sort<BR>already sorted lists super 

    fast.<BR><SPAN><BR>Thanks,<BR><BR>greg<BR><BR>______________________________<WBR>__<BR><BR>From: 

    <A href="mailto:siddhanathan@gmail.com" 

    target=_blank>siddhanathan@gmail.com</A> [mailto:<A 

    href="mailto:siddhanathan@gmail.com" 

    target=_blank>siddhanathan@gmail.com</A><WBR>] On Behalf Of<BR>Siddhanathan 

    Shanmugam<BR></SPAN>Sent: Sunday, March 26, 2017 1:05 PM<BR><SPAN>To: 

    Gregory Popovitch<BR>Cc: Haskell Libraries<BR>Subject: Re: Proposal: a new 

    implementation for Data.List.sort and<BR>Data.List.sortBy, which has better 

    performance characteristics and is 

    more<BR>laziness-friendly.<BR><BR><BR></SPAN>

    <DIV>

    <DIV class=m_6283889194055629001h5>Interesting. You are right, performance 

    for sorting random lists has<BR>priority over performance for sorting 

    already-sorted lists.<BR><BR>Ignore the numbers for my previous version. Can 

    you compare GHC's sort, your<BR>proposal, and gSort below?<BR><BR><BR>gSort 

    :: Ord a => [a] -> [a]<BR>gSort = gSortBy compare<BR>gSortBy cmp = 

    mergeAll . sequences<BR>  where<BR>    sequences 

    (a:b:xs)<BR>      | a `cmp` b == GT = descending b [a]  

    xs<BR>      | otherwise       = 

    ascending  b (a:) xs<BR>    sequences xs = 

    [xs]<BR><BR><BR>    descending a as (b:bs)<BR>      

    | a `cmp` b == GT = descending b (a:as) bs<BR>    descending a as 

    bs  = (a:as) : sequences bs<BR><BR><BR>    ascending a as 

    (b:bs)<BR>      | a `cmp` b /= GT = ascending b (\ys -> as 

    (a:ys)) bs<BR>    ascending a as bs   = as [a] `seq` as 

    [a] : sequences bs<BR><BR><BR>    mergeAll [x] = x<BR>  

      mergeAll xs  = mergeAll (mergePairs xs)<BR><BR><BR>    

    mergePairs (a:b:xs) = merge a b : mergePairs xs<BR>    mergePairs 

    xs       = xs<BR><BR><BR>    merge as@(a:as') 

    bs@(b:bs')<BR>      | a `cmp` b == GT = b : merge as  

    bs'<BR>      | otherwise       = a : 

    merge as' bs<BR>    merge [] bs         = 

    bs<BR>    merge as []         = 

    as<BR><BR><BR>Thanks,<BR>Sid<BR><BR><BR>On Sun, Mar 26, 2017 at 9:19 AM, 

    Gregory Popovitch <<A href="mailto:greg7mdp@gmail.com" 

    target=_blank>greg7mdp@gmail.com</A>><BR>wrote:<BR><BR><BR>    

        Thank you @Siddhanathan! I welcome any improvement you may 

    make, as<BR>I said I<BR>        am very far from a 

    Haskell expert.<BR><BR>        I just tested your change 

    with my test project<BR>        (<A 

    href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A><BR></DIV></DIV><<A 

    href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A>> )<BR>

    <DIV>

    <DIV class=m_6283889194055629001h5>        and here are 

    my results (mean times in ms):<BR><BR>        

    input                    

        GHC sort          Orig 

    proposal<BR>your<BR>        

    change<BR><BR>------------------------------<WBR>------------------------------<WBR>----------------<BR>  

          ---<BR>        sorted ints 

    (ascending)      153            

       467<BR>139<BR>        sorted ints 

    (descending)     152            

       472<BR>599<BR>        random ints  

                   2824    

              2077<BR>2126<BR>      

      random strings              

    6564              

    5613<BR>5983<BR><BR>        Your change is a definite 

    improvement for sorted integers in<BR>ascending<BR>      

      order, but is worse for other cases.<BR><BR>      

      Is there a real need to optimize the sort for already sorted 

    list?<BR>Of course<BR>        it should not be a 

    degenerate<BR>        case and take longer than sorting 

    random numbers, but this is not<BR>the case<BR>        

    here. Sorting already sorted<BR>        lists is, even 

    with my version, over 4 times faster than sorting<BR>random<BR>    

        lists. This sounds perfectly<BR>        

    acceptable to me, and I feel that trying to optimize this 

    specific<BR>case<BR>        further, if it comes at 

    the<BR>        detriment of the general case, is not 

    desirable.<BR><BR>        Thanks,<BR><BR>    

        greg<BR><BR>        

    ______________________________<WBR>__<BR><BR>        

    From: <A href="mailto:siddhanathan@gmail.com" 

    target=_blank>siddhanathan@gmail.com</A> [mailto:<A 

    href="mailto:siddhanathan@gmail.com" 

    target=_blank>siddhanathan@gmail.com</A><WBR>] On<BR>Behalf Of<BR>  

          Siddhanathan Shanmugam<BR>        

    Sent: Sunday, March 26, 2017 11:41 AM<BR>        To: 

    Gregory Popovitch<BR>        Cc: Haskell 

    Libraries<BR>        Subject: Re: Proposal: a new 

    implementation for Data.List.sort and<BR>        

    Data.List.sortBy, which has better performance characteristics and<BR>is 

    more<BR>        laziness-friendly.<BR><BR><BR><BR>  

          Thank you! This identifies a space leak in base which 

    went unnoticed<BR>for 7<BR>        years.<BR><BR>  

          Your implementation can be improved further. Instead of 

    splitting<BR>into<BR>        pairs, you could instead 

    split into lists of sorted sublists by<BR>replacing<BR>      

      the pairs function with the following<BR><BR>      

          pair = foldr f []<BR>          

        where<BR>              

      f x [] = [[x]]<BR>              

      f x (y:ys)<BR>                

      | x `cmp` head y == LT = (x:y):ys<BR>        

              | otherwise        

        = [x]:y:ys<BR><BR>        This should give 

    you the same performance improvements for sorting<BR>random<BR>    

        lists, but better performance while sorting ascending 

    lists.<BR><BR>        The version in base takes it one 

    step further by using a DList to<BR>handle the<BR>      

      descending case efficiently as well, except there's a space 

    leak<BR>right now<BR>        because of which it is 

    slower.<BR><BR>        On Sun, Mar 26, 2017 at 7:21 AM, 

    Gregory Popovitch<BR><<A href="mailto:greg7mdp@gmail.com" 

    target=_blank>greg7mdp@gmail.com</A>><BR>        

    wrote:<BR><BR><BR><BR>              

      Motivation:<BR>              

      ----------<BR><BR>              

      Data.List.sort is a very important functionality in 

    Haskell.<BR>I<BR>        believe that<BR>    

                the proposed implementation 

    is:<BR><BR>                - 

    significantly faster than the current implementation 

    on<BR>unsorted<BR>        lists,<BR>      

              typically 14% to 27% faster<BR>  

                  - more laziness-friendly, 

    i.e.:<BR>                  

      take 3 $ sort l<BR>              

        will require significantly less comparisons than 

    the<BR>current<BR>                

    implementation<BR><BR>              

      Proposed Implementation<BR>            

        -----------------------<BR><BR>        

            sort :: (Ord a) => [a] -> [a]<BR>  

                  sort =  sortBy 

    compare<BR><BR>                

    sortBy cmp [] = []<BR>              

      sortBy cmp xs = head $ until (null.tail) reduce (pair xs)<BR>  

                    where<BR>  

                      pair (x:y:t) 

    | x `cmp` y == GT  = [y, x] : pair t<BR>        

                          

       | otherwise        = [x, y] : pair 

    t<BR>                    

    pair [x] = [[x]]<BR>                

        pair []  = []<BR><BR>          

              reduce (v:w:x:y:t) = merge v' x' : reduce 

    t<BR>                    

                        

     where v' = merge v w<BR>            

                          

                 x' = merge x y<BR><BR>  

                      reduce 

    (x:y:t) = merge x y : reduce t<BR>            

            reduce xs      = xs<BR><BR>  

                      merge xs 

    []           = xs<BR>      

                  merge []  ys  

            = ys<BR>          

              merge xs@(x:xs') ys@(y:ys')<BR>  

                          

     | x `cmp` y == GT  = y : merge xs  ys'<BR>    

                        

     | otherwise        = x : merge xs' 

    ys<BR><BR><BR>                Effect 

    and Interactions<BR>                

    -----------------------<BR><BR>            

        I have a stack project with a criterion test for this 

    new<BR>        implementation,<BR>      

              available at <A 

    href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort</A><BR><<A 

    href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A>><BR><BR>  

          <<A href="https://github.com/greg7mdp/ghc-sort" 

    rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A><BR><<A 

    href="https://github.com/greg7mdp/ghc-sort" rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort</A>> > 

    .<BR>                I ran the tests 

    on an Ubuntu 14.0.2 VM and GHC 8.0.2, and<BR>had the<BR>      

      following<BR>                

    results:<BR><BR>                - 

    sorting of random lists of integers is 27% faster<BR>      

              - sorting of random lists of strings is 

    14% faster<BR>                - 

    sorting of already sorted lists is significantly slower,<BR>but 

    still<BR>        much<BR>        

            faster than sorting random lists<BR>  

                  - proposed version is more 

    laziness friendly. For example<BR>this<BR>        

    version of<BR>                sortBy 

    requires 11 comparisons to find<BR>            

          the smallest element of a 15 element list, while 

    the<BR>default<BR>                

    Data.List.sortBy requires 15 comparisons.<BR>        

              (see<BR><BR><BR><A 

    href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs" 

    rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/gh<WBR>c-sort/blob/master/src/sort_wi<WBR>th_trace.hs</A><BR><<A 

    href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs" 

    rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort/blob/master/src/sort_w<WBR>ith_trace.hs</A>><BR><BR><<A 

    href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs" 

    rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort/blob/master/src/sort_w<WBR>ith_trace.hs</A><BR><<A 

    href="https://github.com/greg7mdp/ghc-sort/blob/master/src/sort_with_trace.hs" 

    rel=noreferrer 

    target=_blank>https://github.com/greg7mdp/g<WBR>hc-sort/blob/master/src/sort_w<WBR>ith_trace.hs</A>> 

    ><BR>)<BR><BR><BR><BR>              

      Test results<BR>              

      ------------<BR><BR>              

      Criterion output (descending/ascending results are 

    for<BR>already<BR>        sorted<BR>      

              lists).<BR>        

            I barely understand what Criterion does, and I 

    am puzzled<BR>with the<BR>        various<BR>  

                  "T" output - maybe there is 

    a bug in my bench code:<BR><BR>            

        vagrant@vagrant-ubuntu-trusty-<WBR>64:/vagrant$ stack 

    exec<BR>ghc-sort<BR>                

    benchmarking ascending ints/ghc<BR>            

        TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT<WBR>TTTTTTTtime<BR>160.6 

    ms<BR>        (153.4<BR>        

            ms .. 167.8 ms)<BR>        

                          

           0.997 R²   (0.986 R² .. 1.000 

    R²)<BR>                mean  

                   161.7 ms  

     (158.3 ms .. 165.9 ms)<BR>            

        std dev              5.210 

    ms   (3.193 ms .. 7.006 ms)<BR>          

          variance introduced by outliers: 12% (moderately 

    inflated)<BR><BR>                

    benchmarking ascending ints/greg<BR>          

          TTTTTTTTTTTTTTTTtime          

           473.8 ms   (398.6 ms 

    ..<BR>554.9<BR>        ms)<BR>      

                          

             0.996 R²   (0.987 R² .. 1.000 

    R²)<BR>                mean  

                   466.2 ms  

     (449.0 ms .. 475.0 ms)<BR>            

        std dev              14.94 

    ms   (0.0 s .. 15.29 ms)<BR>          

          variance introduced by outliers: 19% (moderately 

    inflated)<BR><BR>                

    benchmarking descending ints/ghc<BR>          

          TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT<WBR>TTTTTTTtime<BR>165.1 

    ms<BR>        (148.2<BR>        

            ms .. 178.2 ms)<BR>        

                          

           0.991 R²   (0.957 R² .. 1.000 

    R²)<BR>                mean  

                   158.7 ms  

     (154.0 ms .. 164.3 ms)<BR>            

        std dev              7.075 

    ms   (4.152 ms .. 9.903 ms)<BR>          

          variance introduced by outliers: 12% (moderately 

    inflated)<BR><BR>                

    benchmarking descending ints/greg<BR>          

          TTTTTTTTTTTTTTTTtime          

           471.7 ms   (419.8 ms 

    ..<BR>508.3<BR>        ms)<BR>      

                          

             0.999 R²   (0.995 R² .. 1.000 

    R²)<BR>                mean  

                   476.0 ms  

     (467.5 ms .. 480.0 ms)<BR>            

        std dev              7.447 

    ms   (67.99 as .. 7.865 ms)<BR>          

          variance introduced by outliers: 19% (moderately 

    inflated)<BR><BR>                

    benchmarking random ints/ghc<BR>            

        TTTTTTTTTTTTTTTTtime            

         2.852 s    (2.564 s ..<BR>3.019 s)<BR>  

                          

                 0.999 R²   (0.997 

    R² .. 1.000 R²)<BR>                

    mean                 2.812 

    s    (2.785 s .. 2.838 s)<BR>          

          std dev              

    44.06 ms   (543.9 as .. 44.97 ms)<BR>        

            variance introduced by outliers: 19% (moderately 

    inflated)<BR><BR>                

    benchmarking random ints/greg<BR>            

        TTTTTTTTTTTTTTTTtime            

         2.032 s    (1.993 s ..<BR>2.076 s)<BR>  

                          

                 1.000 R²   (1.000 

    R² .. 1.000 R²)<BR>                

    mean                 2.028 

    s    (2.019 s .. 2.033 s)<BR>          

          std dev              

    7.832 ms   (0.0 s .. 8.178 ms)<BR>        

            variance introduced by outliers: 19% (moderately 

    inflated)<BR><BR>                

    benchmarking shakespeare/ghc<BR>            

        TTTTTTTTTTTTTTTTtime            

         6.504 s    (6.391 s ..<BR>6.694 s)<BR>  

                          

                 1.000 R²   (1.000 

    R² .. 1.000 R²)<BR>                

    mean                 6.499 

    s    (6.468 s .. 6.518 s)<BR>          

          std dev              

    28.85 ms   (0.0 s .. 32.62 ms)<BR>        

            variance introduced by outliers: 19% (moderately 

    inflated)<BR><BR>                

    benchmarking shakespeare/greg<BR>            

        TTTTTTTTTTTTTTTTtime            

         5.560 s    (5.307 s ..<BR>5.763 s)<BR>  

                          

                 1.000 R²   (0.999 

    R² .. 1.000 R²)<BR>                

    mean                 5.582 

    s    (5.537 s .. 5.607 s)<BR>          

          std dev              

    39.30 ms   (0.0 s .. 43.49 ms)<BR>        

            variance introduced by outliers: 19% (moderately 

    inflated)<BR><BR><BR>                

    Costs and Drawbacks<BR>              

      -------------------<BR><BR>            

        The only cost I see is the reduced performance when 

    sorting<BR>already<BR>        sorted<BR>    

                lists. However, since this remains 

    quite efficient, indeed<BR>over 4<BR>        

    times<BR>                faster than 

    sorting unsorted lists, I think it is an<BR>acceptable<BR>    

        tradeoff.<BR><BR>            

        Final note<BR>              

      ----------<BR><BR>              

      My Haskell is very rusty. I worked on this a couple years<BR>ago when 

    I<BR>        was<BR>          

          learning Haskell, and meant to propose it to the 

    Haskell<BR>community,<BR>        but<BR>    

                never got to it at the 

    time.<BR><BR>                

    ______________________________<WBR>_________________<BR>      

              Libraries mailing list<BR>    

                <A 

    href="mailto:Libraries@haskell.org" 

    target=_blank>Libraries@haskell.org</A><BR>        

            <A 

    href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries" 

    rel=noreferrer 

    target=_blank>http://mail.haskell.org/cgi-bi<WBR>n/mailman/listinfo/libraries</A><BR><<A 

    href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries" 

    rel=noreferrer 

    target=_blank>http://mail.haskell.org/cgi-b<WBR>in/mailman/listinfo/libraries</A>><BR><BR></DIV></DIV>  

          <<A 

    href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries" 

    rel=noreferrer 

    target=_blank>http://mail.haskell.org/cgi-b<WBR>in/mailman/listinfo/libraries</A><BR><<A 

    href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries" 

    rel=noreferrer 

    target=_blank>http://mail.haskell.org/cgi-b<WBR>in/mailman/listinfo/libraries</A>> 

    ><BR><BR><BR><BR><BR><BR><BR><BR><BR></BLOCKQUOTE></DIV><BR></DIV></DIV></DIV></BLOCKQUOTE></DIV><BR></DIV></BODY></HTML>