Data.List.sortBy creates unnecessary thunks

Siddhanathan Shanmugam siddhanathan+eml at gmail.com
Sun Mar 26 07:10:54 UTC 2017


Data.List.sortBy uses a DList-like representation to efficiently chunk a
list, but the final conversion from DList to List is not strict. This
leaves unnecessary suspensions while sorting. This was first noticed in
https://github.com/ghc-proposals/ghc-proposals/pull/50, where sorting a
random list with a naive merge sort is faster than the one in base.

Proposing the following change, which should improve performance and space
usage.

```diff
diff --git a/libraries/base/Data/OldList.hs b/libraries/base/Data/OldList.hs
index c3618c42a9..a68c4f42af 100644
--- a/libraries/base/Data/OldList.hs
+++ b/libraries/base/Data/OldList.hs
@@ -851,7 +851,7 @@ sortBy cmp = mergeAll . sequences

     ascending a as (b:bs)
       | a `cmp` b /= GT = ascending b (\ys -> as (a:ys)) bs
-    ascending a as bs   = as [a]: sequences bs
+    ascending a as bs   = *as [a] `seq`* as [a] : sequences bs

     mergeAll [x] = x
     mergeAll xs  = mergeAll (mergePairs xs)
```


And, as a side note, can we please add more type signatures and comments to
the sort/sortBy function?

Cheers,
Sid
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20170326/eeddc5d3/attachment.html>


More information about the Libraries mailing list