<div dir="ltr">The current implementation was really just an attempt to improve over the old implementation which did the naive construction. It did improve things for the sorted list case to the point where there is very little marginal benefit to using fromDistinctAscList any more, but it very much isn't the final word on the topic. =)<div><br>Finding something that handles the middle cases where you have "mostly sorted" data better would be a good idea, but definitely needs to be benchmarked to death, and it would help to know if the merges of runs affect the overall asymptotic performance.<br><br>Timsort is both fast and popular and includes a similar pass (it also happens to include handling strictly descending runs as mentioned), so it seems that there is room to do something similar here.<br><br></div><div>-Edward</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Feb 9, 2017 at 3:20 PM, David Feuer <span dir="ltr"><<a href="mailto:david.feuer@gmail.com" target="_blank">david.feuer@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Currently, the implementations of fromList for Data.Set and Data.Map<br>
try to take advantage of inputs with some prefix in strictly<br>
increasing order. It uses the fast (O (n)) fromDistinctAscList<br>
algorithm for the strictly ascending prefix, and then falls back on<br>
the slower (O (n log n)) naive algorithm for the rest. For Data.Set,<br>
this works roughly like the implementation sketched at the bottom of<br>
this message.<br>
<br>
This whole thing strikes me as a bit odd: changing just the first<br>
element of the input list can have a substantial impact on the overall<br>
performance.<br>
<br>
In my opinion, there are two reasonable ways to implement this function:<br>
<br>
1. Don't bother to try to take advantage of preexisting ordering. This<br>
is the simplest option, and will probably be fastest for lists with<br>
little preexisting order.<br>
<br>
2. Try to take advantage of sorted ascending and descending "runs". A<br>
simple version would be to break up the input list into ascending and<br>
descending segments (merging duplicates on the fly), use<br>
fromDistinctAscList or fromDistinctDescList to convert each segment,<br>
and combine the segments using set unions. This option will be slower<br>
for totally unsorted lists, but should degrade much more gradually<br>
than the current algorithm as disorder is added. Wren suggests that<br>
there may be other variations on the same theme that we could<br>
consider.<br>
<br>
Should we<br>
<br>
A. Switch to 1 (and offer 2 by another name)<br>
B. Switch to 2 (and offer 1 by another name?)<br>
C. Leave fromList alone and offer 2 by another name<br>
<br>
Note that option 1 can be implemented quite easily outside the library<br>
using insert and foldl'; option 2 is not something users should be<br>
expected to write themselves.<br>
<br>
If we add names, what should those be?<br>
<br>
Thanks,<br>
David Feuer<br>
<br>
Current implementation sketch:<br>
<br>
fromList :: Ord a => [a] -> Set a<br>
fromList xs = foldl' (flip insert) sortedSet unsortedList<br>
where<br>
sortedSet = fromDistinctAscList sortedList<br>
(sortedList, unsortedList) = spanStrictlyAscending xs<br>
<br>
spanStrictlyAscending [] = ([], [])<br>
spanStrictlyAscending (x : xs) = x : go x xs<br>
where<br>
go prev (n : ns)<br>
| prev < n = first (n :) $ go n ns<br>
go _ ns = ([], ns)<br>
______________________________<wbr>_________________<br>
Libraries mailing list<br>
<a href="mailto:Libraries@haskell.org">Libraries@haskell.org</a><br>
<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-<wbr>bin/mailman/listinfo/libraries</a><br>
</blockquote></div><br></div>