[Haskell-cafe] Incremental sequence sorting

David Feuer david.feuer at gmail.com
Wed Mar 18 21:21:03 UTC 2015


I had an idea, and it's kind of complicated, so I was wondering if I
could get a sanity check from someone before putting *too* much time
into it. Specifically, I was thinking about a sort of incremental
sorting of sequences (in Data.Sequence). The idea is very closely
related to the incremental sorting described by Navarro and Paredes (
http://www.dcc.uchile.cl/~gnavarro/ps/algor09.pdf ).

The basic concept: sequences are represented as Hinze-Paterson 2-3
finger trees. They are fairly lazy, so building them from the top down
tends to be a good thing. Suppose I have a sequence to be sorted. To
start building the result sequence, I need to know which elements go
in the first digit and which go in the last digit. To do this, I need
two order statistics, starting with one separating the first digit
from the rest, then one separating the last digit from the rest. Once
I have separated these, I can produce the top of the sequence. The
digit separation is the same all down the spine. Within digits, 2-3
trees split similarly.

I can get the order statistics using median-of-medians. As Navarro and
Paredes (and presumably others) describe, median-of-medians partially
sorts the sequence in the process of finding the desired order
statistic. Navarro and Paredes use a stack of indices into an array to
take advantage of this, but I need things rather more incremental.
What I was thinking is that I could use a finger tree of sequences (a
sequence of sequences making it easy to find where a particular total
length is reached) to separate the different segments from each other.
Each time I need to split on a particular order statistic, I can
isolate the necessary segment, use median-of-medians (splitting it
into pieces), and then append the pieces onto either side as
appropriate.

Does anyone have a sense of

1. Whether this is sane and/or
2. What the chances are that its performance will be competitive with
either of the current Data.Sequence sorting functions and/or
3. How I should actually implement median-of-medians for this
particular application (as there seem to be very few Haskell
implementations of the algorithm, and I've found no optimized ones)?

Thanks,
David Feuer


More information about the Haskell-Cafe mailing list