[Haskell-cafe] excercise - a completely lazy sorting algorithm

Mon Jul 6 16:42:02 EDT 2009

On Mon, Jul 6, 2009 at 12:26 PM, Petr Pudlak<deb at pudlak.name> wrote:
> More precisely: The result of sorting an n-element list or array should be a
> structure that allows to ask i-th element of the result (for example, a lazy
> array). Asking k arbitrary elements should take no more than
>  O(min(n * log n, k * n))
>

I'd argue this is not really a true use of "laziness" and more just
amortized cost.  That said, yes, it can be done!

I'm going to use an imperative data structure because I hate you
all</sarcasm> and it makes the analysis/presentation slightly easier.

To wit, we have a sorted array of unsorted bags of inputs.  For
example, (suppose we start with an array of numbers 1..10, scrambled),
we might start with:

(10,{1,5,6,8,3,4,2,7,9,10})

And after some operations, it might look like:

(3,{1,3,2});(4,{6,7,5,4});(3:{8,9,10})

And we're trying to (eventually) hit the state:

(1,{1});(2,{2});...etc.

(It's not hard to see how to in constant time maintain an array of
pointers into these bags to allow finding elements.)

Now, using a linear-time selection algorithm like median-of-medians or
Chazelle's, it's easy to see how to find the k-th largest element in
such a data structure: find the bag containing the k-th element, apply
the selection algorithm.  Furthermore, still in linear time, we can
(while we're working) split the bag around that element we found.  So,
given the data state:

(10,{1,5,6,8,3,4,2,7,9,10})

if we're asked for the 4th largest element, we'll update to:

(3,{1,3,2});(1,{4});(6,{5,6,8,7,9,10})

So far so good, right?  Now we just apply one more change: after each
lookup, we walk across the list of bags, and split each in half (as if
we were asked for the median element of each bag.)  In my running
example, we'd go to:

(1,{1});{1,{2});(1,{3});(1,{4});(2,{5,6});(1,{7});(3,{8,9,10})

This is still linear time--we run a linear-time algorithm on disjoint
subsets of the input.  Now, note: after k lookups to this structure,
the largest bag is at most n/2^k elements long!  Couple with a slight
optimization that overwrites the lookup table into the bags with just
the correct results once the bags are small enough, and this matches
your time bounds quite nicely.

Now, I doubt it'll be fast, but that's not what you asked.  Plus, come
up with a persuasive use case for a system where you /really need/ the
4th, 27th, and 957th largest elements /now/ and none of the rest, and
I promise I'll make something practically efficient. :)

If someone can translate my algorithm into a non-side-effecting one,
I'd appreciate it, but I think that like disjoint set/union, this is
probably inherently side-effecting.  Yes, yes, we could use functional
arrays, but short of that I don't see a way to avoid side effects to
take care of my amortized work.

AHH