[Haskell-cafe] Re: excercise - a completely lazy sorting algorithm

Sat Jul 11 07:20:07 EDT 2009

Matthias Görgens wrote:
> Thanks.  I heard about the hylo-, ana- and catamorphisms before, but
> never explicitly used them.  Time to get started.

You did use them explicitly :) , namely in

  treeSort = bootstrap partitionOnMedian
  bootstrap f = Fix . helper . f
      where helper = fmap (Fix . helper . f)

  partitionOnMedian :: (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)

since

  bootstrap f = ana f

In particular, we have this slight reformulation

  bootstrap f = helper'
     where
     helper  = fmap helper'
     helper' = Fix . helper . f

and hence

  bootstrap f = helper'
     where helper' = Fix . fmap helper' . f

and thus

  bootstrap f = Fix . fmap (bootstrap f) . f

which is the definition of  ana f .

> And yet another question: One can get the median in deterministic
> linear time.  For quicksort choosing the median as pivot keeps the O(n
> log n) average running time and brings down the expected worst case,
> too.  Do you know of a (natural) way to combine selecting the median
> and doing the quicksort, so that you don't compare unnecessarily?
>
> The way to de-randomize quickselect is to calculate medians of
> medians.  I.e. solve the problem for smaller instances first.  I
> suspect if we follow this strategy with the reified quicksort
> call-trees, the de-randomized quicksort will look a lot like
> mergesort.

Interesting question! (Of which I don't know the answer of. ;) )

I never understood the median of median method because I always thought
that it calculates an approximate median of approximate medians, which I
now realize is not the case. I concur that there should be something
better than "recursively wasting" comparisons in quickselect just for
finding the median pivot in quicksort, especially since both are the
same algorithm modulo lazy evaluation.

Concerning quicksort looking like mergesort, it seems like a good idea
to somehow merge the groups of 5 from the median-of-medians calculation.
However, there is an important difference between quicksort and
mergesort, namely lazy quicksort is able to produce the batch of the
first k smallest elements in

  O(n + k log k) total time [1]

while lazy mergesort needs

  O(n + k log n) total time [2]

There is something about quicksort that makes it "better" than mergesort.

[1] Proof goes like this: First, quicksort chooses a sublist of smallest
items recursively until a list of the smallest k items remains, this
takes O(n) time. Then the list consisting of the smallest k items is
sorted in Θ(k log k).

[2] Mergesort builds a tournament tree in Θ(n) time and then performs k
tournaments that take O(log n) time each, so the the second phase is O(k
log n).

I need to think about this.

Regards,
apfelmus

--
http://apfelmus.nfshost.com