[Haskell-cafe] excercise - a completely lazy sorting algorithm

Mon Jul 6 20:49:08 EDT 2009

The "sorted array of bags of unsorted input" is a nice idea.  However,
you have to use the data structure in a single-threaded [1] fashion to
obtain the claimed bounds.

Here's a pure solution that uses amortization and laziness.

> import qualified Data.Sequence as S
> import Data.Sequence ((><))
> import Data.Foldable
> import Data.Monoid

Suppose we have a function to find the the median of a list, and
partition it into three sublists: Smaller than the median, equal to
the media, larger than the median.  That function should run in linear
time.

> partitionOnMedian :: forall a. (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)

where the following data structure holds the sublists and some
bookkeeping information:

> data BTreeRaw a m = Leaf
>                   | Node {cmp::(a->Ordering)
>                          , lN :: Int
>                          , less::m
>                          , eq :: (S.Seq a)
>                          , gN :: Int
>                          , greater::m
>                          }

where 'lN' and 'gN' are the length of 'less' and 'greater'.

We can make BTreeRaw a functor:

> instance Functor (BTreeRaw a) where
>     fmap f Leaf = Leaf
>     fmap f (Node c lN l e gN g) = Node c lN (f l) e gN (f g)

Now using a fixed-point construction we can bootstrap a sorting
algorithm from partitionOnMedian:

> data Fix m = Fix {unfix :: (m (Fix m))}
> type BTree a = Fix (BTreeRaw a)

> treeSort :: forall a. (Ord a) => S.Seq a -> BTree a
> treeSort = Fix . helper . partitionOnMedian
>     where helper = fmap (Fix . helper . partitionOnMedian)

Now treeSort produces the thunk of a balanced binary search tree.  Of
course we can get a sorted list out of it (forcing the whole
structure):

> flatten :: BTree a -> S.Seq a
> flatten (Fix Leaf) = S.empty
> flatten (Fix (Node _ lN l e gN g)) = flatten l >< e >< flatten g

> mySort = flatten . treeSort

But we can also get elements efficently, forcing only a linear amount
of comparisions in the worst case:

> index :: BTree a -> Int -> a
> index (Fix Leaf) _ = error "tried to get an element of Leaf"
> index (Fix (Node lN l e gN g)) i | i < lN
>                                      = index l i
>                                  | i - lN < S.length e
>                                      = S.index e (i-lN)
>                                  | i - lN - S.length e < gN
>                                      = index g (i - lN - S.length e)
>                                  | i - lN - S.length e - gN >= 0
>                                      = error "index out of bounds"

Although we do have to force comparisions only once every time we
touch the same element in the tree, we do still have to traverse the
tree (in logarithmic time).

If you want linear time access on first touch of an element and
constant time access afterwards us toArray:

> toArray :: (IA.IArray a t) => Fix (BTreeRaw t) -> a Int t
> toArray tree = IA.listArray (0,maxI) (map (index tree) [0..maxI])
>     where size (Fix Leaf) = 0
>           size (Fix (Node lN _ e gN _)) = lN + S.length e + gN
>           maxI = size tree - 1

[1] Single-Threaded in the sense of Okasaki's use of the word.