[Haskell-cafe] Re: Collections

Wed Jun 20 03:47:22 EDT 2007

Thomas Conway wrote:
> In particular, I find my self wanting to use a priority queue for
> N-way sorted merge, which you can do with Data.Map: (compiles, so
> clearly works even though I have not tested it. ;-) )
> 
> import Data.List as List
> import Data.Map as Map
> 
> merge :: Ord t => [[t]] -> [t]
> merge lists = merge' $ Map.fromList $ concatMap makePair lists
>    where
>    makePair [] = []
>    makePair (x:xs) = [(x,xs)]
> 
> merge' heap
>    | Map.null heap = []
>    | otherwise     = x:(merge' $ removeEqual x $ reinsert xs heap')
>    where
>    ((x,xs), heap') = deleteFindMin heap
> 
> reinsert [] heap = heap
> reinsert (x:xs) heap = Map.insert x xs heap
> 
> removeEqual x heap
>    | Map.null heap = heap
>    | x /= y        = heap
>    | otherwise     = removeEqual x $ reinsert ys heap'
>    where
>    ((y,ys), heap') = deleteFindMin heap

Eh, why not a simple mergesort that also deletes duplicates?

    -- the nested lists must be sorted: map sort xs == xs
  mergesort :: Ord a => [[a]] -> [a]
  mergesort []  = []
  mergesort xs  = foldtree1 merge xs

  foldtree1 :: (a -> a -> a) -> [a] -> a
  foldtree1 f [x] = x
  foldtree1 f xs  = foldtree1 f $ pairs xs
     where
     pairs []        = []
     pairs [x]       = [x]
     pairs (x:x':xs) = f x x' : pairs xs

  merge :: Ord a => [a] -> [a] -> [a]
  merge []     ys     = ys
  merge xs     []     = xs
  merge xs'@(x:xs) ys'@(y:ys)
      | x  < y    = x:merge xs  ys'
      | x == y    =   merge xs  ys'
      | otherwise = y:merge xs' ys

The function 'foldtree1' folds the elements of the list as if they where
in a binary tree:

  foldrtree1 f [1,2,3,4,5,6,7,8]
 ==>
  ((1 `f` 2) `f` (3 `f` 4)) `f` ((5 `f` 6) `f` (7 `f` 8))

and with f = merge, this serves as heap (although a very implicit one).
The hole mergesort will take

  O(n*log (length xs)) where n = length (concat xs)

time. Moreover, this variant of mergesort happens to generate elements
as soon as they are available, i.e.

 head . mergesort  is  O(n)

See also

 http://article.gmane.org/gmane.comp.lang.haskell.general/15010

> The other thing I have found myself doing often is using splitLookup
> followed by union, though what I really want is "join" being the dual
> of split - i.e. requiring all the keys in the rhs to be greater than
> the keys in the lhs. My own AVL tree implementation has this operation
> which is O(log n), which is rather better than union's O(n log n).

2-3-Finger trees support efficient splits and concatenations:

  http://www.soi.city.ac.uk/~ross/papers/FingerTree.html

In fact, you can build a plethora of data structures from them.

Regards,
apfelmus