[Haskell-beginners] sorting almost sorted list

Tue Sep 27 10:08:53 CEST 2011

Dennis Raddle wrote:
> I have a problem from music. I have a list of notes sorted by begin
> time. I want to sort them by end time.
>
> Notes that are sorted by begin time are usually close to sorted by end
> time, because notes tend to cluster within a small range of durations.
>
> What algorithms are available to me (hopefully in a library) that are
> good with this kind of thing?

Sorry, I don't know of any library that handles this special
case. I'm also not too familiar with the literature, so
I don't know of anyone who writes about it. (It's probably
worthwhile to have a look in Knuth.)

Anyway, it seems like this is not too hard.

Can you guarantee for some value of m that for each note N,
only the first m notes following N might end earlier than N?

If so, then the following simple algorithm is linear
and runs in constant space. You could then use:
sortAlmostBy m (comparing endTime)

sortAlmostBy :: Int -> (a -> a -> Ordering) -> [a] -> [a]
sortAlmostBy m cmp =
  mergeAABy cmp . map (sortBy cmp) . chunksOf m

chunksOf :: Int -> [a] -> [[a]]
chunksOf m =
  map (take m) . takeWhile (not . null) . iterate (drop m)

-- Merge a list of lists with two assumptions:
-- (1) Each list is sorted.
-- (2) The sequence of lists is "almost ascending" -
-- no element of a list is greater than any element of
-- any list following it, except possibly the list immediately
-- following it.
mergeAABy :: (a -> a -> Ordering) -> [[a]] -> [a]
mergeAABy cmp ((x:xs):xss) = y : mergeAABy cmp yss
  where
    (y, yss) = pickLeast x xs xss
    pickLeast p ps pss@((q:qs):qss) = case cmp p q of
                       GT -> (q, (p : ps) : qs : qss)
                       _  -> (p, ps : pss)
    pickLeast p ps (_:pss) = pickLeast p ps pss
    pickLeast p ps _       = (p, [ps])
mergeAABy cmp (_:xss) = mergeAABy cmp xss
mergeAABy _ _ = []

You might be able to do a little better than this.
Here is one way: GHC would probably optimize this
better if you make pickLeast non-recursive by
arranging for lists that become empty to be eliminated
from the calculation earlier.

But as David points out, this is probably good enough
for your application, even if you are processing a
full orchestra score of a Mahler symphony that lasts
for hours. (Whereas just using sortBy for that case might
be slow.)

Regards,
Yitz