Best recursion choice for "penultimax"

Sun, 24 Nov 2002 22:06:42 -0800

Hi Mark,

| I have just implemented the function "penultimax" which takes a list
| of positive integers and produces the "penultimate maximum", that is,
| the next biggest integer in the list after the maximum.  Eg:
| 
| penultimax [15,7,3,11,5] = 11

To your three implementations, let me add another two.  If you are
looking
for the smallest possible definition, consider the following:

  import List

  penultimax1 :: Ord a => [a] -> a
  penultimax1  = head . tail . sortBy (flip compare)

In other words, to find the second largest, sort (in descending order,
which is why I use "flip compare") and then extract the second element.
(You could also use "(!!1)", but I think that "head . tail" is nicer.)
Thanks to lazy evaluation, using sort in this way isn't as expensive
as you might think; because we ask only for the first two elements,
only a small part of the full sort computation will be needed.

A little more algorithmic sophistication leads to the following
alternative that can find the penultimax with only  n + log2 n
comparisons (approx), where n is the length of the list.

  penultimax :: Ord a => [a] -> (a, a)
  penultimax  = tournament . map enter
   where enter x = (x, [])

         tournament [(x, xds)] = (x, maximum xds)
         tournament others     = tournament (round others)

         round ((x,xds):(y,yds):others)
                   | x>=y      = (x, y:xds) : rest
                   | otherwise = (y, x:yds) : rest
                     where rest = round others
         round xs              = xs

The inspiration for this code is a knock-out tournament, treating
the values in the input list as teams.  To "enter" the competition,
each team is paired with the (initially) empty list of teams that it
has defeated.  In each round, we play the teams against each other in
pairs (if there are an odd number of teams, the last one gets a "by"
to the next round).  In each game, the team with the highest value
wins, and adds the opponent to its list of victories.  The tournament
concludes when only one team remains.  And here comes the clever
part:  the penultimax must be the largest entry in the victors list
of defeats because it would have won all of its games until, at some
point, being knocked out of the competition by the eventual winner.
And hence we need only scan that list for its "maximum".

[I'm afraid I don't know who invented this---I learned about it while
teaching a class on algorithms---but the rendering above in Haskell
is mine, and could be buggy!]

Neat algorithm eh?  But be careful ...

| How do I work out which is best to use?  Is there
| one clear "winner", or will they each have pros and
| cons?

Some quick tests with Hugs +s on a example list that I constructed
with 576 elements give food for thought:

                      reductions     cells
   my one liner          4035        11483
   tournament            7053        12288
   your penultimax      16715        20180
   your penultimax2      7466        10344
   your penultimax3      8605        13782

With the caveat that this is just one example (although others
I tried gave similar results), the conclusion seems to be that
my one liner is probably the winner, beating all of the others
in reductions, all but one of the others in space, and with the
simplest definition of all.  The fact that it is coded entirely
using prelude functions might also be a benefit if you use a
compile that provides fancy implementations or optimizations
for such functions.

My advice is that you should always start with the simplest
definition (i.e., the one that is easiest to code, easiest to
understand, and most easily seen to be correct).  You should not
worry about rewriting it in what you hope may be a more efficient
form unless you find later, by profiling or other means, that
its performance really is a problem.  (In which case, you'll be
able to collect some real, representative data against which
you can test and evaluate the alternatives.)  For starters,
a supposedly "improved" version might not actually be more
efficient (constant factors do matter sometimes!).  Moreover,
in attempting to "optimize" the code, you might instead break it
and introduce some bugs that will eventually come back and bite.

Hope this helps (or at least, is entertaining :-)

All the best,
Mark