Efficient way to code "(symbol,multiplicity)-counting" - how?

23 May 2002 13:11:37 +0100

This looked like a fun problem.  Here's a solution and some comments
on how I went about solving it.

-- 
Alastair Reid        Reid Consulting (UK) Ltd

> import List( nub )
> import Maybe( fromMaybe, fromJust )

It seems like your enumerations have two constraints:

1) They have to obey the multiplicity constraint.
2) They have to obey the ordering constraint.

Let's tackle them one at a time:

[Metanote: a common way to write Haskell programs is sort of bottom
up: identify the core concepts, build up a library of useful code for
representing and manipulating those concepts and try to explore a
little of the algebraic properties of the concepts, then look to see
if this makes the problem easy yet.]

An alternative representation would be [a] where counts are
represented by repetition.  Not clear which is better.

> type Multiplicity a = [(a,Int)]

For the sake of testing, here's a sample multiplicity:

> m1 :: Multiplicity Char
> m1 = [('x',1),('y',2),('z',1)]

Extract a multiplicity from a list with duplicates

> counts :: Eq a => [a] -> Multiplicity a
> counts as = [ (a, length (filter (==a) as)) | a <- List.nub as ]

Extract count from a multiplicity 

Use 0 if not present rather than raising error because 
it saves having to litter callers with guard code.

> count :: Eq a => Multiplicity a -> a -> Int
> count m a = fromMaybe 0 (lookup a m)

Ordering on two multiplicities: when is one <= another?

> le :: Eq a => Multiplicity a -> Multiplicity a -> Bool
> m1 `le` m2 = and [ n <= count m2 a | (a,n) <- m1 ]

and, of course, equality:

> eq :: Eq a => Multiplicity a -> Multiplicity a -> Bool
> eq m1 m2 = m1 `le` m2 && m2 `le` m1

Finally, we can check that a list satisfies the multiplicity
constraint.

> countok :: Eq a => Multiplicity a -> [a] -> Bool
> countok m as = counts as `le` m

Now onto the ordering constraint.  If I replace your symbols with
digits and ignore the multiplicity constraint, the enumerations would
look something like this.

  0, 1, 2, 10, 11, 12, 20, 21, 22, ...

In other words, I can find the next element in a list just by
incrementing the list elements.  Let's consider that first (since it
is easier).

A number is a list of digits _in reverse order_

> type Digit = Int
> type Number = [Digit]

Debugging/checking is easier if the numbers look like numbers so let's
define some printing functions:

> showNumber :: Number -> String
> showNumber = concat . map show . reverse

> showNumbers :: [Number] -> String
> showNumbers = concat . map (++"\n") . map showNumber

Counting to infinity:

> incN :: Number -> Number
> incN (9:ds) =   0 : incN ds
> incN (d:ds) = d+1 : ds
> incN []     = incN [0] 

We can enumerate all numbers by iterating:

> numbers :: [Number]
> numbers = iterate incN [0]

Printing this on the screen, we can easily see that we got it right.

> testN = putStr $ showNumbers (take 110 numbers)

Now let's tackle the real problem.  

Lists of symbols are called figures.  As before, we use reversed lists

> type Figure a = [a]

> showFigure :: Show a => Figure a -> String
> showFigure = concat . map show . reverse

> showFigures :: Show a => [Figure a] -> String
> showFigures = concat . map (++"\n") . map showFigure

Symbols are elements of a methematical structure that have a first
element, a final element and an increment function.

[I'm making this a 1st class structure because I want to be able to
share the symbols structure between multiple invocations of next.  I
will use this structure a lot like the way I would a typeclass -
except that I will explicitly create my own instance.]

> data Symbols a = Symbols{ 
>   first :: a, 
>   final :: a, 
>   inc :: a -> a 
> }

We can turn a multiplicity into a Symbols structure quite easily

> mkSymbols :: Eq a => Multiplicity a -> Symbols a
> mkSymbols m = Symbols{ 
>   first = fst (head m),
>   final = fst (last m),
>   inc = nxt m 
>   }

The nxt function is a bit inefficient.  We're hampered here by
polymorphism: if all you can do is an equality test, you can't do
better than a linear time lookup.  A binary tree could be used instead
of the zip if we had an Ord instance; an array if we had an Ix
instance.

> nxt :: Eq a => Multiplicity a -> a -> a
> nxt m a = fromJust (lookup a (zip m' (tail m')))
>  where
>   m' = map fst m

And now we copy the incN function and tweak it to use the Symbols structure:

> incF :: Eq a => Symbols a -> Figure a -> Figure a
> incF s (d:ds) | d == final s
>               = first s : incF s ds
> incF s (d:ds) = (inc s) d : ds
> incF s []     = [first s]        -- slight difference here

We make one slight change in the process.
With numbers, we treat the white space at the left of a number
as an infinite sequence of 0's.  That's why we wrote:

    incN [] = incN [0] 

We don't do that here.

We can enumerate all figures by iterating:

> figures :: Eq a => Symbols a -> [Figure a]
> figures s = iterate (incF s) []

Printing this on the screen, we can easily see that we got it right.

> testF f = putStr $ showFigures (take 110 f)

Now let's pop up a level and see if we have enough bits to solve the
whole problem.

So far I've ignored the importance of going back to the start when you 
reach the maximum multiplicity.

For this we need the maximum figure of a given multiplicity:

> maxF :: Multiplicity a -> Figure a
> maxF [] = []
> maxF ((a,n):m) = replicate n a ++ maxF m

[This takes both a multiplicity and a symbols structure as argument 
because we want efficient access to both.]

> incF2 :: Eq a => Multiplicity a -> Symbols a -> Figure a -> Figure a
> incF2 m s f | f == maxF m = []
> incF2 m s f | otherwise   = incF s f

> testF2 m = testF (iterate (incF2 m (mkSymbols m)) [])

We've also ignored the importance of the multiplicity constraint.
We can enforce this by discarding any result of incF2 which fails
the constraint.

> incF3 :: Eq a => Multiplicity a -> Symbols a -> Figure a -> Figure a
> incF3 m s f = head (filter (countok m) (tail (iterate (incF2 m s) f)))

> testF3 m = testF (iterate (incF3 m (mkSymbols m)) [])

This works but it seems a bit inefficient to do a linear search for
the next valid successor.  I have an inkling of how to do that but
I'll leave it for someone else.

Mark Phillips <mark@austrics.com.au> writes:
| Hi, I am new to Haskell and am having some difficulty with the
| following problem.

| Suppose I have a list of "symbols", except that each symbol is
| paired with a "multiplicity".  Ie, we have a list of type [(a,Int)].

| I want to use these symbols to "count".  Let me explain what I mean
| by count with the following example.

| Suppose we have symbol list [(x,1),(y,2),(z,1)].  This tells us that
| * we have three symbols, namely x, y and z * the symbols are
| ordered, namely x < y < z * with any "counting figure" the symbol x
| may appear at most once, the symbol y at most twice and the symbol z
| at most once (a "counting figure" is just a list of symbols, these
| symbols forming the "digits") * any "counting figure" will have
| between 0 and 1+2+1=4 "digits" (symbols) * the "counting figures"
| are ordered; the fewer the number of "digits" the "smaller" the
| figure; for figures with the same number of digits ordering is based
| on the symbol ordering, the left-most digit being "most
| significant", second-left being "second-most significant" and so on.

| We "count" as follows: [] [x] [y] [z] [x,y] [x,z] [y,x] [y,y] [y,z]
| [z,x] [z,y] [x,y,y] [x,y,z] [x,z,y] [y,x,y] [y,x,z] [y,y,x] [y,y,z]
| [y,z,x] [y,z,y] [z,x,y] [z,y,x] [z,y,y] [x,y,y,z] [x,y,z,y]
| [x,z,y,y] [y,x,y,z] [y,x,z,y] [y,y,x,z] [y,y,z,x] [y,z,x,y]
| [y,z,y,x] [z,x,y,y] [z,y,x,y] [z,y,y,x] and then start back at the
| beginning again (with []).

| I want to define a function next :: [(a,Int)] -> [a] -> [a]

| which finds the next list in the the "counting sequence".  So for
| example we should get

| next [(x,1),(y,2),(z,1)] [] == [x] next [(x,1),(y,2),(z,1)] [z,y] ==
| [x,y,y] next [(x,1),(y,2),(z,1)] [x,y,y,z] == [x,y,z,y] next
| [(x,1),(y,2),(z,1)] [z,y,y,x] == [] etc

| My question is, what is the best way to code this in Haskell?  Can
| it be done efficiently?

| Also, is my representation of symbols and multiplicities the best
| method?  Would I be better to represent them as two lists say, and
| in reverse order say: ie [z,y,x] and [1,2,1].  Or is there another
| better way to frame the whole problem?