Efficient way to code "(symbol,multiplicity)-counting" - how?

Mark Phillips mark@austrics.com.au
Mon, 27 May 2002 12:44:36 +0930


Hi Alastair,

Thanks for your email!  Sorry about the slowness of my reply, but it's
taken me quite some time to work through your email because many of
the syntax and concepts are new to me.  Yours has been a most
informative email.

I think I now understand most of your email, but it has raised in my
mind a number of questions which I will now ask.


 > An alternative representation would be [a] where counts are
 > represented by repetition.  Not clear which is better.

Yes, I had wondered.  In any case, it's not too hard to convert
between the two forms: (rep stands for repetitions)

   multToRep :: [(a,Int)] -> [a]
   multToRep [] = []
   multToRep ((aa,1):as) = aa : multToRep as
   multToRep ((aa,ab):as) = aa : multToRep ((aa,ab-1):as)

   repToMult :: Eq a => [a] -> [(a,Int)]
   repToMult [] = []
   repToMult (a:as) = (a,b+1) : repToMult cs where (b,cs) = peel a as

   peel :: Eq a => a -> [a] -> (Int,[a])
   peel a [] = (0,[])
   peel a (b:bs) = if (a==b) then (c+1,ds) else (0,b:bs)
       where (c,ds) = peel a bs

Anyway, probably the [(a,Int)] is the best.  It is the shortest
representation (except where multiplicities are mostly 1), and the
multToRep function is simpler (I think) than repToMult.


 > > type Multiplicity a = [(a,Int)]

Can we say "Multiplicity a" *is* "[(a,Int)]", or do we say
"Multiplicity a" *is_a_distinct_yet_identical_copy_of* "[(a,Int)]"?


 > > counts :: Eq a => [a] -> Multiplicity a
 > > counts as = [ (a, length (filter (==a) as)) | a <- List.nub as ]

Does it make a difference whether you write "filter (==a) as" or
"filter (a==) as"?

What do you think of the following as an alternative definition of
counts?

   counts [] = []
   counts (a:as) = (a,b+1) : counts cs where (b,cs)=strip a as

   strip :: Eq a => a -> [a] -> (Int,[a])
   strip a [] = (0,[])
   strip a (b:bs) = if (a==b) then (c+1,ds) else (c,b:ds)
       where (c,ds) = strip a bs

I am trying to work out how to code fast and memory efficient
haskell.  Is the above a good approach?


 > Symbols are elements of a methematical structure that have a first
 > element, a final element and an increment function.
 >
 > [I'm making this a 1st class structure because I want to be able to
 > share the symbols structure between multiple invocations of next.  I
 > will use this structure a lot like the way I would a typeclass -
 > except that I will explicitly create my own instance.]

I'm a little unsure about what you are saying here.  Am I right in
thinking a 1st class structure is one that may be thought of as data?
What is the alternative here?  Are you saying that by defining such a
structure, you can calculate the concepts once, and then pass them
around, rather than calculating them at each step of the process?

 > data Symbols a = Symbols{
 > first :: a,
 > final :: a,
 > inc :: a -> a
 > }
 >
 > The nxt function is a bit inefficient.  We're hampered here by
 > polymorphism: if all you can do is an equality test, you can't do
 > better than a linear time lookup.  A binary tree could be used instead
 > of the zip if we had an Ord instance; an array if we had an Ix
 > instance.

But we can assume that the "digits" are ordered, this ordering given
by the order in which they occur in the multiplicity.  Is there a way
of using this to make the digits an Ord instance?  And if so, how do you
do the binary tree?


 > > testF f = putStr $ showFigures (take 110 f)

What does the "$" do in the above?


 > We've also ignored the importance of the multiplicity constraint.
 > We can enforce this by discarding any result of incF2 which fails
 > the constraint.
 >
 > > incF3 :: Eq a => Multiplicity a -> Symbols a -> Figure a -> Figure a
 > > incF3 m s f = head (filter (countok m) (tail (iterate (incF2 m s) f)))

The filter combined with a check that the multiplicity constraint is
satisfied will work, but how efficient is it?  I am guessing that it
will depend how many are rejected.  If most are rejected then it's
probably inefficient, but if only a few are, then it's the best
way.  Are there any other pros and cons with this approach?

I am thinking that maybe a more efficient algorithm, in the case where
lots are expected to be rejected in the above, would be one involving
a dynamically changing multiplicity.  Ie, when a symbol is chosen, the
multiplicity is modified to reduce the corresponding multiplicity by
1.  Of course, maybe I'm just thinking too much in the imperative
framework still --- where the multiplicity would be represented as an
array of values that could be reassigned.  The problem seems to be
that lazy lists are not good when you want to do "random access
updates", which is roughly what we want to do with a multiplicity
list.  Are there well known Haskell solutions to this kind of issue?

By the way, the reason I wanted my counting to wrap back to "[]" after
getting to the maximum figure, is so the function "next" would always
work.  But your email suggested to me an alternative solution.  I could
just use the "Maybe" data type!  Ie, trying to do a next on the maximum
figure just gives you the Maybe "Nothing".

Thanks again for your very informative email!

Cheers,

Mark.

-- 
Dr Mark H Phillips
Research Analyst (Mathematician)

AUSTRICS - Smarter Scheduling Solutions - www.austrics.com

Level 2, 50 Pirie Street, Adelaide SA 5000, Australia
Phone +61 8 8226 9850
Fax   +61 8 8231 4821
Email mark@austrics.com.au