Efficiency of list indices in definitions

Wed, 31 Jul 2002 14:55:48 -0600

Hi, I'm experimenting with various implementations of finite automata.
I think the following is interesting, but gives rise to a question,
which might actually be silly.  (Forgive me if it is.)

First, I can define a state as a recursive data type:

data FaState l = FaState {label :: l, acceptQ :: Bool, trans :: [[FaState l]]}

where l is the type of an arbitrary label.  (This will become
important a bit later on.)  The trans is the adjacency list for state
automaton: trans!!i is a list of all states reachable via the symbol i
from the current state.  (I use a list rather than an array because I
typically work with binary alphabets, so the use of Arrays doesn't
seem justified.)  I make trans!!i a list so that I can represent
nondeterminism.

Then a finite automaton is merely a list of starting FaStates:

type FinAu l = [FaState l]

So, for example, an automaton equivalent to the regular expression
0*(0(0+1)*)* could be defined as:

dom18 :: FinAu Int
dom18 = [one,two]
    where one = FaState 1 True [[two],[]]
	  two = FaState 2 True [[one],[one]]

So, with this machinery in place we can define most of the classical
constructions quite easily.  For example, to check whether a given
string is accepted by an automaton:

finAuAcceptStringQ :: FinAu l -> [Int] -> Bool
finAuAcceptStringQ startStates string =
    or [faStateAcceptStringQ s string | s <- startStates]
    where faStateAcceptStringQ (FaState _ acceptQ _) [] = acceptQ
	  faStateAcceptStringQ (FaState _ _ trans) (a:as)
	      | null followerStates = False
	      | otherwise =
		  or [ faStateAcceptStringQ s as | s <- followerStates ]
	      where followerStates = if a > (length trans) - 1 then []
				     else trans!!a

and the classical nfa-to-dfa power-set (determinization) construction
may be implemented quite nicely with:

finAuDeterminize :: (Ord l) => FinAu l -> FinAu [l]
finAuDeterminize startStates = [f startStates]
    where f :: (Ord l) => [FaState l] -> FaState [l]
          f states = FaState label' acceptQ' trans'
	      where label' = sort (map label states)
		    acceptQ' = or (map acceptQ states)
		    trans' = map (singleton.f.concat) (transpose (map trans states))
		    singleton a = [a]

Now, we get to the heart of my question.  This definition is quite
lovely, recursive and lazy, but it doesn't actually capture the
recurrent structure of the resulting automata.  That is, if we were to
compute:

finAuAcceptStringQ (finAuDeterminize dom18) somestring

then finAuDeterminize would keep spinning out, generating more and
more states, which would actually be equivalent (and, have identical
state labels).  So an idea is that we can assume the state labels
induce an equivalence relation on our states and define something like:

finAuConsolidate :: (Eq l) => FinAu l -> FinAu l
finAuConsolidate startStates =
    [ newStates!!i | l <- map label startStates, i <- elemIndices l (map fst assocs) ]
    where assocs = finAuStateLabelAssoc startStates
	  newStates = map (f.snd) assocs
	  f oldState@(FaState l acceptQ trans) =
	      FaState l acceptQ (map (map (g.label)) trans)
	  g label = case elemIndex label (map fst assocs)
		    of Just i -> newStates!!i
		       Nothing -> error "odd error in finAuConsolidate"

provided we have defined:

finAuStateLabelAssoc :: (Eq l) => FinAu l -> [(l,FaState l)]
finAuStateLabelAssoc startStates = foldl f [] startStates
    where f knownStatesAssoc state@(FaState label _ trans)
	      | isJust (lookup label knownStatesAssoc) = knownStatesAssoc
	      | otherwise = foldl f ((label,state):knownStatesAssoc) followers
	      where followers = concat trans

So this is the heart of my question: is this the best way to proceed?
Will the list lookups (!!) of finAuConsolidate be used only once in
the construction or will they be perpetually used in something like:

finAuAcceptStringQ ((finAuConsolidate.finAuDeterminize) dom18) somestring?

That is, how much of a difference would it make if I were to use an
Array instead?

Thanks,
  Carl