Efficiency of list indices in definitions

oleg@pobox.com oleg@pobox.com
Mon, 5 Aug 2002 19:12:12 -0700 (PDT)


Carl McTague wrote:

> finAuDeterminize :: (Ord l) => FinAu l -> FinAu [l]
> finAuDeterminize startStates = [f startStates]
>     where f :: (Ord l) => [FaState l] -> FaState [l]
>           f states = FaState label' acceptQ' trans'
>             where label' = sort (map label states)
>                   acceptQ' = or (map acceptQ states)
>                   trans' = map (singleton.f.concat) (transpose (map trans states))
>                    singleton a = [a]

> Now, we get to the heart of my question.  This definition is quite
> lovely, recursive and lazy, but it doesn't actually capture the
> recurrent structure of the resulting automata.  That is, if we were to
> compute:

> finAuAcceptStringQ (finAuDeterminize dom18) somestring

> then finAuDeterminize would keep spinning out, generating more and
> more states, which would actually be equivalent (and, have identical
> state labels).

To prevent such a behavior we ought to keep track of the states that
we have already computed.

Before getting to the solution, I'd like to take the liberty to
modify your definition for the FSM to improve readability (at the
expense of efficiency). I'd like to be able to print automata, to see
what exactly I have determinized.


data FaState l = FaState {label :: l, acceptQ :: Bool, trans :: [[l]]}
         deriving(Eq,Show)
--               states         start states
data FA l = FA [(l,FaState l)]      [l]           deriving Show

The transition function is now the list of labels rather than the list
of states. A naive NFA recognizing a sample 0*(0(0+1)*)* can be written
as

dom18 = FA [(1,one), (2,two)] [1]
    where one = FaState 1 True [[1,2],[]]
	  two = FaState 2 True [[2,1],[2,1]]

(BTW, I had trouble with the automaton in the original article. That
NFA recognized [1], which it should not). The acceptance function can
be written as

finAuAcceptStringQ :: (Eq l) => FA l -> [Int] -> Bool
finAuAcceptStringQ (FA all_states start_labels) str =
	or [acceptP (locate_state l) str | l <- start_labels]
  where locate_state l = let (Just state) = lookup l all_states in state
	acceptP (FaState _ acceptQ _) [] = acceptQ
	acceptP (FaState _ _ trans) (a:as) = 
		or [ acceptP (locate_state l) as | l <- followerStates ]
		where followerStates = 
			   if a > (length trans) - 1 then []
			   else trans!!a


The determinizer can then be written as follows:

finAuDeterminize (FA all_states start_labels) = FA all_states' [start_label']
  where
    -- [NFA_labels] -> DFA_labels
    det_labels = sort . nub
    start_label' = det_labels start_labels
   
    -- DFA states 
    all_states' = build_states [start_label'] []
    
    --    build_states labels seen_labels
    -- lazily builds the list of DFA states. 'labels' is the list
    -- of labels for the states to build. seen_labels are the labels
    -- of the states we have built already
    build_states [] seen_labels = []
    build_states (l:labels) seen_labels =
                 if l `elem` seen_labels then -- already built DFA state l
		    build_states labels seen_labels
		 else really_build l labels seen_labels
    
    -- Build a DFA state with a label 'dfa_label' and return a list
    -- of the constructed state and a promise to build the follower states
    really_build dfa_label labels seen_labels =
                 (dfa_label,dfa_state):
		 build_states new_labels (dfa_label:seen_labels)
       where 
	  dfa_state   = FaState dfa_label acceptQ' trans'
	  locate_state l = let (Just state) = lookup l all_states in state
	  nfa_states  = map locate_state dfa_label
	  acceptQ'    = or (map acceptQ nfa_states)
	  trans'      = map ((:[]).det_labels.concat) $ 
	                    transpose $ map trans nfa_states
	  new_labels  = (concat trans') ++ labels

test1' = finAuAcceptStringQ (finAuDeterminize dom18) [0,1,0,1]
test2' = finAuAcceptStringQ (finAuDeterminize dom18) [1,0]

The states of the output DFA are built lazily. Each state is constructed
only once. To see that equivalent states are not duplicated, we can
check the result of determinizing the dom18: finAuDeterminize dom18
prints

FA [([1],   FaState{label=[1],  acceptQ=True, trans=[[[1,2]],[[]]]}),
    ([1,2], FaState{label=[1,2],acceptQ=True, trans=[[[1,2]],[[1,2]]]}),
    ([],    FaState{label=[],   acceptQ=False,trans=[]})] 
  [[1]] -- the starting state

which is indeed a DFA (and so happens, the minimal one).