Efficiency of list indices in definitions
oleg@pobox.com
oleg@pobox.com
Mon, 5 Aug 2002 19:12:12 -0700 (PDT)
Carl McTague wrote:
> finAuDeterminize :: (Ord l) => FinAu l -> FinAu [l]
> finAuDeterminize startStates = [f startStates]
> where f :: (Ord l) => [FaState l] -> FaState [l]
> f states = FaState label' acceptQ' trans'
> where label' = sort (map label states)
> acceptQ' = or (map acceptQ states)
> trans' = map (singleton.f.concat) (transpose (map trans states))
> singleton a = [a]
> Now, we get to the heart of my question. This definition is quite
> lovely, recursive and lazy, but it doesn't actually capture the
> recurrent structure of the resulting automata. That is, if we were to
> compute:
> finAuAcceptStringQ (finAuDeterminize dom18) somestring
> then finAuDeterminize would keep spinning out, generating more and
> more states, which would actually be equivalent (and, have identical
> state labels).
To prevent such a behavior we ought to keep track of the states that
we have already computed.
Before getting to the solution, I'd like to take the liberty to
modify your definition for the FSM to improve readability (at the
expense of efficiency). I'd like to be able to print automata, to see
what exactly I have determinized.
data FaState l = FaState {label :: l, acceptQ :: Bool, trans :: [[l]]}
deriving(Eq,Show)
-- states start states
data FA l = FA [(l,FaState l)] [l] deriving Show
The transition function is now the list of labels rather than the list
of states. A naive NFA recognizing a sample 0*(0(0+1)*)* can be written
as
dom18 = FA [(1,one), (2,two)] [1]
where one = FaState 1 True [[1,2],[]]
two = FaState 2 True [[2,1],[2,1]]
(BTW, I had trouble with the automaton in the original article. That
NFA recognized [1], which it should not). The acceptance function can
be written as
finAuAcceptStringQ :: (Eq l) => FA l -> [Int] -> Bool
finAuAcceptStringQ (FA all_states start_labels) str =
or [acceptP (locate_state l) str | l <- start_labels]
where locate_state l = let (Just state) = lookup l all_states in state
acceptP (FaState _ acceptQ _) [] = acceptQ
acceptP (FaState _ _ trans) (a:as) =
or [ acceptP (locate_state l) as | l <- followerStates ]
where followerStates =
if a > (length trans) - 1 then []
else trans!!a
The determinizer can then be written as follows:
finAuDeterminize (FA all_states start_labels) = FA all_states' [start_label']
where
-- [NFA_labels] -> DFA_labels
det_labels = sort . nub
start_label' = det_labels start_labels
-- DFA states
all_states' = build_states [start_label'] []
-- build_states labels seen_labels
-- lazily builds the list of DFA states. 'labels' is the list
-- of labels for the states to build. seen_labels are the labels
-- of the states we have built already
build_states [] seen_labels = []
build_states (l:labels) seen_labels =
if l `elem` seen_labels then -- already built DFA state l
build_states labels seen_labels
else really_build l labels seen_labels
-- Build a DFA state with a label 'dfa_label' and return a list
-- of the constructed state and a promise to build the follower states
really_build dfa_label labels seen_labels =
(dfa_label,dfa_state):
build_states new_labels (dfa_label:seen_labels)
where
dfa_state = FaState dfa_label acceptQ' trans'
locate_state l = let (Just state) = lookup l all_states in state
nfa_states = map locate_state dfa_label
acceptQ' = or (map acceptQ nfa_states)
trans' = map ((:[]).det_labels.concat) $
transpose $ map trans nfa_states
new_labels = (concat trans') ++ labels
test1' = finAuAcceptStringQ (finAuDeterminize dom18) [0,1,0,1]
test2' = finAuAcceptStringQ (finAuDeterminize dom18) [1,0]
The states of the output DFA are built lazily. Each state is constructed
only once. To see that equivalent states are not duplicated, we can
check the result of determinizing the dom18: finAuDeterminize dom18
prints
FA [([1], FaState{label=[1], acceptQ=True, trans=[[[1,2]],[[]]]}),
([1,2], FaState{label=[1,2],acceptQ=True, trans=[[[1,2]],[[1,2]]]}),
([], FaState{label=[], acceptQ=False,trans=[]})]
[[1]] -- the starting state
which is indeed a DFA (and so happens, the minimal one).