[Haskell-cafe] Re: CYK-style parsing and laziness
apfelmus
apfelmus at quantentunnel.de
Sat May 26 04:57:45 EDT 2007
Steffen Mazanek wrote:
> apfelmus wrote
>> The key point of the dynamic programming algorithm is indeed to memoize
>> the results gs i j for all pairs of i and j. In other words, the insight
>> that yields a fast algorithm is that for solving the subproblems gs i j
>> (of which there are n^2), solution to smaller subproblems of the same
>> form are sufficient. Tabulation is a must.
>
> I underst and this, however I thought it would be possible to use the
> automatic collapsing of the termgraphs in some way.
Well, some things have to be left to the programmer :), especially the
choice of trading space for time.
Note that there are very systematic and natural ways to derive dynamic
programming algorithms in functional languages. In a sense, much of the
work of R. Bird centers this topic. The book "Algebra of Programming"
http://web.comlab.ox.ac.uk/oucl/research/pdt/ap/pubs.html#Bird-deMoor96:Algebra
is one of the cornerstones.
The systematic derivation of dynamic programming algorithms has been
rediscovered in a more direct but less general fashion
http://bibiserv.techfak.uni-bielefeld.de/adp/
> Of course, you can still choose how to represent the table. There's a
>> nice higher order way to do that
>>
>> tabulate :: (Int -> Int -> a) -> (Int -> Int -> a)
>>
>> gs = tabulate gs'
>> where
>> gs' 1 j = ... uses gs x y for some x y ...
>> gs' i j = ... ditto ...
>
>
> Thank you for this explanation. Your approach is not very concise either
> but it does not pollute the algorithm so much.
Oh? It's concise enough for me :) The nice thing about an explicit
'tabulate' is that you can separate the table and the entry calculations
completely.
> That would be strange. I mean, no gs i j may have more than two
>> elements, namely S or A. The other key point of the CYK algorithm is
>> that the sets gs i j are indeed sets and may only contain as many
>> elements as there are nonterminals.
>
>
> .... You are right, of course. I have tried a nub before the list
> comprehension however this is evaluated too late.
Yes, the nub has to eliminate storing a nonterminal "for every k". But
this can be done in advance by noting that we're only interested in
whether there exists at least one k
[nt | ..., not $ null [k | k<-[1..i-1],
a `elem` gs k j,
b `elem` gs (i-k) (j+k)]]
and don't want to emit a nonterminal for which k
[nt | ..., k <- [1..i-1],
a `elem` gs k j,
b `elem` gs (i-k) (j+k)]
> I should really use sets, however, I would miss the list comprehension
> syntactic sugar sooo much. Is there something similar for real Data.Set?
Not that I knew of, but you can always use
fromList [nt | ...]
Note that the Data.Set can be thought of as being part of the
tabulation. In effect, you really deal with a 3-dimensional truth table
gs i j nt = substring starting at j of length i can be derived
by nonterminal nt
Here's an implementation of the tabulation with Data.Set
tabulate :: (Int -> Int -> NT -> Bool) -> (Int -> Int -> NT -> Bool)
tabulate gs' = \i j nt -> nt `member` table ! (i,j)
where
table = array bnds [(ij, mkSet $ gs' i j) | ij@(i,j) <- range bnds]
mkSet chi = fromList [nt | nt <- nts, chi nt]
And here's the memoized function
gs = tabulate gs'
where
gs' 1 j nt = any [True | Left t <- productions nt,
w !! (j-1) == t]
gs' i j nt = any [True | Right (a,b) <- productions nt,
gs k j a,
gs (i-k) (j+k) b]
It assumes a function (dependent on the grammer at hand)
productions :: NT -> [Either T (NT, NT)]
that returns the terminal and nonterminal productions for a given
nonterminal.
Regards,
apfelmus
More information about the Haskell-Cafe
mailing list