[Haskell-cafe] Re: Frisby grammars that have context
apfelmus
apfelmus at quantentunnel.de
Tue May 29 18:22:53 EDT 2007
Isaac Dupree wrote:
> apfelmus wrote:
>> Mark T.B. Carroll wrote:
>>> I've been playing with Text.Parsers.Frisby to see how it stacks against
>>> other options and, while it's been great so far, I am finding that I
>>> can't encode a grammar where what's acceptable depends on what's already
>>> been parsed in some nontrivial way.
>>> [...]
>>> Is this supposed to not be possible in Frisby, or (quite likely) am I
>>> missing something that allows me to?
>> It's intentionally impossible. Frisby uses a dynamic programming
>> approach that crucially depends on the fact that the grammar in question
>> is context-free (actually something related, but the effect is the
>> same).
>
> Is that dependence crucial? What if it gained Monad operations that just
> weren't intended to be used very often, and maybe locally harmed
> performance a little where they are used?
Now that you ask, I become unsure :) The actual details of packrat
parsing are written down in
B. Ford. Packrat Parsing: Simple, Powerful, Lazy, Linear Time.
http://pdos.csail.mit.edu/~baford/packrat/icfp02/
There's a small section about "Monadic Packrat Parsing" but I'm not sure
about its significance. The following discussion may shed light on this.
First a different explanation of packrat parsing. It can be understood
as a variant of the O(n^3) Cocke, Younger, Kasami parsing algorithm for
context-free grammars (coincidentially recently discussed at
http://article.gmane.org/gmane.comp.lang.haskell.cafe/22850). First, we
rearrange the table
gs i j nt = substring starting at position j of length i
can be derived by nonterminal nt
= function of gs i' j' nt' for j'>=j, j+i>=j'+i'
and any nt' ¹
as
fs j nt = [i | the substring starting at j with a length i
is a derivation from the nonterminal nt]
Then, packrat parsing basically throws out non-determinism (which
changes the semantics of the context-free grammar):
packrat j nt = minimum (fs j nt)
= function of (packrat j' nt') for j'>=j
and any nt' ¹
and that's about it. In the aforementioned paper, this table is very
implicit but it's there: the indices i and j are present as memory
pointers to different incarnations of the data structure Derivs. Also,
the constructed values (i.e. values of type a for P s a) are stored in
the table.
Now, declaring a parser in the Frisby library builds up the table
structure. Every newRule introduces a new non-terminal and an associated
column in the table. This means that at least every non-terminal must be
"side-effect free", i.e. the result (packrat j nt) may only depend on
the substring starting at j but not on the results from previous parses.
But it seems that the dependence itself indeed may incorporate
context-sensitive behavior. In other words, you may decide freely how
(packrat j nt) is calculated from (packrat j' nt'). In particular, you
can choose the j' to incorporate based on parsing resulsts from them.
Here's an artificial pseudo-code example:
weird <- newRule $ do
b <- parse boolean
if b == True
then parse number
else parse date
-- assumed helper stuff
boolean <- newRule $ ...
number <- newRule $ ...
date <- newRule $ ...
The decision of constructing the result of the nonterminal 'weird' from
parsing a date or parsing a number depends on whether we parsed True or
False before. In this case, there are no unexpected run-time penalities
and it appears that this can already be implemented using the bare-hands
machinery from the paper but that this cannot be implemented in Frisby (?).
However, it's not possible to assign non-terminals to the parts that
parse differently depending on context. In the example, we cannot factorize
weird <- newRule $ do
b <- parse boolean
parse (helper b)
helper <- newRule $ \b -> do
if b == True
then parse number
else parse date
and assign 'helper' a non-terminal. Somehow, this apparently doesn't
work anyway because newRule doesn't accept functions as arguments. So,
it seems that we can just turn (P s) into a monad without regret!
Run-time penalities should only occur if we recurse on something that
didn't get memoized via newRule. In a sense, is 'newRule' the only
primitive (should probably get a different name like 'memoize') that
makes packrat parsing different and faster than other monadic parsers?
> BTW: (P s) should be an instance of Applicative (which is already
> possible with Frisby's current code, just not there) (I prefer the
> aesthetics of Frisby-applicative code to many of the combinators it
> provides - I tried it a little sometime, not enough to send a darcs patch)
Yes, I think too that it definitely should be made an instance of
Applicative. For parsing, I prefer that to Monad anyway :)
Regards,
apfelmus
¹ The >= sometimes must be > to avoid a <<loop>>, but that's immaterial
here.
More information about the Haskell-Cafe
mailing list