[Haskell-cafe] Re: Frisby grammars that have context

apfelmus apfelmus at quantentunnel.de
Tue May 29 18:22:53 EDT 2007


Isaac Dupree wrote:
> apfelmus wrote:
>> Mark T.B. Carroll wrote:
>>> I've been playing with Text.Parsers.Frisby to see how it stacks against
>>> other options and, while it's been great so far, I am finding that I
>>> can't encode a grammar where what's acceptable depends on what's already
>>> been parsed in some nontrivial way.
>>> [...]
>>> Is this supposed to not be possible in Frisby, or (quite likely) am I
>>> missing something that allows me to?
>> It's intentionally impossible. Frisby uses a dynamic programming
>> approach that crucially depends on the fact that the grammar in question
>> is context-free (actually something related, but the effect is the
>> same).
> 
> Is that dependence crucial? What if it gained Monad operations that just
> weren't intended to be used very often, and maybe locally harmed
> performance a little where they are used?

Now that you ask, I become unsure :) The actual details of packrat
parsing are written down in

   B. Ford. Packrat Parsing: Simple, Powerful, Lazy, Linear Time.
   http://pdos.csail.mit.edu/~baford/packrat/icfp02/

There's a small section about "Monadic Packrat Parsing" but I'm not sure
about its significance. The following discussion may shed light on this.


First a different explanation of packrat parsing. It can be understood
as a variant of the O(n^3) Cocke, Younger, Kasami parsing algorithm for
context-free grammars (coincidentially recently discussed at
http://article.gmane.org/gmane.comp.lang.haskell.cafe/22850). First, we
rearrange the table

  gs i j nt = substring starting at position j of length i
              can be derived by nonterminal nt
            = function of gs i' j' nt' for j'>=j, j+i>=j'+i'
              and any nt' ¹

as

  fs j nt = [i | the substring starting at j with a length i
                 is a derivation from the nonterminal nt]

Then, packrat parsing basically throws out non-determinism (which
changes the semantics of the context-free grammar):

  packrat j nt = minimum (fs j nt)
               = function of (packrat j' nt') for j'>=j
                 and any nt' ¹

and that's about it. In the aforementioned paper, this table is very
implicit but it's there: the indices i and j are present as memory
pointers to different incarnations of the data structure Derivs. Also,
the constructed values (i.e. values of type a for P s a) are stored in
the table.


Now, declaring a parser in the Frisby library builds up the table
structure. Every newRule introduces a new non-terminal and an associated
column in the table. This means that at least every non-terminal must be
"side-effect free", i.e. the result (packrat j nt) may only depend on
the substring starting at j but not on the results from previous parses.
But it seems that the dependence itself indeed may incorporate
context-sensitive behavior. In other words, you may decide freely how
(packrat j nt) is calculated from (packrat j' nt'). In particular, you
can choose the j' to incorporate based on parsing resulsts from them.
Here's an artificial pseudo-code example:

  weird   <- newRule $ do
    b <- parse boolean
    if b == True
       then parse number
       else parse date

    -- assumed helper stuff
  boolean <- newRule $ ...
  number  <- newRule $ ...
  date    <- newRule $ ...

The decision of constructing the result of the nonterminal 'weird' from
parsing a date or parsing a number depends on whether we parsed True or
False before. In this case, there are no unexpected run-time penalities
and it appears that this can already be implemented using the bare-hands
machinery from the paper but that this cannot be implemented in Frisby (?).

However, it's not possible to assign non-terminals to the parts that
parse differently depending on context. In the example, we cannot factorize

  weird   <- newRule $ do
    b <- parse boolean
    parse (helper b)

  helper  <- newRule $ \b -> do
    if b == True
       then parse number
       else parse date

and assign 'helper' a non-terminal. Somehow, this apparently doesn't
work anyway because newRule doesn't accept functions as arguments. So,
it seems that we can just turn (P s) into a monad without regret!
Run-time penalities should only occur if we recurse on something that
didn't get memoized via newRule. In a sense, is 'newRule' the only
primitive (should probably get a different name like 'memoize') that
makes packrat parsing different and faster than other monadic parsers?

> BTW: (P s) should be an instance of Applicative (which is already
> possible with Frisby's current code, just not there) (I prefer the
> aesthetics of Frisby-applicative code to many of the combinators it
> provides - I tried it a little sometime, not enough to send a darcs patch)

Yes, I think too that it definitely should be made an instance of
Applicative. For parsing, I prefer that to Monad anyway :)


Regards,
apfelmus

¹ The >= sometimes must be > to avoid a <<loop>>, but that's immaterial
here.



More information about the Haskell-Cafe mailing list