[Haskell-cafe] Parser left recursion
Tillmann Rendel
rendel at informatik.uni-marburg.de
Sun Feb 24 16:04:11 CET 2013
Hi,
Kim-Ee Yeoh wrote:
> Perhaps you meant /productive/ corecursion? Because the definition "A
> ::= B A" you gave is codata.
If you write a recursive descent parser, it takes the token stream as an
input and consumes some of this input. For example, the parser could
return an integer that says how many tokens it consumed:
parseA :: String -> Maybe Int
parseB :: String -> Maybe Int
Now, if we implement parseA according to the grammar rule
A ::= B A
we have, for example, the following:
parseA text
= case parseB text of
Nothing -> Nothing
Just n1 -> case parseA (drop n1 text) of
Nothing -> Nothing
Just n2 -> Just (n1 + n2)
Note that parseA is recursive. The recursion is well-founded if (drop n1
text) is smaller then text. So we have two cases, as Roman wrote:
If the language defined by B contains the empty string, then n1 can be
0, so the recursion is not well-founded and the parser might loop.
If the language defined by B does not contain the empty string, then n1
is always bigger than 0, so (drop n1 text) is always smaller than text,
so the recursion is well-founded and the parser cannot loop.
So I believe the key to understanding Roman's remark about well-founded
recursion is to consider the token stream as an additional argument to
the parser.
However, the difference between hidden left recursion and unproblematic
recursion in grammars can also be understood in terms of productive
corecursion. In that view, a parser is a process that can request input
tokens from the token stream:
data Parser
= Input (Char -> Parser)
| Success
| Failure
parseA :: Parser
parseB :: Parser
Now, if we implement parseA according to the grammar rule
A ::= B A
we have, for example, the following:
andThen :: Parser -> Parser -> Parser
andThen (Input f) p = Input (\c -> f c `andThen` p)
andThen (Success) p = p
andThen Failure p = p
parseA = parseB `andThen` parseA
Note that parseA is corecursive. The corecursion is productive if the
corecursive call to parseA is guarded with an Input constructor. Again,
there are two cases:
If the language described by B contains the empty word, then parseB =
Success, and (parseB `andThen` parseA) = parseA, so the corecursive call
to parseA is not guarded and the parser is not productive.
If the langauge described by B does not contain the empty word, then
parseB = Input ..., and (parseB `andThen` parseA) = Input (... parseA
...), so the corecursive call to parseA is guarded and the parse is
productive.
So I believe the key to understanding left recursion via productive
corecursion is to model the consumption of the token stream with a
codata constructor.
Both approaches are essentially equivalent, of course: Before
considering the very same nonterminal again, we should have consumed at
least one token.
Tillmann
More information about the Haskell-Cafe
mailing list