[Haskell-cafe] Parser left recursion

Sun Feb 24 16:04:11 CET 2013

Hi,

Kim-Ee Yeoh wrote:
> Perhaps you meant /productive/ corecursion? Because the definition "A
> ::= B A" you gave is codata.

If you write a recursive descent parser, it takes the token stream as an 
input and consumes some of this input. For example, the parser could 
return an integer that says how many tokens it consumed:

   parseA :: String -> Maybe Int
   parseB :: String -> Maybe Int

Now, if we implement parseA according to the grammar rule

   A ::= B A

we have, for example, the following:

   parseA text
     = case parseB text of
         Nothing -> Nothing
         Just n1 -> case parseA (drop n1 text) of
           Nothing -> Nothing
           Just n2 -> Just (n1 + n2)

Note that parseA is recursive. The recursion is well-founded if (drop n1 
text) is smaller then text. So we have two cases, as Roman wrote:

If the language defined by B contains the empty string, then n1 can be 
0, so the recursion is not well-founded and the parser might loop.

If the language defined by B does not contain the empty string, then n1 
is always bigger than 0, so (drop n1 text) is always smaller than text, 
so the recursion is well-founded and the parser cannot loop.

So I believe the key to understanding Roman's remark about well-founded 
recursion is to consider the token stream as an additional argument to 
the parser.

However, the difference between hidden left recursion and unproblematic 
recursion in grammars can also be understood in terms of productive 
corecursion. In that view, a parser is a process that can request input 
tokens from the token stream:

   data Parser
     =  Input (Char -> Parser)
     |  Success
     |  Failure

   parseA :: Parser
   parseB :: Parser

Now, if we implement parseA according to the grammar rule

   A ::= B A

we have, for example, the following:

   andThen :: Parser -> Parser -> Parser
   andThen (Input f) p = Input (\c -> f c `andThen` p)
   andThen (Success) p = p
   andThen Failure p = p

   parseA = parseB `andThen` parseA

Note that parseA is corecursive. The corecursion is productive if the 
corecursive call to parseA is guarded with an Input constructor. Again, 
there are two cases:

If the language described by B contains the empty word, then parseB = 
Success, and (parseB `andThen` parseA) = parseA, so the corecursive call 
to parseA is not guarded and the parser is not productive.

If the langauge described by B does not contain the empty word, then 
parseB = Input ..., and (parseB `andThen` parseA) = Input (... parseA 
...), so the corecursive call to parseA is guarded and the parse is 
productive.

So I believe the key to understanding left recursion via productive 
corecursion is to model the consumption of the token stream with a 
codata constructor.

Both approaches are essentially equivalent, of course: Before 
considering the very same nonterminal again, we should have consumed at 
least one token.

   Tillmann