[Haskell-cafe] Parser left recursion

Thu Feb 21 11:11:03 CET 2013

As mentioned before, the way to handle this specific problem is to use either  the pChainl or pChainr parser combinators, as e.g. found on:

http://hackage.haskell.org/packages/archive/uu-parsinglib/2.7.4.1/doc/html/Text-ParserCombinators-UU-Derived.html

and many similar libraries. So one can write:

pExpr = pChainl ( (+) <$ pSym ' ')) pFactor
pFactor = iI '(' pExpr ')' Ii <|> pInteger <|> pIdentifier 

What is even nicer is that one can easily extend this to deal with many different operators:

pExpr = foldr nextop [((+),'+'), ((*), '*'))] pGactor
        where nextop (sem,sym) = pChainl sem <$ pSym sym))

It is obvious how to extend this further into operators with the same priority or being right associative. See furthermore:

@inproceedings{Fokker95:0,
  title = {Functional Parsers},
  author = {Jeroen Fokker},
  year = {1995},
  tags = {parsing},
  researchr = {http://dutieq.st.ewi.tudelft.nl/publication/Fokker95%3A0},
  cites = {0},
  citedby = {0},
  pages = {1-23},
  booktitle = {Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques, Båstad, Sweden, May 24-30, 1995, Tutorial Text},
  editor = {Johan  Jeuring and Erik Meijer},
  volume = {925},
  series = {Lecture Notes in Computer Science},
  publisher = {Springer},
  isbn = {3-540-59451-5},
}

Most left recursion stems from the fact that conventional CFG notation is sufficient, but unfortunately not ideally suited, to express oft occurring patterns. This is where parser combinators come in: they allow one to express what one wants to say instead of having to encode it using recursion, etc.

If you have a really nasty grammar where left recursion removal by hand would ruin your grammar, you may use a transform like the LeftCornerTransform as used e.g. in the ChristmasTree package, which removes the problem of exponential time behaviour of reading Haskell data types with infix operators. See: http://hackage.haskell.org/package/ChristmasTree-0.2, and which has been described in:

@article{DBLP
:journals/entcs/BaarsSV10,
  author    = {Arthur I. Baars and
               S. Doaitse Swierstra and
               Marcos Viera},
  title     = {Typed Transformations of Typed Grammars: The Left Corner
               Transform},
  journal   = {Electr. Notes Theor. Comput. Sci.},
  volume    = {253},
  number    = {7},
  year      = {2010},
  pages     = {51-64},
  ee        = {http://dx.doi.org/10.1016/j.entcs.2010.08.031},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Doaitse

On Feb 20, 2013, at 8:13 , Martin Drautzburg <Martin.Drautzburg at web.de> wrote:

> Hello all,
> 
> this was previously asked on haskell-beginners, but only partially answered.
> 
> As an exercise I am writing a parser roughly following the expamples in Graham 
> Hutton's book. The language contains things like:
> 
> data Exp = Lit Int -- literal integer
>         | Plus Exp Exp
> 
> My naive parser enters an infinite recursion, when I try to parse "1+2". I do 
> understand why:
> 
> "hmm, this expression could be a plus, but then it must start with an 
> expression, lets check". 
> 
> and it tries to parse expression again and again considers Plus.
> 
> Twan van Laarhoven told me that:
> 
>> Left-recursion is always a problem for recursive-descend parsers.
> 
> and suggested to do something like:
> 
>>    parseExp = do
>>      lit <- parseLit
>>      pluses <- many (parsePlusToken *> parseLit)
>>      return (combinePlusesWithLit lit pluses)
>> 
>>    combinePlusesWithLit = foldr Plus -- or foldl
> 
> This indeed does the trick, but only when the first token is a Lit (literal 
> integer). 
> 
> I then added the possibility to optionally put things in parentheses. But then  
> I cannot parse "(1+2)+3". The original code fails, because "(1+2)" is not a 
> Lit and when I allow an expression as the first argument to "+" I get infinite 
> recursion again.
> 
> I am generally confused, that saying "a plus expression is an integer followed 
> by many "plus somethings" is not what the language says. So this requires a 
> lot of "paying attention" to get right. I'd much rather say "a plus expression 
> is two expressions with a '+' in between".
> 
> I do know for sure, that it is possible to parse "(1+2)+3" (ghci does it just 
> fine). But I seem to be missing a trick.
> 
> Can anyone shed some light on this?
> 
> -- 
> Martin
> 
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe