[Haskell-cafe] Parser left recursion
S. Doaitse Swierstra
doaitse at swierstra.net
Thu Feb 21 11:11:03 CET 2013
As mentioned before, the way to handle this specific problem is to use either the pChainl or pChainr parser combinators, as e.g. found on:
http://hackage.haskell.org/packages/archive/uu-parsinglib/2.7.4.1/doc/html/Text-ParserCombinators-UU-Derived.html
and many similar libraries. So one can write:
pExpr = pChainl ( (+) <$ pSym ' ')) pFactor
pFactor = iI '(' pExpr ')' Ii <|> pInteger <|> pIdentifier
What is even nicer is that one can easily extend this to deal with many different operators:
pExpr = foldr nextop [((+),'+'), ((*), '*'))] pGactor
where nextop (sem,sym) = pChainl sem <$ pSym sym))
It is obvious how to extend this further into operators with the same priority or being right associative. See furthermore:
@inproceedings{Fokker95:0,
title = {Functional Parsers},
author = {Jeroen Fokker},
year = {1995},
tags = {parsing},
researchr = {http://dutieq.st.ewi.tudelft.nl/publication/Fokker95%3A0},
cites = {0},
citedby = {0},
pages = {1-23},
booktitle = {Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques, Båstad, Sweden, May 24-30, 1995, Tutorial Text},
editor = {Johan Jeuring and Erik Meijer},
volume = {925},
series = {Lecture Notes in Computer Science},
publisher = {Springer},
isbn = {3-540-59451-5},
}
Most left recursion stems from the fact that conventional CFG notation is sufficient, but unfortunately not ideally suited, to express oft occurring patterns. This is where parser combinators come in: they allow one to express what one wants to say instead of having to encode it using recursion, etc.
If you have a really nasty grammar where left recursion removal by hand would ruin your grammar, you may use a transform like the LeftCornerTransform as used e.g. in the ChristmasTree package, which removes the problem of exponential time behaviour of reading Haskell data types with infix operators. See: http://hackage.haskell.org/package/ChristmasTree-0.2, and which has been described in:
@article{DBLP
:journals/entcs/BaarsSV10,
author = {Arthur I. Baars and
S. Doaitse Swierstra and
Marcos Viera},
title = {Typed Transformations of Typed Grammars: The Left Corner
Transform},
journal = {Electr. Notes Theor. Comput. Sci.},
volume = {253},
number = {7},
year = {2010},
pages = {51-64},
ee = {http://dx.doi.org/10.1016/j.entcs.2010.08.031},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Doaitse
On Feb 20, 2013, at 8:13 , Martin Drautzburg <Martin.Drautzburg at web.de> wrote:
> Hello all,
>
> this was previously asked on haskell-beginners, but only partially answered.
>
> As an exercise I am writing a parser roughly following the expamples in Graham
> Hutton's book. The language contains things like:
>
> data Exp = Lit Int -- literal integer
> | Plus Exp Exp
>
> My naive parser enters an infinite recursion, when I try to parse "1+2". I do
> understand why:
>
> "hmm, this expression could be a plus, but then it must start with an
> expression, lets check".
>
> and it tries to parse expression again and again considers Plus.
>
> Twan van Laarhoven told me that:
>
>> Left-recursion is always a problem for recursive-descend parsers.
>
> and suggested to do something like:
>
>> parseExp = do
>> lit <- parseLit
>> pluses <- many (parsePlusToken *> parseLit)
>> return (combinePlusesWithLit lit pluses)
>>
>> combinePlusesWithLit = foldr Plus -- or foldl
>
> This indeed does the trick, but only when the first token is a Lit (literal
> integer).
>
> I then added the possibility to optionally put things in parentheses. But then
> I cannot parse "(1+2)+3". The original code fails, because "(1+2)" is not a
> Lit and when I allow an expression as the first argument to "+" I get infinite
> recursion again.
>
> I am generally confused, that saying "a plus expression is an integer followed
> by many "plus somethings" is not what the language says. So this requires a
> lot of "paying attention" to get right. I'd much rather say "a plus expression
> is two expressions with a '+' in between".
>
> I do know for sure, that it is possible to parse "(1+2)+3" (ghci does it just
> fine). But I seem to be missing a trick.
>
> Can anyone shed some light on this?
>
> --
> Martin
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list