[Haskell-cafe] uu-parsinglib - Greedy Parser

Marco Vassena marco.vax91 at gmail.com
Tue Jan 13 10:57:22 UTC 2015


In the uu-parsinglib the choice combinator (<|>) is symmetric and does not commit to any alternative.
If the grammar being parsed is ambiguous, the corresponding parser will fail at run time on an ambiguous input.

Consider for instance this html parser.

pTag = pOpenTag <|> pCloseTag <|> pCommentTag <|> pContent

pContent = Content <$> some (satisfy (/= '<'))
pHtml = some pTag

This parser will fail on "<a>123</a>", because this may be interpreted as:
[ [Open a , Content "123", Close a], 
 [Open a, Content "12", Content "3", Close a],
 [Open a, Content "1", Content "2", Content "3", Close a] ... ]

In other parsing libraries such as parsec the choice operator <|> is greedy and commits to the first alternative that makes
any progress, so that some is greedy and Content "123" would be parsed.
The operator <<|> in uu-parsinglib has the same greedy behaviour.

I would like to disambiguate the grammar so that the first result ([Open a, Content "123", Close a]) is selected (longest matching rule).
I suppose that this may be done using <<|> to define a greedySome to be used in in pContent, however I am wondering whether 
there is another way to do this, without using such operator <<|>.

Any help is appreciated.

All the best,
Marco


More information about the Haskell-Cafe mailing list