[Haskell-cafe] uu-parsinglib - Greedy Parser
Marco Vassena
marco.vax91 at gmail.com
Tue Jan 13 10:57:22 UTC 2015
In the uu-parsinglib the choice combinator (<|>) is symmetric and does not commit to any alternative.
If the grammar being parsed is ambiguous, the corresponding parser will fail at run time on an ambiguous input.
Consider for instance this html parser.
pTag = pOpenTag <|> pCloseTag <|> pCommentTag <|> pContent
pContent = Content <$> some (satisfy (/= '<'))
pHtml = some pTag
This parser will fail on "<a>123</a>", because this may be interpreted as:
[ [Open a , Content "123", Close a],
[Open a, Content "12", Content "3", Close a],
[Open a, Content "1", Content "2", Content "3", Close a] ... ]
In other parsing libraries such as parsec the choice operator <|> is greedy and commits to the first alternative that makes
any progress, so that some is greedy and Content "123" would be parsed.
The operator <<|> in uu-parsinglib has the same greedy behaviour.
I would like to disambiguate the grammar so that the first result ([Open a, Content "123", Close a]) is selected (longest matching rule).
I suppose that this may be done using <<|> to define a greedySome to be used in in pContent, however I am wondering whether
there is another way to do this, without using such operator <<|>.
Any help is appreciated.
All the best,
Marco
More information about the Haskell-Cafe
mailing list