[Haskell-cafe] Re: [Parsec] Backtracking with try does not work for
me?
Chris Kuklewicz
haskell at list.mightyreason.com
Tue Aug 1 04:41:40 EDT 2006
Stephane Bortzmeyer wrote:
> On Mon, Jul 31, 2006 at 06:51:27PM +0100,
> Chris Kuklewicz <haskell at list.mightyreason.com> wrote
> a message of 102 lines which said:
>
>> minilang = do
>> char 'a'
>> optional (try (do {comma ; char 'b'}))
>> optional (do {comma ; char 'c'})
>> eof
>> return "OK"
>
> I now have a new problem which was hidden beneath. If the language
> authorizes "a,bb" and "a,bbc", "a,bbc" is not accepted by my parser
> since it already accepted "a,bb" and the "c" which is left triggers a
> syntax error.
>
> This time, "try" believes it succeeded but should not. I need more
> look-ahead but I'm not sure how?
The problem is mentioned here:
http://www.cs.uu.nl/people/daan/download/parsec/parsec.html#notFollowedBy
Your whole parser is indeed failing, and again it is because of the "failing
after consuming some input" issue. For "a,bbc" your "bb" token parser consumes
the "bb" and then the dangling "c" causes the error.
So you cannot commit to consuming the "bb" unless you know the rest of the
string is okay. There are a few ways to accomplish this. The first would be to
test whether "bb" is followed by "eof" or "comma" before accepting it. Another
solution is to try and parse what follows "bb" before accepting "bb".
A small fix would look like:
> minilang' = do
> string "a"
> optional (try $ do {comma ; string "bb"; endToken})
> optional (do {comma ; string "bbc"})
> eof
> return "OK"
> where endToken = eof <|> lookAhead (comma >> return ())
A more general fix looks like this:
> stringLang :: [String] -> GenParser Char st [String]
> stringLang items = polyLang comma (map string items)
>
> listLang :: [Char] -> GenParser Char st [Char]
> listLang items = polyLang comma (map char items)
>
The first version of polyLang uses the "test eof or comma before accepting"
strategy:
> polyLang :: (Show element,Show token) => GenParser element state ignore
> -> [GenParser element state token] -> GenParser element state [token]
> polyLang _ [] = eof >> return []
> polyLang separator input = (use input) <|> polyLang separator (tail input)
> where use (opX:xs) = do
> (x,test) <- try (do x <- opX
> test <- more
> when test (separator >> return ())
> return (x,test))
> rest <- if test then (loop xs <|> unexpected ("(problem after "++show x++")"))
> else return []
> return (x:rest)
> more = option True (eof >> return False)
> loop [] = (unexpected "cannot parse")
> loop input' = use input' <|> loop (tail input')
The second version polyLang' uses the "test rest of input before accepting"
strategy:
> polyLang' :: (Show element,Show token) => GenParser element state ignore
> -> [GenParser element state token] -> GenParser element state [token]
> polyLang' _ [] = eof >> return []
> polyLang' separator input = (use input) <|> polyLang' separator (tail input)
> where use (opX:xs) = try (do x <- opX
> test <- more
> rest <- if test
> then separator >> (loop xs <|> unexpected ("(problem after "++show x++")"))
> else return []
> return (x:rest))
> more = option True (eof >> return False)
> loop [] = (unexpected "cannot parse")
> loop input' = use input' <|> loop (tail input')
It works:
> *Main> run (stringLang ["a","bb","bbc"]) "a,bbc"
> ["a","bbc"]
The error reporting gets a bit strange, and is different between the two
versions of polyLang'
> *Main> run (polyLang comma (map string ["a","bb","bbc","dd"])) "a,bbc,bb"
> parse error at (line 1, column 7):
> unexpected cannot parse or (problem after "bbc")
> expecting "dd"
> *Main> run (polyLang' comma (map string ["a","bb","bbc","d"])) "a,bbc,bb"
> parse error at (line 1, column 1):
> unexpected "c", cannot parse, (problem after "bbc"), (problem after "a") or "a"
> expecting end of input, ",", "dd", "bb" or "bbc"
More information about the Haskell-Cafe
mailing list