[Haskell-cafe] Re: [Parsec] Backtracking with try does not work for me?

Chris Kuklewicz haskell at list.mightyreason.com
Tue Aug 1 04:41:40 EDT 2006


Stephane Bortzmeyer wrote:
> On Mon, Jul 31, 2006 at 06:51:27PM +0100,
>  Chris Kuklewicz <haskell at list.mightyreason.com> wrote 
>  a message of 102 lines which said:
> 
>> minilang = do
>>       char 'a'
>>       optional (try (do {comma ; char 'b'}))
>>       optional (do {comma ; char 'c'})
>>       eof
>>       return "OK"
> 
> I now have a new problem which was hidden beneath. If the language
> authorizes "a,bb" and "a,bbc", "a,bbc" is not accepted by my parser
> since it already accepted "a,bb" and the "c" which is left triggers a
> syntax error.
> 
> This time, "try" believes it succeeded but should not. I need more
> look-ahead but I'm not sure how?

The problem is mentioned here:

http://www.cs.uu.nl/people/daan/download/parsec/parsec.html#notFollowedBy

Your whole parser is indeed failing, and again it is because of the "failing 
after consuming some input" issue.  For "a,bbc" your "bb" token parser consumes 
the "bb" and then the dangling "c" causes the error.

So you cannot commit to consuming the "bb" unless you know the rest of the 
string is okay.  There are a few ways to accomplish this.  The first would be to 
test whether "bb" is followed by "eof" or "comma" before accepting it.  Another 
solution is to try and parse what follows "bb" before accepting "bb".

A small fix would look like:

> minilang' = do
>        string "a"
>        optional (try $ do {comma ; string "bb"; endToken})
>        optional (do {comma ; string "bbc"})
>        eof
>        return "OK"
>   where endToken = eof <|> lookAhead (comma >> return ())

A more general fix looks like this:

> stringLang :: [String] -> GenParser Char st [String]
> stringLang items = polyLang comma (map string items)
> 
> listLang :: [Char] -> GenParser Char st [Char]
> listLang items = polyLang comma (map char items)
> 

The first version of polyLang uses the "test eof or comma before accepting" 
strategy:

> polyLang :: (Show element,Show token) => GenParser element state ignore 
>          -> [GenParser element state token] -> GenParser element state [token]
> polyLang _ [] = eof >> return []
> polyLang separator input = (use input) <|> polyLang separator (tail input)
>   where use (opX:xs) = do 
>           (x,test) <- try (do x <- opX
>                               test <- more
>                               when test (separator >> return ())
>                               return (x,test))
>           rest <- if test then (loop xs <|> unexpected ("(problem after "++show x++")")) 
>                           else return []
>           return (x:rest)
>         more = option True (eof >> return False)
>         loop [] = (unexpected "cannot parse")
>         loop input' = use input' <|> loop (tail input')

The second version polyLang' uses the "test rest of input before accepting" 
strategy:

> polyLang' :: (Show element,Show token) => GenParser element state ignore 
>           -> [GenParser element state token] -> GenParser element state [token]
> polyLang' _ [] = eof >> return []
> polyLang' separator input = (use input) <|> polyLang' separator (tail input)
>   where use (opX:xs) = try (do x <- opX
>                                test <- more
>                                rest <- if test
>                                          then separator >> (loop xs <|> unexpected ("(problem after "++show x++")"))
>                                          else return []
>                                return (x:rest))
>         more = option True (eof >> return False)
>         loop [] = (unexpected "cannot parse")
>         loop input' = use input' <|> loop (tail input')

It works:

> *Main> run (stringLang ["a","bb","bbc"]) "a,bbc"
> ["a","bbc"]

The error reporting gets a bit strange, and is different between the two 
versions of polyLang'

> *Main> run (polyLang comma (map string ["a","bb","bbc","dd"])) "a,bbc,bb"
> parse error at (line 1, column 7):
> unexpected cannot parse or (problem after "bbc")
> expecting "dd"

> *Main> run (polyLang' comma (map string ["a","bb","bbc","d"])) "a,bbc,bb"
> parse error at (line 1, column 1):
> unexpected "c", cannot parse, (problem after "bbc"), (problem after "a") or "a"
> expecting end of input, ",", "dd", "bb" or "bbc"



More information about the Haskell-Cafe mailing list