[Haskell-cafe] [Parsec] A combinator to match between M and N times?

Chris Kuklewicz haskell at list.mightyreason.com
Tue Aug 29 19:07:08 EDT 2006


Udo Stenzel wrote:
> Chris Kuklewicz wrote:
>> Again, Parsec requires you to put "try" where you need it
> 
> I'm pretty sure it does, although this
> 
>> Udo Stenzel wrote:
>>> countBetween 0 n p = p <:> countBetween   0   (n-1) p <|> return []
> 
> is a place where it's not needed in general. 

No, see my example below.  It is not needed in the specific case where p can 
only consume a single token.

 > You should know what
> you're doing, though.  And I like ReadP better in general, exactly
> because 'try' is a bit trippy.

And I have not used ReadP enough to be good with it.

> 
> 
> Udo.

As long as you test with 'p' that consumes a single token, there is no 
difference using try

(<:>) = liftM2 (:)

countBetween 0 0 _ = return []
countBetween 0 n p = p <:> countBetween   0   (n-1) p <|> return []
countBetween m n p = p <:> countBetween (m-1) (n-1) p


Expand

countBetween 0 n p = (do value_p <- p
                          value_rest <- countBetween   0   (n-1) p
                          return (value : value_rest)
                  <|> (return [])

If p fails without consuming tokens, then the <|> runs (return []).
If p fails after consuming tokens, then the <|> will fail instead of running 
(return []).

By replacing p with (try p) this will work.

> test = countBetween 3 5 (string "ab")
> go = runParser test () "" "abababac"

Without try this fails to parse with an error message:
Left (line 1, column 7):
unexpected "c"
expecting "ab"

With the try this succeeds:
Right ["ab","ab","ab"]

I assume you think the former is problem.  If you do not then we are trying to 
define different operations.

The problem is that string "ab" consumes the 'a' character/token before it 
fails.  The 'try' wraps this failure, pretending p did not consume any input, so 
the <|> will then run (return []) as we want.

It is not needed in the line
countBetween m n p = p <:> countBetween (m-1) (n-1) p
because if p fails then the whole operation should fail.

There is a benefit to this difficult behavior! The <|> choice point for 
backtracking is expensive.  By committing to the first branch when it consumes a 
token, the Parsec monad can completely forget the other branch of <|> which 
allows it to be garbage collected.  This default is handy for preventing memory 
leaks, but it has to be overridden by 'try'.  One can package <|> and 'try' in 
useful combinators such as countBetween so the use of them becomes slightly easier.

-- 
Chris


More information about the Haskell-Cafe mailing list