[Haskell-cafe] [Parsec] A combinator to match between M and N
haskell at list.mightyreason.com
Tue Aug 29 10:39:47 EDT 2006
Robert Dockins wrote:
> On Aug 29, 2006, at 9:11 AM, Tomasz Zielonka wrote:
>> On Tue, Aug 29, 2006 at 03:05:39PM +0200, Stephane Bortzmeyer wrote:
>>> Parsec provides "count n p" to run the parser p exactly n times. I'm
>>> looking for a combinator "countBetween m n p" which will run the
>>> parser between m and n times. It does not exist in Parsec.
>>> Much to my surprise, it seems quite difficult to write it myself and,
>>> until now, I failed (the best result I had was with the "option"
>>> combinator, which unfortunately requires a dummy value, returned when
>>> the parser fails).
>> How about this?
>> countBetween m n p = do
>> xs <- count m p
>> ys <- count (n - m) $ option Nothing $ do
>> y <- p
>> return (Just y)
>> return (xs ++ catMaybes ys)
>> Assuming n >= m.
>>> Does anyone has a solution? Preferrably one I can understand, which
>>> means not yet with liftM :-)
>> No liftM, as requested :-)
> Here's an interesting puzzle. For a moment, consider parsec only wrt
> its language-recognition capabilities.
> Then, we expect the count combinator to factor,
> count x p >> count y p === count (x+y) p
> where === mean "accepts the same set of strings".
> I somehow intuitively expect the countBetween combinator to factor in a
> similar way also, but it doesn't (at least, none of the posted versions
> do)! Note the output of:
> parser1 = countBetween 3 7 (char 'a') >> eof
> parser2 = countBetween 2 3 (char 'a') >> countBetween 1 4 (char 'a') >> eof
> main = do
> print $ parse parser1 "" "aaa"
> print $ parse parser2 "" "aaa"
> OK. What's happening is that the greedy nature of the combinator breaks
> things because parsec doesn't do backtracking by default. I'd expect to
> be able to insert 'try' in the right places to make it work. However,
> after playing around for a few minutes, I can't figure out any
> combination that does it. Is it possible to write this combinator so
> that it factors in this way?
My regex-parsec part of TextRegexLazy implements Greedy,Lazy,and Possessive
semantics for regular expressions using Parsec.
It is not obvious at first how to insert <|> and 'try'. You have to use a
continuation style. The above example could be simply done, however, as:
count 2 (char 'a')
choice [count 1 (char 'a') >> countBetween 1 4 (char 'a')
,countBetween 1 4 (char 'a')
This can be automated. A not-maximally efficient version would be:
cb m n p cont | m<=n =
do xs <- count m p
let rep 0 = return xs
rep i = do ys <- count i p
choice [ try (rep i >>= cont) | i <- [(n-m),(n-m)-1 .. 0] ]
test = cb 2 3 (string "ab") (\xs -> cb 1 4 (string "ab") (\ys -> return (xs,ys)))
go = runParser test () "" "abababac"
Where go now returns Right (["ab","ab"],["ab"])
More information about the Haskell-Cafe