[Haskell-beginners] Parsing 'A's and then ('A's or 'B's)

Mon Jan 25 09:15:13 UTC 2016

On Sun, Jan 24, 2016 at 10:56:31PM +0100, Simon Jakobi wrote:
> >     if I understood your specification correctly, there would be multiple
> > ways to parse the string "AAA":
> >
> >     - 3 'x' elements ("A", "A", "A")
> >     - 2 'x' elements ("AA", "A")
> >     - 2 'x' elements again (first one shorter) ("A", "AA")
> >     - 1 'x' element ("AAA")
> >
> 
> There would be even more ways because 'y', too, can represent one or more
> 'A's.
>
> [...]
>
> Is this due to attoparsec not being able to "backtrack" (not sure if
> this is the right term)? Is backtracking something that parsers generally
> are incapable of?

Ah, indeed you are right. attoparsec, parsec and friends handle failure
with `try`. From Attoparsec documentation:

    Attempt a parse, but do not consume any input if the parse fails.

One way to deal with cases like yours is for every parser to compute a
"list of successes". Crude example:

    import Text.Parsec
    import Text.Parsec.String

    foo :: Parser [String]
    foo = anyChar   >>= \h ->
          (foo <|> e) >>= \t ->
          return ([""] ++ map (h:) t)
        where
              e = return [""]

    -- λ> parseTest foo "bar"
    -- ["","b","ba","bar"]

Then you can chain those with `try`/`choice` and compute your result(s)
(I guess using the list monad to handle the mechanism could do).

Ambiguous grammars are an age old problem, and some searching [1] leads
me to believe there are already viable solution in Haskell.

[1] http://stackoverflow.com/questions/13279087/parser-library-that-can-handle-ambiguity