[Haskell-cafe] Alternative instance for non-backtracking parsers

Isaac Elliott isaace71295 at gmail.com
Fri Aug 24 12:20:46 UTC 2018


Throughout the day I've been thinking about how to best respond to this
thread, but I've no satisfying answer so I'm just going to throw out my
thoughts on the matter.

Non-backtracking choice is adequate for many parsing tasks. Applicative
with (non-backtracking) Alternative roughly corresponds to the LL(k) class
of grammars, which can be parsed very quickly with low memory overhead
(precisely because they lack any ability to backtrack). Because it is
sufficiently expressive as well as very efficient, non-backtracking choice
is a good default for LL-style parser combinators.

Backtracking has a performance cost, but we shouldn't have to pay a price
for things we don't use. If I were to implement an LL(k) grammar using a
parser combinator library that had always-backtracking choice, I would
still take a penalty even though I know that this parser will never
backtrack.

With 'try', we get opt-in backtracking. Pay for what you use. I can
implement my LL(1) grammar and have it nice and speedy, and you can
implement a complex grammar that needs to backtrack by using 'try' when
appropriate.

> Of course the example above can be mitigated by moving the char '-'
outside the alternative. But we might not have control over whether the two
parsers share a common prefix

In a majority of use cases we can do the kind of factoring you suggest. But
if you really, definitely can't, then LL-style parser combinators may be
inappropriate for the task. I'd suggest looking at PEGs and packrat
parsing, which can deal with the examples you laid out.

And regarding Paul's most recent reply:

I think one needs to better understand how these parsers work before they
can comment on "whether parser combinators are good for serious parsing
tasks". I don't consider the things mentioned in this thread "problems",
nor their solutions to be "tricks". If certain behaviours are surprising,
it's because there is a lack of understanding of the tool, and not an issue
with the tool itself.

Also, trifecta has the same backtracking semantics as megaparsec.

On Fri, Aug 24, 2018 at 8:10 PM Paul <aquagnu at gmail.com> wrote:

> may be because I always hit similar problems and needs different tricks
> with them, so I an not sure that parser combinators are good solution.
> Btw, usually I use standard parser combinator which is related to GHC
> itself (in the 'base' package), so I'm not sure how all those P.C.
> libraries are similar (may be MegaParsec has different behavior). But
> it's my IMHO: I'm not sure that P.C. libraries are good for serious
> parsing tasks: you should be concentrated on the task and not to spend
> time in solving of such surprises like that.
>
> Btw, some time ago I found this list of libs
>
> https://www.reddit.com/r/haskell/comments/46u45o/what_is_the_current_state_of_parser_libraries_in/
> and Trifecta looks interesting but I did not try it
>
>
> 24.08.2018 12:44, Doaitse Swierstra wrote:
> >
> >> Op 24 aug. 2018, om 7:33  heeft Paul <aquagnu at gmail.com> het volgende
> geschreven:
> >>
> >>>   Of course the example above can be mitigated by moving the char '-'
> outside the alternative.
> >> Right, this works fine:
> >>
> >>    pq :: Parsec Void String Bool
> >>    pq = do
> >>      char '-'
> >>      (q <|> p)
> >>
> >> which for me, means that Haskell has not real backtracking.
> > You are confusing the language “Haskell” with the library “Parsec”.
> Parsec is just one of the many parser combinator libraries.
> >
> >> Btw, `Alternative` is not backtracking (as well as List permutations).
> > The class “Alternative” does in no way specify how parsers should
> behave; it defines an interface. One might expect <|> to be associative,
> but even that is in no way guaranteed.
> >
> >> After "do" should happen context saving. I think it's possible to be
> implemented manually... As workaround, may be to pick char '-', but don't
> consume it (and to fail if it is not expecting '-') ? Or to switch to
> another parser libs: arrow based, happy, bindings to good known libs... For
> me, parsers combinators absolutely ever are big problem: a lot of tricks,
> low performance, may be it's typical for any "elegant" approach.
> >   - Some libraries do not need “try” constructs to set backtrack points,
> but may be a bit slower; other libraries are more greedy and a bit faster.
> >   - Some parsers keep track of where you are in the input at some cost,
> others just parse as fast as possible without having to update line and
> position counters (handy when you know the input is correct since it was
> machine-generated).
> >   - Some libraries provide nice error messages about what was expected
> at some cost, other libraries just report that they did not find a parse.
> >   - Some libraries try to “repair” the input and continue parsing when
> they cannot proceed otherwise, some others just stop when they get stuck.
> >   - Some libraries hang on to the input so they may backtrack to a point
> far back, some libraries work in an online way and pursue all parses “in
> parallel”, i.e. they perform a breadth first search for a successful parse.
> >   - Some libraries analyse the grammar before parsing starts and warn
> for non-sensical parsers, others just do not.
> >   - Some libraries already return part of the computed result during the
> parsing process -and thus exhibit online behaviour-, other libraries
> (usually the monadic ones) construct a huge closure describing the result,
> which only becomes available when a succesful parse was found.
> >   - Some libraries analyse a grammar for left-recursion and transform
> them into an equivalent non-left-recusrive one at some costs.
> >   - Some libraries memoise the parsing, and may be faster, but in return
> cannot handle real monadic constructs.
> >   - Etc., Etc.
> >
> > My impression is that with respect to parser combinators you just jump a
> bit too fast to a conclusion.
> >
> > Doaitse
> >
> >
> >>
> >> 23.08.2018 22:41, Olaf Klinke wrote:
> >>> Dear Cafe,
> >>>
> >>> can anyone explain the use of an Alternative instance for a parser
> monad where the parser (p <|> q) does not backtrack automatically when p
> fails?
> >>> Consider megaparsec for example. When p is a parser that does not
> backtrack automatically, then (p <|> q) never succeeds with q. While I do
> understand that fine-grained control of backtracking is desirable for
> performance reasons in general, it escapes me why such an Alternative
> instance could be useful. See also "A note on backtracking" in the
> parser-combinators package documentation [1].
> >>>
> >>> Minimal code example and context below.
> >>>
> >>> Thanks,
> >>> Olaf
> >>>
> >>> [1]
> http://hackage.haskell.org/package/parser-combinators-1.0.0/docs/Control-Applicative-Combinators.html
> >>>
> >>> import Text.Megaparsec
> >>> import Text.Megaparsec.Char
> >>> import Data.Void
> >>>
> >>> p :: Parsec Void String Bool
> >>> p = do
> >>>    char '-'
> >>>    string "True"
> >>>    return True
> >>> q :: Parsec Void String Bool
> >>> q = do
> >>>    char '-'
> >>>    string "False"
> >>>    return False
> >>>
> >>>>>> parseTest (p <|> q) "-True"
> >>> True
> >>>>>>   parseTest (p <|> q) "-False"
> >>> 1:2:
> >>> unexpected "Fals"
> >>> expecting "True"
> >>> -- Since q can parse "-False", I would expect (p <|> q) to parse
> "-False", too.
> >>>
> >>> Context:
> >>> Of course the example above can be mitigated by moving the char '-'
> outside the alternative. But we might not have control over whether the two
> parsers share a common prefix. I am trying to write some code generic in
> the parser monad and I am having difficulties making my code work for both
> backtracking and non-backtracking parsers. If possible, I'd like to keep my
> parser type class devoid of a 'try' combinator, which would be the identity
> function for parser monads that always backtrack.
> >>> _______________________________________________
> >>> Haskell-Cafe mailing list
> >>> To (un)subscribe, modify options or view archives go to:
> >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> >>> Only members subscribed via the mailman list are allowed to post.
> >> _______________________________________________
> >> Haskell-Cafe mailing list
> >> To (un)subscribe, modify options or view archives go to:
> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> >> Only members subscribed via the mailman list are allowed to post.
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20180824/c025d434/attachment.html>


More information about the Haskell-Cafe mailing list