[Haskell-cafe] How to do reversible parsing?

Mon Jan 4 19:55:49 UTC 2021

I poked a bit at this problem late last year[1][2]. I tried to go all the way to an abstraction for monoidal functors, but that might have been aiming too high. Instead, it could be worthwhile to create subclasses of Invariant[3] which stand in for Applicative/Divisible and Alternative/Decidable (and probably their unitless superclasses also). This would give you something like Pickler Combinators[4] on a foundation that looks a bit more like normal Applicative parsers.

The other thing I looked at last year was profunctors `P a a` where you use the same type variable in both positions. Instead of following the "monadic profunctors" work of Xia, Orchard, and Wang[5], I had some interesting discussions with Tom Ellis about expanding the hierarchy of the `product-profunctors` package[6][7][8]. (Which I really should revisit and help tie off.)

[1]: http://jackkelly.name/blog/archives/2020/08/19/abstracting_over_applicative_alternative_divisible_and_decidable
[2]: http://jackkelly.name/blog/archives/2020/11/03/profunctor_decoders_optical_decoders
[3]: https://hackage.haskell.org/package/invariant
[4]: https://www.microsoft.com/en-us/research/wp-content/uploads/2004/01/picklercombinators.pdf
[5]: https://poisson.chat/esop19/composing-bidir-prog-monadically.pdf
[6]: https://github.com/tomjaguarpaw/product-profunctors/issues/51
[7]: https://github.com/tomjaguarpaw/product-profunctors/pull/53
[8]: https://github.com/tomjaguarpaw/product-profunctors/pull/54

HTH,

-- Jack

January 5, 2021 4:23 AM, "Mario" <blamario at rogers.com> wrote:

> That's one ambitious research program you've taken on. I'm not aware of 
> any complete solution, nor even anything close to it. I can only provide 
> a couple of potentially useful pieces. I'm the author of the 
> grammatical-parsers and construct Haskell libraries. The former can 
> handle left-recursive grammars, the latter is bidirectional. I hope you 
> can find some inspiration there.
> 
> [1] http://hackage.haskell.org/package/grammatical-parsers
> [2] http://hackage.haskell.org/package/construct
> 
> On 2021-01-02 6:59 p.m., Askar Safin via Haskell-Cafe wrote:
> 
>> Hi. How to do deterministic reversible parsing in Haskell? Please, point me to some libs or tools.
>> Or give some advice on how to write such lib or tool. (Also, I am interested in solutions for C or
>> C++.)
>> 
>> Now let me explain my task in detail.
>> 
>> I am interested in formal math. I am going to develop various proof checkers. In particular, I want
>> to develop my PTS checker ( https://en.wikipedia.org/wiki/Pure_type_system ). Here is example of
>> existing PTS checker: http://web.archive.org/web/20180924133846/http://www.cs.kun.nl/~janz/yarrow .
>> Here is example of input this checker accepts (taken from manual):
>> 
>> Var (+) : Nat -> Nat -> Nat
>> Def double := \x:Nat. x+x
>> Def S := (+) one
>> 
>> Also, I want to develop something similar to Isabelle ( https://isabelle.in.tum.de ). Example of
>> Isabelle input: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html .
>> 
>> Also, I am going to write converters between various languages. For example, between input for some
>> existing PTS checker and my PTS checker.
>> 
>> In short, I want to write lot of parsers and pretty-printers for various languages. And thus I need
>> some parsing solution.
>> 
>> The solution should have the following features:
>> 
>> 1. The parsing should be divided into lexing and parsing. I. e. usual lex + yacc model. Parsing
>> should
>> be done using usual context-free grammars (CFGs). Left-recursive grammars should be supported (this
>> leaves out parser combinator libraries). All languages I am going to deal with fit this model. I.
>> e.
>> I will deal with simple to parse languages, such as Pascal. Not complex ones, like C or C++.
>> 
>> 2. Parsing should be deterministic. I. e. parser should check that CFG is unambiguous. Yes, I know
>> that this task is impossible to do on Turing machine. But at very least some partial solution will
>> go. For example, some algorithm, which will check that some reasonable set of CFGs is
>> deterministic.
>> Or even brute force up to some limit will go. (Yes, this will not give absolutely correct results,
>> but this will go.) This is example of grammar for PTS in one of my checkers (start symbol is t0):
>> 
>> t1000 = id
>> t1000 = "(" t0")"
>> t999 = t999 t1000
>> t3 = t4 "::" t3
>> t3 = "%" id "::" t0 "." t3
>> t0 = "!!" id "::" t0 "." t0
>> t1 = t2 "==>" t1
>> t0 = t1
>> t1 = t2
>> t2 = t3
>> t3 = t4
>> t4 = t999
>> t999 = t1000
>> 
>> If I remember correctly, this grammar is deterministic (checked using brute force up to a limit).
>> I want solution which will verify that this grammar is deterministic.
>> 
>> All stages of parsing should be proved as deterministic (i. e. scanner too).
>> 
>> 3. Parsing should be reversible. I. e. pretty-printing should be supported, too. And the tool
>> should
>> guarantee that parsing and pretty-printing match. Now let me give more precise definition. Let's
>> assume we have these functions:
>> 
>> parse :: String -> Maybe X
>> print :: X -> Maybe String
>> 
>> As you can see, not every internal representation is printable. This is OK. For example, if
>> internal
>> representation contains variable names that happen to equal reserved words, then, of course, this
>> is
>> unprintable representation.
>> 
>> The tool should guarantee that:
>> a. Result of parsing should be printable, but printing not necessary should give same string. I. e.
>> if "parse s == Just x", then "print x /= Nothing". (But not necessary "print x == Just s".)
>> b. Parsing of result of printing should give same representation. I. e. if "print x == Just s",
>> then
>> "parse s == Just x"
>> 
>> 4. Not only parsing should generate parse tree (i. e. exact representation of derivation), but also
>> AST. AST differs from parse tree. For example, AST should not contain braces. I. e. "2 + (3 * 4)"
>> and
>> "2 + 3 * 4" should give same AST. And the tool should be able parse string to AST and pretty print
>> AST back to string.
>> 
>> 5. It should be possible to use CFG not known in compile time. Consider Isabelle language. Here is
>> again example of Isabelle text: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html . As you
>> can
>> see the language is divided into inner language and outer language. Strings in inner language are
>> quoted. Outer language can add productions to inner language. Inner language cannot change itself.
>> "notation" and "translations" commands add productions to inner language. So inner language is not
>> known beforehand, it is created dynamically. There is no need to dynamically create scanner, i. e.
>> it should be possible to dynamically create parser (for inner language), but there is no such
>> requirement for scanner. Also, parser will not change in the middle of parsing given string.
>> 
>> Again: let's assume I write parser for my Isabelle analog. Scanner and parser for outer language is
>> fixed. Scanner for inner language is fixed, too. But parser for inner language is not. I read input
>> document and add productions to CFG of inner language on the fly. Every time I add production,
>> resulting CFG should be checked for ambiguity. And when I find inner language string in input, I
>> process it using just constructed parser for inner language.
>> 
>> --
>> 
>> Points 2 and 3 are the most important in this list. If there is solution which contains them alone,
>> this already will be very good.
>> 
>> Now let me list considered solutions.
>> 
>> * Usual combinator libraries (such as parsec) will not go. They don't work with left-recursive
>> CFGs.
>> They don't check grammars for ambiguities. They don't do reversible parsing. But they allow
>> creating
>> language on the fly, and this is good thing.
>> 
>> * Happy and alex will not go. As well as I know Happy can prove unambiguity of some very simple
>> grammars
>> (i. e. for some set of simple grammars it will not report any conflicts and thus will prove they
>> unambiguity).
>> It seems this is LR(1) set. But, as well as I know, Happy will report conflicts on PTS grammar
>> given
>> above, i. e. Happy cannot prove its unambiguity. Also, Happy and Alex cannot pretty-print. And
>> Happy
>> and Alex don't allow supplying CFG on the fly.
>> 
>> * Package "earley". Handle arbitrary CFGs, including left-recursive. This is good thing. It seems
>> CFG
>> can be supplied in runtime. This is good thing, too. Gives all parse trees, thus we can verify that
>> count of parse trees equals to 1. This is good thing, too. Unfortunately, earley cannot verify
>> CFS's
>> unambiguity before supplying input. Also, earley doesn't support pretty-printing.
>> 
>> * PEG libraries. Will not go, I want CFGs.
>> 
>> * I searched on Hackage reversible parsing libraries. Unfortunately, none of them guarantee
>> reversibility. I. e. it is very easy to fool these libraries and create parsers and printers not
>> satisfying equations I gave above. Also, this libraries doesn't have other features I mentioned
>> 
>> * http://augeas.net is very cool project. Unlike nearly everything I found in internet, Augeas
>> provides guaranteed reversible parsing and pretty-printing (!!!!!). Unfortunately, set of supported
>> target languages is very limited. Regular languages are supported. And some very limited set of
>> CFGs
>> is supported. For example, it is possible to construct parser for JSON. But parser for Pascal seems
>> impossible. Augeas not designed for parsing programming languages. Only simple configuration
>> languages like JSON, XML, etc. Mutually recursive nonterminals are not supported. Also, Augeas
>> provides C API, not Haskell one.
>> 
>> * https://www.brics.dk/xsugar.html is very good. It provides guaranteed reversible translation
>> between arbitrary language and XML. And XSugar even verifies unambiguity of the CFG using some
>> sophisticated algorithm. Unfortunately, this algorithm still cannot prove unambiguity of PTS
>> grammar
>> given above. Also, XSugar is not Haskell library. It is standalone Java app. So, I need to convert
>> some language to XML, then load this XML to my Haskell program and parse. Also, there is similar
>> lib
>> https://www.brics.dk/Xact.html from same author, but it is for Java, not for Haskell.
>> 
>> I was able to find only 2 projects, which do guaranteed reversible parsing: mentioned Augeas and
>> XSugar/Xact. Unfortunately, both are not for Haskell. And both cannot prove unambiguity for my PTS
>> grammar.
>> 
>> So, please tell me about solution. Or give advice how to write one.
>> 
>> Also, let me describe my experience with various algorithms for checking ambiguity of grammar.
>> 
>> 1. Try to construct LR(1) table. If there is no conflicts, then the grammar is unambiguous. It
>> seems
>> this is algorithm used by bison and happy. Unfortunately it is too weak and cannot prove
>> unambiguity
>> of my PTS grammar.
>> 2. Try to construct LR(k) table. This algorithm is stronger than LR(1). I was unable to find
>> implementation of this algorithm in any language. I don't want to write such algorithm.
>> 3. Schmitz's algorithm. http://www.lsv.fr/~schmitz/pub/expamb.pdf . Even stronger than LR(k). I
>> downloaded his bison fork. Unfortunately this bison fork can prove unambiguity for same set of
>> grammars original bison can (i. e. LR(1)). I. e. it seems Schmitz did not implemented his
>> algorithm.
>> I don't want to implement his algorithm either.
>> 4. Horizontal and vertical ambiguity checking. https://www.brics.dk/grammar/kruse.pdf . This is
>> algorithm used in XSugar. Unfortunately, it doesn't handle my PTS grammar.
>> 5. Brute force up to some limit. Doesn't give unambiguity guarantee. But I like it, and I will use
>> it if I don't find anything else. It is the only algorithm which can handle my PTS grammar (and
>> give enough amount of guarantee), not counting LR(k) and Schmitz's algorithm (I was not able to
>> find
>> implementations of both).
>> 
>> I will try to write tool I want, I will write about my progress to this list.
>> 
>> ==
>> Askar Safin
>> https://github.com/safinaskar
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
> 
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.