[Haskell-cafe] How to do reversible parsing?

Mario blamario at rogers.com
Mon Jan 4 18:23:30 UTC 2021


That's one ambitious research program you've taken on. I'm not aware of 
any complete solution, nor even anything close to it. I can only provide 
a couple of potentially useful pieces. I'm the author of the 
grammatical-parsers and construct Haskell libraries. The former can 
handle left-recursive grammars, the latter is bidirectional. I hope you 
can find some inspiration there.

[1] http://hackage.haskell.org/package/grammatical-parsers
[2] http://hackage.haskell.org/package/construct


On 2021-01-02 6:59 p.m., Askar Safin via Haskell-Cafe wrote:
> Hi. How to do deterministic reversible parsing in Haskell? Please, point me to some libs or tools.
> Or give some advice on how to write such lib or tool. (Also, I am interested in solutions for C or C++.)
>
> Now let me explain my task in detail.
>
> I am interested in formal math. I am going to develop various proof checkers. In particular, I want
> to develop my PTS checker ( https://en.wikipedia.org/wiki/Pure_type_system ). Here is example of
> existing PTS checker: http://web.archive.org/web/20180924133846/http://www.cs.kun.nl/~janz/yarrow/ .
> Here is example of input this checker accepts (taken from manual):
>
> Var (+) : Nat -> Nat -> Nat
> Def double := \x:Nat. x+x
> Def S := (+) one
>
> Also, I want to develop something similar to Isabelle ( https://isabelle.in.tum.de/ ). Example of
> Isabelle input: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html .
>
> Also, I am going to write converters between various languages. For example, between input for some
> existing PTS checker and my PTS checker.
>
> In short, I want to write lot of parsers and pretty-printers for various languages. And thus I need
> some parsing solution.
>
> The solution should have the following features:
>
> 1. The parsing should be divided into lexing and parsing. I. e. usual lex + yacc model. Parsing should
> be done using usual context-free grammars (CFGs). Left-recursive grammars should be supported (this
> leaves out parser combinator libraries). All languages I am going to deal with fit this model. I. e.
> I will deal with simple to parse languages, such as Pascal. Not complex ones, like C or C++.
>
> 2. Parsing should be deterministic. I. e. parser should check that CFG is unambiguous. Yes, I know
> that this task is impossible to do on Turing machine. But at very least some partial solution will
> go. For example, some algorithm, which will check that some reasonable set of CFGs is deterministic.
> Or even brute force up to some limit will go. (Yes, this will not give absolutely correct results,
> but this will go.) This is example of grammar for PTS in one of my checkers (start symbol is t0):
>
> t1000 = id
> t1000 = "(" t0")"
> t999 = t999 t1000
> t3 = t4 "::" t3
> t3 = "%" id "::" t0 "." t3
> t0 = "!!" id "::" t0 "." t0
> t1 = t2 "==>" t1
> t0 = t1
> t1 = t2
> t2 = t3
> t3 = t4
> t4 = t999
> t999 = t1000
>
> If I remember correctly, this grammar is deterministic (checked using brute force up to a limit).
> I want solution which will verify that this grammar is deterministic.
>
> All stages of parsing should be proved as deterministic (i. e. scanner too).
>
> 3. Parsing should be reversible. I. e. pretty-printing should be supported, too. And the tool should
> guarantee that parsing and pretty-printing match. Now let me give more precise definition. Let's
> assume we have these functions:
>
> parse :: String -> Maybe X
> print :: X -> Maybe String
>
> As you can see, not every internal representation is printable. This is OK. For example, if internal
> representation contains variable names that happen to equal reserved words, then, of course, this is
> unprintable representation.
>
> The tool should guarantee that:
> a. Result of parsing should be printable, but printing not necessary should give same string. I. e.
> if "parse s == Just x", then "print x /= Nothing". (But not necessary "print x == Just s".)
> b. Parsing of result of printing should give same representation. I. e. if "print x == Just s", then
> "parse s == Just x"
>
> 4. Not only parsing should generate parse tree (i. e. exact representation of derivation), but also
> AST. AST differs from parse tree. For example, AST should not contain braces. I. e. "2 + (3 * 4)" and
> "2 + 3 * 4" should give same AST. And the tool should be able parse string to AST and pretty print
> AST back to string.
>
> 5. It should be possible to use CFG not known in compile time. Consider Isabelle language. Here is
> again example of Isabelle text: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html . As you can
> see the language is divided into inner language and outer language. Strings in inner language are
> quoted. Outer language can add productions to inner language. Inner language cannot change itself.
> "notation" and "translations" commands add productions to inner language. So inner language is not
> known beforehand, it is created dynamically. There is no need to dynamically create scanner, i. e.
> it should be possible to dynamically create parser (for inner language), but there is no such
> requirement for scanner. Also, parser will not change in the middle of parsing given string.
>
> Again: let's assume I write parser for my Isabelle analog. Scanner and parser for outer language is
> fixed. Scanner for inner language is fixed, too. But parser for inner language is not. I read input
> document and add productions to CFG of inner language on the fly. Every time I add production,
> resulting CFG should be checked for ambiguity. And when I find inner language string in input, I
> process it using just constructed parser for inner language.
>
> --
>
> Points 2 and 3 are the most important in this list. If there is solution which contains them alone,
> this already will be very good.
>
> Now let me list considered solutions.
>
> * Usual combinator libraries (such as parsec) will not go. They don't work with left-recursive CFGs.
> They don't check grammars for ambiguities. They don't do reversible parsing. But they allow creating
> language on the fly, and this is good thing.
>
> * Happy and alex will not go. As well as I know Happy can prove unambiguity of some very simple grammars
> (i. e. for some set of simple grammars it will not report any conflicts and thus will prove they unambiguity).
> It seems this is LR(1) set. But, as well as I know, Happy will report conflicts on PTS grammar given
> above, i. e. Happy cannot prove its unambiguity. Also, Happy and Alex cannot pretty-print. And Happy
> and Alex don't allow supplying CFG on the fly.
>
> * Package "earley". Handle arbitrary CFGs, including left-recursive. This is good thing. It seems CFG
> can be supplied in runtime. This is good thing, too. Gives all parse trees, thus we can verify that
> count of parse trees equals to 1. This is good thing, too. Unfortunately, earley cannot verify CFS's
> unambiguity before supplying input. Also, earley doesn't support pretty-printing.
>
> * PEG libraries. Will not go, I want CFGs.
>
> * I searched on Hackage reversible parsing libraries. Unfortunately, none of them guarantee
> reversibility. I. e. it is very easy to fool these libraries and create parsers and printers not
> satisfying equations I gave above. Also, this libraries doesn't have other features I mentioned
>
> * http://augeas.net/ is very cool project. Unlike nearly everything I found in internet, Augeas
> provides guaranteed reversible parsing and pretty-printing (!!!!!). Unfortunately, set of supported
> target languages is very limited. Regular languages are supported. And some very limited set of CFGs
> is supported. For example, it is possible to construct parser for JSON. But parser for Pascal seems
> impossible. Augeas not designed for parsing programming languages. Only simple configuration
> languages like JSON, XML, etc. Mutually recursive nonterminals are not supported. Also, Augeas
> provides C API, not Haskell one.
>
> * https://www.brics.dk/xsugar.html is very good. It provides guaranteed reversible translation
> between arbitrary language and XML. And XSugar even verifies unambiguity of the CFG using some
> sophisticated algorithm. Unfortunately, this algorithm still cannot prove unambiguity of PTS grammar
> given above. Also, XSugar is not Haskell library. It is standalone Java app. So, I need to convert
> some language to XML, then load this XML to my Haskell program and parse. Also, there is similar lib
> https://www.brics.dk/Xact.html from same author, but it is for Java, not for Haskell.
>
> I was able to find only 2 projects, which do guaranteed reversible parsing: mentioned Augeas and
> XSugar/Xact. Unfortunately, both are not for Haskell. And both cannot prove unambiguity for my PTS
> grammar.
>
> So, please tell me about solution. Or give advice how to write one.
>
> Also, let me describe my experience with various algorithms for checking ambiguity of grammar.
>
> 1. Try to construct LR(1) table. If there is no conflicts, then the grammar is unambiguous. It seems
> this is algorithm used by bison and happy. Unfortunately it is too weak and cannot prove unambiguity
> of my PTS grammar.
> 2. Try to construct LR(k) table. This algorithm is stronger than LR(1). I was unable to find
> implementation of this algorithm in any language. I don't want to write such algorithm.
> 3. Schmitz's algorithm. http://www.lsv.fr/~schmitz/pub/expamb.pdf . Even stronger than LR(k). I
> downloaded his bison fork. Unfortunately this bison fork can prove unambiguity for same set of
> grammars original bison can (i. e. LR(1)). I. e. it seems Schmitz did not implemented his algorithm.
> I don't want to implement his algorithm either.
> 4. Horizontal and vertical ambiguity checking. https://www.brics.dk/grammar/kruse.pdf . This is
> algorithm used in XSugar. Unfortunately, it doesn't handle my PTS grammar.
> 5. Brute force up to some limit. Doesn't give unambiguity guarantee. But I like it, and I will use
> it if I don't find anything else. It is the only algorithm which can handle my PTS grammar (and
> give enough amount of guarantee), not counting LR(k) and Schmitz's algorithm (I was not able to find
> implementations of both).
>
> I will try to write tool I want, I will write about my progress to this list.
>
> ==
> Askar Safin
> https://github.com/safinaskar
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.




More information about the Haskell-Cafe mailing list