[Haskell-cafe] I solved my reversible parsing problem (was: How to do reversible parsing?)

Askar Safin safinaskar at mail.ru
Sat Jul 10 22:49:04 UTC 2021


I found solution to problem I stated in this letter: https://mail.haskell.org/pipermail/haskell-cafe/2021-January/133275.html . And now I will describe it.

I considered combinator libraries for reversible parsing, such as this: https://hackage.haskell.org/package/invertible-syntax . And I rejected them, because they don't check grammar for ambiguity statically. I. e. they cannot check whether grammar is ambiguous before seeing input.

So I wrote this: https://paste.debian.net/1204012/ (tested with ghc 8.10.4, kleene-0.1, lattices-2.0.2, regex-applicative-0.3.4, QuickCheck-2.14.2, containers-, check-cfg-ambiguity- [this is my package!], Earley-, deepseq-

What is this? This is reversible lexer (function "lexer") and context-free parser (function "contextFree"). They are similar to alex+happy (or flex+bison). They are dynamic, i. e. they accept dynamic description of target language. And both functions output pair (parser, printer).

In case of lexing these two functions are guaranteed to satisfy these equations (slightly simplified):

parse :: text -> Maybe tokens
print :: tokens -> Maybe text
parse s == Just a ==> print a /= Nothing
print a == Just s ==> parse s == Just a

In case of context-free parsing and printing these functions satisfy these equations (slightly simplified):

parse :: tokens -> [ast]
print :: ast -> Maybe tokens
a `elem` parse s ==> print a /= Nothing
print a == Just s ==> a `elem` parse s

Additionally, "contextFree" performs heuristic ambiguity checking, so in most cases there is no more that one possible parse.

"lexer" parses using regular expressions (package "kleene") using usual "maximal munch" rule. "contextFree" takes CFG and parses using Earley algorithm from package "Earley". Function "contextFree" outputs AST as opposed to parse tree, i. e. "1 + (2 * 3)" and "1 + 2 * 3" give same output.

My library is dynamic, i. e. it is designed for case, when you don't know your target language at compilation time. For example, my library is perfectly suited to case of Isabelle's inner language, described in original mail. In more popular case (i. e. when target language is fixed) my library will not be so pleasant to deal with, because "contextFree" always outputs same AST type named "AST". I. e. you will need some additional stage to convert this universal AST type to your target type. This problem could be fixed in some hypothetical library based on Template Haskell, which will take grammar description in compile-time and generate needed code.

I will not give additional details. Feel free to explore code. Assume this code as public domain.

In original letter I stated 5 goals. So, here is their status:
1. Solved
2. Mostly solved (lexer is unambiguous, context free stage is checked using heuristic algorithm)
3. Solved
4. Solved
5. Solved (unfortunately, my solution is always dynamic, i. e. it perfectly fits "Isabelle's inner language" case, but more "normal" cases require boilerplate)

If you like this library, it is possible you will like my other libraries:
- https://hackage.haskell.org/package/check-cfg-ambiguity - checks CFG for ambiguity
- https://hackage.haskell.org/package/parser-unbiased-choice-monad-embedding - another parsing library. It doesn't support reversible parsing. It is designed for static case, i. e. for case when you know your target language beforehand. Check it out to see its difference from other libs

Askar Safin

More information about the Haskell-Cafe mailing list