[Haskell-cafe] Pearl! I just proved theorem about impossibility of monad transformer for parsing with (1) unbiased choice and (2) ambiguity checking before running embedded monadic action (also, I THREAT I will create another parsing lib)

Fri Jun 11 22:18:30 UTC 2021

Hi. I hope people here, at haskell-cafe, like various theorem, pearls on parsing, monads, monad transformers and anything like that, right? So, I just proved something interesting for you!
Theorem about impossibility of monad transformer for parsing with (1) unbiased choice and (2) ambiguity checking before running embedded monadic action.
I already presented it on #haskell in Libera chat, now let me send it here. This is not quite theorem, i. e. I would not say that it keeps with mathematical standards of rigor.
---------------------------
I want monad transformer (let's name it ParserT) with following properties (hint: I proved such transformer doesn't exist):

- it has unbiased choice (i. e. (<|>) :: ParserT m a -> ParserT m a -> ParserT m a) (this means running parser can give you multiple results)
- you can run parser without executing embedded monadic actions. i. e. "run" function should look like this:

run :: ParserT m a -> String -> [m a] {- Not m [a] -}

I want such function, because I want to do this: run parser, then count number of possible parses, then (if count is equal to one) actually execute embedded monadic action. I. e. I want something like this:

parser :: ParserT (Either String) ()
parser = .....

main :: IO ()
main = do {
  input <- ....;
  case run parser input of {
    [res] -> do {
      -- Cool! we got exactly one result! Let's run it
      case res of {
        Left e -> putStr $ "Parsing succeed, but we have semantic error: " ++ e;
        Right () -> putStr "Everything OK";
      };
    };
    _ -> putStr "Cannot parse. 0 parse trees or >1 parse trees";
  };
}

Now let me state theorem.

Theorem. Such transformer doesn't exist. Moreover, even if I replace arbitrary monad "m" with particular monad "Maybe" (or "Either String") the task is still unsolvable. But the task seems achievable using arrows

-------------------------------------

Proof. Let's imagine such "ParsecT Maybe" exists. Let's imagine we have such function:

char :: Char -> ParserT Maybe Char {- matches supplied char -}

Also, keep in mind we have "lift":

lift :: Maybe a -> ParserT Maybe a

Now let's write this:

parseA = char 'a'
parseAorA = char 'a' <|> char 'a'
contr = do {
  a <- lift $ Nothing;
  if a > 5 then parseA else parseAorA;
  return ();
}

Now, please, say, what is (run contr "a")? How many elements will (run contr "a") have? If "a > 5", then (run contr "a") will contain one element. If "a <= 5", then (run contr "a") will contain two elements. But we cannot know whether "a > 5", because we cannot extract "a" from "Nothing". So, we got contradiction. QED
-----------------
In other words, function "run" with type "ParserT m a -> String -> [m a]" means that we should somehow be able to know count of parses before executing monadic action. But this is impossible, because our parser can contain code like this: (do { a <- lift $ ...; if a then ... else ...; }). I. e. count of parses can depend on value, embedded in inner monad, so we cannot know count of parses before we run embedded inner monadic action.

This theorem is not simply out of curiosity. I actually need parsing lib with this properties.

1. I need unbiased choice, because it is compatible with CFGs, and I love CFGs (as opposed to PEG)
2. I need embedding monads, because this allows me to check semantic errors (such as "type mismatch") in time of parsing. I. e. I can simply embed (Either String) monad into parsing monad, and use this (Either String) monad for checking semantic errors
3. I need to know number of parse trees before executing monadic actions (i. e. i need ([m a]) as opposed to (m [a])), because I want to check that number of parses is equal to 1 (if not, then print error), and then extract inner monad (which will contain information about semantic errors). In other words, I want check the input for ambiguity (and print error if it is ambiguous), and if it is not ambiguous, then print possible semantic errors (if any). The possibility of ambiguity detection is whole point of using CFGs as opposed to PEGs (which are always unambiguous, but this unambiguity is simply trick).
4. Every terminal and nonterminal should carry location info (i. e. begin and end of this string). And this location info should be passed implicitly. And it should not be embedded deeply in ASTs I will use. (This means that "happy" is not for me. As well as I understand it cannot implicitly pass location info. In https://github.com/ghc/ghc/blob/master/compiler/GHC/Parser.y I see explicit location management all the time.)
5. Left recursion is not needed

So, does such lib exist? Well, I already know such lib should be arrow based (as opposed to monad). So, is there such arrow parsing lib? Embedding arbitrary inner monad is not needed, just embedding (Either String) will go.

If there is no such lib, I plan to create my own. This is threat! :)

Also, I wrote letter long ago asking for parsing lib with different properties. I included desire for reversibility, among others. Well, this was other task. For my current task I don't need reversibility.

==
Askar Safin
http://safinaskar.com
https://sr.ht/~safinaskar
https://github.com/safinaskar