[Haskell-cafe] Re: Fun with uu-parsinglib

Thu Jul 29 04:15:44 EDT 2010

On 29 jul 2010, at 05:04, David Place wrote:

> Hi, Doaitse.
> 
> I am making good progress transcribing my parser to use your library.   I think some ways that I have grown accustomed to working with Parsec will not work, though.   Now, I am getting the run time error:
> 
>>  Result: *** Exception: cannot compute minimal length of right hand side of monadic parser
> 
> Is there an explanation of this error in the documentation?  I am trying to combine with alternation some parsers which return the semantic data structures of my program.  Perhaps there are restrictions on the kind of parsers that can be combined with alternation.
> 
> Cheers,
> David
> ____________________
> David Place   
> Owner, Panpipes Ho! LLC
> http://panpipesho.com
> d at vidplace.com
> 
> 
> 

Dear David,

I am cc-ing this answer to Haskell-café since the answer may be of a wider interest.

Let me first explain a a sequence of design choices I made:

1) my parser combinators perform error correction by trying to insert missing tokens into and to delete superfluous tokens from the input
2) of course there are many possibilities to do so 
3) hence I implemented a search process which currently prunes the tree of possible corrective choices up to depth three
4) this process may return several solutions with the same overall cost and I have to pick one of those
5) in case of a draw I give preference to those alternatives which delete something, so progress is guaranteed
6) at the end of the input this does not work, since there is nothing to delete
7) if I pick an arbitrary alternative this may lead to a choice where we get an infinite sequence of insertions; suppose e.g. we want to insert an expression, and the system picks the if_alternative. If it does so it inserts an if-token and then the process starts all over again by trying to insert a conditional expression, which in general will be an expression, which will lead to the insertion of another if-token, etc.
8) in order to prevent this behaviour internally an abstract interpretation is made which computes the minimal number of tokens a parser will recognise (and thus can insert). The recursive alternatives will get an infinite length,  and will thus never be chosen unless you grammar is well-formed and all choices only can lead to an infinite sequence of insertions
9) if a choice has to be made for the correction process always a finite alternative is chosen, so the correction process is guaranteed to complete

Now the monads come in; they raise the question how to compute the minimal length of a parser WHICH DEPENDS ON A PREVIOUSLY RECOGNISED part of the input. In general this is impossible. So I made the decision to generate an error message in such cases, since I did not expect many people to run into this.

But now you, and probably many people used to writing parsers in the monadic (i.e. Parsec) style turn up, and you continue to use the monadic style bacuse you have become accustomed to a different way of writing parsers. Let me give an example, based on the following grammar for well-formed nested parentheses: S -> (S)S ; epsilon. We want to return e.g. the maximal nesting depth. This is one of the functions in the Examples.hs file in the uu-parsinglib. The prototypical way of writing this in applicative style, without using monads is:

wfp :: Parser Int
wfp =  max . (+1) <$> pParens  wfp <*> wfp 
      `opt` 
       0

Now let us take a look at the monadic formulation:

wfp  = do lt <- pParens wfp
          rt <-         wfp   -- < second call to wfp
          return ((lt + 1) `max` rt)
       <|> return 0

Now we see that the second call to wfp is within the scope of the binding of lt, and hence may depend on it. In this case it does not, but unfortunately the abstract interpretation cannot see this, and thus cannot do very much.

This shows in my view the bad thing about monadic parser combinators: they prohibit the self-analysis of parsers, they inhibit optimisations, and they inhibit feedback based on the analysis. More on this below.

Since I assume that the monadic formulation is only used as a last resort, i.e. when a parser REALLY depends on previous input, and that this will in general not be part of an alternative construct, I have decided to generate the above error message.

Another solution I could have chosen is to make the choice in the correction process based on the order in which the alternatives occur in the program, but this would create the need to explain to users that the order of their alternatives matters, and it is one of the nice things that thus far it does not. In the old uulib the order even gets completely lost because we internally build tables which assist in speeding up the parsing process.

There are several solutions now:

a) I just remove the error message and make an educated guess, leading to situations where users might get stuck in an infinite correction process (only at the end of the file; normally in case of a draw preference is given to delete something in order to guarantee progress), and then start to ask questions about what is going on. Notice that they will not get feedback except probably a stack overflow.
b) We leave it like this assuming that when people use a monadic style this will not be in the branch of an alternative construct
c) We add an extra combinator which removes the error message, and we require the user to indicate that a specific alternative should never be chosen in the insertion process
d) Internally I check to see  whether more than  specific number of insertions occur in a row, and then stop. This will however make parsing a bit more expensive, which I always try to avoid.

Let me now come back to explain g further why I think the monadic style should be avoided where-ever possible because they prohibit abstract interpretation.

If in the uu-parsinlib you write a parser like: pList (pList (pSym 'a')) you will get an an error message telling that this parser does not make sense, since the outer pList gets as an argument a parser which can accept the empty string. The good news is that this is something we can statically recognise, EVEN IN THE CASE we use monadic parsers. When we have p >>= q then either p is not empty, and if it is we know the value it returns and we can use this to see whether the q part may return empty. Notice that this is an analysis we have to do only once for each parser, and hence costs can be almost neglected.

My conclusion is: avoid monadic parsers as much as possible, and only use them where you really want to continue parsing with a parser which really depends on the input, like in:

n_as = do n <- pInt
          pEaxct (pSym 'a')

pExact 0 p = pReturn []
pExact n p = (:) <$> p <*> pExact (n-1) p

I have explained this observation before in my lecture notes for the Advanced Functional Programming summer school in 1996, section 5.2:

@inproceedings{SwieDupo96,
	Author = {Swierstra, S. D. and Duponcheel, L.},
	Booktitle = {Advanced Functional Programming},
	Editor = {Launchbury, John and Meijer, Erik and Sheard, Tim},
	Pages = {184-207},
	Publisher = {Springer-Verlag},
	Series = {LNCS-Tutorial},
	Title = {Deterministic, Error-Correcting Combinator Parsers},
	Urlpdf = {http://www.cs.uu.nl/people/doaitse/Papers/1996/DetErrCorrComPars.pdf},
	Volume = {1129},
	Year = {1996}}

Good luck with your further attempts, and let me know which of the a)-d) you prefer, and do not hesitate to inform me about the next problem you may run into ;-}}

Doaitse

PS: if you want to do any further semantic processing then you might consider using the uuagc compiler to build te functions which typically arise in from of the <$>'s: http://www.cs.uu.nl/wiki/bin/view/HUT/AttributeGrammarSystem

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100729/7405f6a6/attachment-0001.html