Lazy Parsing

Koen Claessen koen@cs.chalmers.se
Thu, 28 Feb 2002 18:29:04 +0100 (MET)


Brandon Michael Moore wondered:

 | I'm wondering if there are any libraries out there
 | for creating parsers that lazily build up their
 | result.

Joe English answered:

 | type Parser sym result = [sym] -> Maybe (result, [sym])

However, a parser for a particular type T might actually
just return T directly, without any Maybe in the way, if T
has a failure constructor built-in. Something like:

  data T
    = Plus T T
    | ...
    | Fail Error

Now, a parser for T can already start building its result
without having to wait for the whole thing to succeed.

This gets a bit ugly, especially when dealing with many
types that are being parsed, or with the extra failure
constructor always being part of the datatype.

One way of adding such a constructor to any type would be:

  data Result a
    = Val a
    | forall b c . Apply2 (b -> c -> Result a) (Result b) (Result c)
    | Fail Error

This gives you a way of creating an 'a' not only lazily, but
one can decide oneself what to do with failure and when to
do it.

Example: When parsing the following faulty expression

  "(1+2)+blurp"

And expecting an Int as a result, the parser would build the
following expression:

  Apply2 (\x y -> Val (x+y))
    (Apply2 (\x y -> Val (x+y)) (Val 1) (Val 2))
      (Fail "unrecognized `blurp'")

None of this is type checked and surely contains bugs, but I
hope you get the idea.

The hope is that a value of type `Result a' gives you more
information and better space behavior than just a value of
type `Maybe a'.

/Koen.

--
Koen Claessen
http://www.cs.chalmers.se/~koen
Chalmers University, Gothenburg, Sweden.