[Haskell-cafe] Designing a Parser

Sun Feb 17 02:03:01 EST 2008

On Feb 17, 2008 6:20 AM, PR Stanley <prstanley at ntlworld.com> wrote:
>   I can't think of an elegant pattern for the last function. I've
> already tried set comprehension. However, something tells me that the
> answer may lie in a complex recursive pattern. I'm not afraid of
> complex solutions but I naturally see them as an easy way out. It's
> those clear simple patterns that separate men from mice. :-) I'd be
> grateful for any advice on this and indeed my approach as a whole. If
> you think I'm on the wrong path from the start feel free to let me
> know. I'm not asking for the answer. I'd like to work that out by
> myself although some guidance would be most appreciated.

Yes, I definitely think you're on the wrong path from the start.  In
high school when I was learning C++ I wrote an arithmetic expression
parser in something like this fashion.  I scanned the input for the
first opening bracket, then I walked forward to the matching closing
bracket and extracted the subexpression and evaluated it.

The approach turned out to be both complicated and inefficient.  I
think the biggest problem was that it didn't scale well with
implementation complexity; eg. to add support for prefix functions
like sin() was almost impossible.  Think about how you would go about
doing such things using your approach.

I think I've seen you ask questions relating to this on the list
before, and at the risk of repeating others, I suggest the ReadS in
the Haskell Prelude.  You get the benefit of it being a clean,
top-down solution as well as the benefit of having a lot of primitive
Haskell types already implemented for you (such as Integers).

I'll describe the approach again, describing the combinators, and then
afterward showing how you might use them to accomplish this problem.
I'll leave the details up to you, since you can learn a lot by
implementing combinator libraries like this.

First, you build a bunch of functions which are "parsers".  A "parser"
is just a function of type String -> [(a,String)] for some type 'a';
that is, it takes a string and returns a list of "parses", where a
parse is the value parsed paired with the remainder of the string.  So
if you have a function:

    parseInt :: String -> [(Int,String)]

Then you can expect this result:

    parseInt "123 hello world" = [(123, " hello world")]

Or maybe even:

    parseInt "123 hello world" = [(123, " hello world"), (12, "3 hello
world"), (1, "23 hello world")]

But that latter behavior is not recommended in this case (i.e. it is
advisable force int parsing to be greedy).

Then to combine parsers you can use some combination operations:

    sequenceParsers :: (String -> [(a,String)]) -> (a -> String ->
[(b, String)]) -> String -> [(b,String)]

That one may be easier to see if you use the built-in type synonym
ReadS a = String -> [(a, String)]:

    sequenceParsers :: ReadS a -> (a -> ReadS b) -> ReadsB

The second argument here is a function, because we have already parsed
the first argument and know its value, so the second argument ought to
be able to use it.  We can also write:

    alternateParsers :: ReadS a -> ReadS a -> ReadS a

Which gives all valid parses that the first one recognizes
concatenated with all valid parsers that the second one recognizes.

Implementation of these combinators is left to you.  Since the types
of these functions are quite general, you can use a type-directed
approach (i.e. if your implementation uses all available data and it
typechecks, it's probably correct).

Now that you have these, how do you use them to actually parse
something?  Let's parse simple logical expressions.  First we need a
data structure to parse into:

    data Exp
        = Variable String
        | AndExp Exp Exp
        | OrExp Exp Exp

Let's do it with a top down coding strategy:  write a function to
parse an expression using whatever helper functions we need but
haven't written yet :-)

    parseExp :: ReadS Exp
    parseExp = parseAnyOf [ parseVariable, parseAndExp, parseOrExp ]

Where we haven't written parseAnyOf yet.  Write that inductively on the list:

    parseAnyOf :: [ReadS a] -> ReadS a
    parseAnyOf []     = \input -> []
    parseAnyOf (p:ps) = alternateParsers p (parseAnyOf ps)

And jump in to the next thing we haven't written, parseVariable:

    parseVariable :: ReadS Exp
    parseVariable = mapParser Variable parseString

mapParser doesn't actually parse anything, it just parses whatever its
argument does and applies a function to the result:

    mapParser :: (a -> b) -> ReadS a -> ReadS b
    mapParser :: (a -> b) -> (String -> [(a,String)]) -> String -> [(b,String)]

I rewrote the type signature to help guide your implementation.

That should get you started, and show you how the approach usually
goes.  You seem to already get the idea of writing many small
functions and composing them together.  This is the same idea, except
the functions are abstracting the problem more.

Luke