[Haskell-cafe] Designing a Parser

Sun Feb 17 02:33:04 EST 2008

Actually, I haven't sent this question to the list before. So you're 
in no danger of repeating yourself.
Thanks for your kind reply anyway
Paul
At 07:03 17/02/2008, you wrote:
>On Feb 17, 2008 6:20 AM, PR Stanley <prstanley at ntlworld.com> wrote:
> >   I can't think of an elegant pattern for the last function. I've
> > already tried set comprehension. However, something tells me that the
> > answer may lie in a complex recursive pattern. I'm not afraid of
> > complex solutions but I naturally see them as an easy way out. It's
> > those clear simple patterns that separate men from mice. :-) I'd be
> > grateful for any advice on this and indeed my approach as a whole. If
> > you think I'm on the wrong path from the start feel free to let me
> > know. I'm not asking for the answer. I'd like to work that out by
> > myself although some guidance would be most appreciated.
>
>Yes, I definitely think you're on the wrong path from the start.  In
>high school when I was learning C++ I wrote an arithmetic expression
>parser in something like this fashion.  I scanned the input for the
>first opening bracket, then I walked forward to the matching closing
>bracket and extracted the subexpression and evaluated it.
>
>The approach turned out to be both complicated and inefficient.  I
>think the biggest problem was that it didn't scale well with
>implementation complexity; eg. to add support for prefix functions
>like sin() was almost impossible.  Think about how you would go about
>doing such things using your approach.
>
>I think I've seen you ask questions relating to this on the list
>before, and at the risk of repeating others, I suggest the ReadS in
>the Haskell Prelude.  You get the benefit of it being a clean,
>top-down solution as well as the benefit of having a lot of primitive
>Haskell types already implemented for you (such as Integers).
>
>I'll describe the approach again, describing the combinators, and then
>afterward showing how you might use them to accomplish this problem.
>I'll leave the details up to you, since you can learn a lot by
>implementing combinator libraries like this.
>
>First, you build a bunch of functions which are "parsers".  A "parser"
>is just a function of type String -> [(a,String)] for some type 'a';
>that is, it takes a string and returns a list of "parses", where a
>parse is the value parsed paired with the remainder of the string.  So
>if you have a function:
>
>     parseInt :: String -> [(Int,String)]
>
>Then you can expect this result:
>
>     parseInt "123 hello world" = [(123, " hello world")]
>
>Or maybe even:
>
>     parseInt "123 hello world" = [(123, " hello world"), (12, "3 hello
>world"), (1, "23 hello world")]
>
>But that latter behavior is not recommended in this case (i.e. it is
>advisable force int parsing to be greedy).
>
>Then to combine parsers you can use some combination operations:
>
>     sequenceParsers :: (String -> [(a,String)]) -> (a -> String ->
>[(b, String)]) -> String -> [(b,String)]
>
>That one may be easier to see if you use the built-in type synonym
>ReadS a = String -> [(a, String)]:
>
>     sequenceParsers :: ReadS a -> (a -> ReadS b) -> ReadsB
>
>The second argument here is a function, because we have already parsed
>the first argument and know its value, so the second argument ought to
>be able to use it.  We can also write:
>
>     alternateParsers :: ReadS a -> ReadS a -> ReadS a
>
>Which gives all valid parses that the first one recognizes
>concatenated with all valid parsers that the second one recognizes.
>
>Implementation of these combinators is left to you.  Since the types
>of these functions are quite general, you can use a type-directed
>approach (i.e. if your implementation uses all available data and it
>typechecks, it's probably correct).
>
>Now that you have these, how do you use them to actually parse
>something?  Let's parse simple logical expressions.  First we need a
>data structure to parse into:
>
>     data Exp
>         = Variable String
>         | AndExp Exp Exp
>         | OrExp Exp Exp
>
>Let's do it with a top down coding strategy:  write a function to
>parse an expression using whatever helper functions we need but
>haven't written yet :-)
>
>     parseExp :: ReadS Exp
>     parseExp = parseAnyOf [ parseVariable, parseAndExp, parseOrExp ]
>
>Where we haven't written parseAnyOf yet.  Write that inductively on the list:
>
>     parseAnyOf :: [ReadS a] -> ReadS a
>     parseAnyOf []     = \input -> []
>     parseAnyOf (p:ps) = alternateParsers p (parseAnyOf ps)
>
>And jump in to the next thing we haven't written, parseVariable:
>
>     parseVariable :: ReadS Exp
>     parseVariable = mapParser Variable parseString
>
>mapParser doesn't actually parse anything, it just parses whatever its
>argument does and applies a function to the result:
>
>     mapParser :: (a -> b) -> ReadS a -> ReadS b
>     mapParser :: (a -> b) -> (String -> [(a,String)]) -> String -> 
> [(b,String)]
>
>I rewrote the type signature to help guide your implementation.
>
>That should get you started, and show you how the approach usually
>goes.  You seem to already get the idea of writing many small
>functions and composing them together.  This is the same idea, except
>the functions are abstracting the problem more.
>
>Luke