Jeffrey Drake jeffd at techsociety.ca
Sun Nov 9 23:35:38 EST 2008

This helps a lot, and I can go over this in the morning. The only final
question I have is what you would use to apply all this to an arbitrary
string.

For example, in the following partially fictitious code: (based on
something I saw)

main = getContents >>= ...

What would ... be so that you can turn the [Char] into [TeX]? Or can I
specifically use the combinators only?

- Jeff
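
One possible answer to the question above, as a minimal sketch rather than a
tested solution: Parsec's `parse` function runs a parser on a String and
returns either a ParseError or the parsed result. The parser `wordList` below
is a hypothetical stand-in for the `texList` parser from the quoted message,
and the source name "(stdin)" is an arbitrary label that only shows up in
error messages.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical stand-in for texList: words separated by runs of spaces.
wordList :: Parser [String]
wordList = many1 letter `sepBy` many1 (char ' ')

-- Run the parser on an arbitrary String.
-- The second argument to parse is only a name used in error messages.
parseWords :: String -> Either ParseError [String]
parseWords = parse wordList "(stdin)"

-- Fill in the "..." from the question: parse everything read from stdin
-- and print either the error or the result.
main :: IO ()
main = getContents >>= print . parseWords
```

Feeding the line "ab cd" to this program should print Right ["ab","cd"].
Note that parse does not insist on consuming the whole input, so anything
after the last word (such as a trailing newline) is silently ignored here.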

On Mon, 2008-11-10 at 04:33 +0100, Tillmann Rendel wrote:
> Jeffrey Drake wrote:
> > Given a set of combinators that parse specific parts of a document, how
> > can you string them together so that you can get the whole document
> > parsed?
>
> The general idea is to build parsers for complex formats out of parsers
> for simple formats. The structure of the parser often more or less
> follows the structure of the data.
>
> For example, the following data type could be a first approach to
> capture the lexical structure of TeX:
>
>    data TeX
>      = Letter Char             -- for example: A
>      | Command String          -- for example: \begin
>      | Group [TeX]             -- for example: {abc\something{...}}
>
> The idea is to parse "a\test {bc}" into the following list of TeX values:
>
>    [Letter 'a', Command "test", Group [Letter 'b', Letter 'c']]
>
> Note how using lists of TeX values lets us represent whole documents,
> and how the Group data constructor captures the recursive structure of
> TeX programs.
>
> Let's start by writing the parser for a single TeX value. The datatype
> definition shows that such a value can be a letter, a command or a
> list of TeX values enclosed in braces. We can capture the fact that we
> have three choices directly in parsers:
>
>    tex :: Parser TeX
>    tex = texLetter <|> texCommand <|> texGroup
>
> Note how the combinator <|> corresponds to the | syntax in the datatype
> declaration.
>
> Given this parser for TeX values, we can write the parser for a list of
> such values using the many combinator:
>
>    texList :: Parser [TeX]
>    texList = many tex
>
> Note how the many combinator corresponds to the list type constructor.
>
> Now we have to define the parser for the three data constructors.
> texLetter is easy:
>
>    texLetter :: Parser TeX
>    texLetter = do l <- letter
>                   return (Letter l)
>
> Note how the fact that texLetter just wraps letter corresponds to the
> fact that Letter just wraps Char.
>
> Commands are more interesting, because they eat all spaces after the
> name of the control sequence.
>
>    texCommand :: Parser TeX
>    texCommand = do char '\\'
>                    name <- many letter
>                    many (char ' ')
>                    return (Command name)
>
> By implementing the space eating feature of commands as part of the
> texCommand parser, we can be sure that spaces not following commands
> will not be eaten.
>
> Finally, I would consider the parser for groups the most interesting.
> The inside of a group looks just like the whole TeX document
> itself. Fortunately, we have already implemented a parser for whole TeX
> documents, namely texList, which we use for the texGroup parser as follows:
>
>    texGroup :: Parser TeX
>    texGroup = do char '{'
>                  content <- texList
>                  char '}'
>                  return (Group content)
>
> Note how the mutual recursion between texList and texGroup corresponds
> to the recursion in the TeX data type.
>
> Of course, the examples in this message are not meant to be production
> code. Actually, they are not tested at all. But I hope that they help
> you get started with Parsec.
>
>    Tillmann
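
For reference, the fragments from the message above can be assembled into one
module. This is a sketch under a few assumptions, not the author's tested
code: texGroup returns its contents wrapped in Group, the helper `parseTeX`
and the source name "(input)" are hypothetical additions, and `parseTeX`
requires the whole input to be consumed (via eof), which the original message
does not specify.

```haskell
import Text.ParserCombinators.Parsec

data TeX
  = Letter Char             -- for example: A
  | Command String          -- for example: \begin
  | Group [TeX]             -- for example: {abc\something{...}}
  deriving (Show, Eq)

-- One TeX value: the three alternatives mirror the three constructors.
tex :: Parser TeX
tex = texLetter <|> texCommand <|> texGroup

-- A whole document (or the inside of a group) is a list of TeX values.
texList :: Parser [TeX]
texList = many tex

texLetter :: Parser TeX
texLetter = do l <- letter
               return (Letter l)

-- A backslash, the command name, then any spaces after the name.
texCommand :: Parser TeX
texCommand = do _ <- char '\\'
                name <- many letter
                _ <- many (char ' ')
                return (Command name)

-- Braces around a nested texList; this is the recursive case.
texGroup :: Parser TeX
texGroup = do _ <- char '{'
              content <- texList
              _ <- char '}'
              return (Group content)

-- Hypothetical helper: parse a complete String, failing on leftover input.
parseTeX :: String -> Either ParseError [TeX]
parseTeX = parse wholeInput "(input)"
  where wholeInput = do ts <- texList
                        eof
                        return ts

main :: IO ()
main = print (parseTeX "a\\test {bc}")
```

Running this should print
Right [Letter 'a',Command "test",Group [Letter 'b',Letter 'c']].
Because the TeX type only covers letters, commands and groups, input that
contains anything else (digits, newlines, spaces outside of commands) is
rejected by parseTeX; dropping the eof check would instead stop quietly at
the first such character.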