[Haskell-beginners] Parsing Revisited
Jeffrey Drake
jeffd at techsociety.ca
Sun Nov 9 23:35:38 EST 2008
This helps a lot, and I can go over this in the morning. The only final
question I have is what you would use to apply all this to an arbitrary
string.
For example, in the following partially fictitious code: (based on
something I saw)
main = getContents >>= ...
What would ... be so that you can turn the [Char] into [TeX]? Or can I
specifically use the combinators only?
- Jeff
On Mon, 2008-11-10 at 04:33 +0100, Tillmann Rendel wrote:
> Jeffrey Drake wrote:
> > Given a set of combinators that parse specific parts of a document, how
> > can you string them together so that you can get the whole document
> > parsed?
>
> The general idea is to build parser for complex formats out of parsers
> for simple formats. The structure of the parser often more or less
> follows the structure of the data.
>
> For example, the following data type could be a first approach to
> capture the lexical structure of TeX:
>
> data TeX
> = Letter Char -- for example: A
> | Command String -- for example: \begin
> | Group [TeX] -- for example: {abc\something{...}}
>
> The idea is to parse "a\test {bc}" into the following list of TeX values:
>
> [Letter 'a', Command "test", Group [Letter 'c', Letter 'd']]
>
> Note how the use of lists of TeX values allows to actually represent
> whole documents; and how the Group data constructor allows to capture
> the recursive structure of TeX programs.
>
> Let start by writing the parser for a single TeX value. The datatype
> definition shows that a such a value can be a letter, a command or a
> list of TeX values enclosed in braces. We can capture the fact that we
> have three choices directly in parsers:
>
> tex :: Parser TeX
> tex = texLetter <|> texCommand <|> texGroup
>
> Note how the combinator <|> corresponds to the | syntax in the datatype
> declaration.
>
> Given this parser for TeX values, we can write the parser for a list of
> such values using the many combinator:
>
> texList :: Parser [TeX]
> texList = many tex
>
> Note how the many combinator corresponds to the list type constructor.
>
> Now we have to define the parser for the three data constructors.
> texLetter is easy:
>
> texLetter :: Parser TeX
> texLetter = do l <- letter
> return (Letter l)
>
> Note how the fact that texLetter just wraps letter corresponds to the
> fact that Letter just wraps Char.
>
> Commands are more interesting, because they eat all spaces after the
> name of the control sequence.
>
> texCommand :: Parser TeX
> texCommand = do char '\\'
> name <- many letter
> many (char ' ')
> return (Command name)
>
> By implementing the space eating feature of commands as part of the
> texCommand parser, we can be sure that spaces not following commands
> will not be eaten.
>
> Finally, I would consider the parser for groups the most interesting.
> The inside of a group looks looks just like the whole TeX document
> itself. Fortunately, we have already implemented a parser for whole TeX
> documents, namely texList, which we use for the texGroup parser as follows:
>
> texGroup :: Parser TeX
> texGroup = do char '{'
> content <- texList
> char '}'
>
> Note how the mutual recursion between texList and texGroup corresponds
> to the recursion in the TeX data type.
>
> Of course, the examples in this messages are not meant to be production
> code. Actually, they are not tested at all. But I hope that they help
> you get started with Parsec.
>
> Tillmann
More information about the Beginners
mailing list