[Haskell-beginners] Parsing Custom TeX

Jeffrey Drake jeffd at techsociety.ca
Sat Nov 8 18:22:27 EST 2008


*This message was sent in reply to Jason Dusek but the reply went to him, not the list.

I plan to have Command String, CommandParams [String],
CommandParamsWithArgs [String] [String] or something to that effect. If
each of these are combinators, then I imagine you can use <|> with them.
But realistically, I have no idea how to get this to take a string input
(or something like getContents) and have [UnTeX] come out the other end.

The problem that I have also is that while Command* can come anywhere in
the text, everything goes back to Text unless it is a Paragraph. The
spaces after a Command* up till the next letter have to be ignored, and
superfluous spaces within Text itself also should be. I also can't have
two Paragraphs right beside each other, because that makes little sense.
So from what I can guess - I need a lexer, I think it was called a
lexeme lexer.

The syntax for Command* is like this:

\command
\command[arg]
\command[arg][arg][...]
\command[...]{body}
\command[...]{body1}{body2}{...}

The reason why this is necessary is because you could have something
like \frac{a}{b}. I am trying to be more consistent with my use of this
than LaTeX/TeX is. 

I might need to implement something like a table generator eventually,
but this would be hopefully for the backend. Because I would like to
translate this stuff into HTML and other outputs eventually.

Thank you for your help,
Jeffrey.

On Sat, 2008-11-08 at 01:00 -0800, Jason Dusek wrote:
> Each one of the little combinators seems to work as
>   advertized. Are you having trouble fitting them together?
> 
> --
> _jsn
> 
> 
> module UnTeX where
> 
> import Text.ParserCombinators.Parsec
> import Text.ParserCombinators.Parsec.Prim
> import Text.ParserCombinators.Parsec.Language
> import qualified Text.ParserCombinators.Parsec.Token as T
> 
> data UnTeX
>   = Command String [String] String
>   | Text String
>   | Paragraph
>  deriving Show
> 
> 
>  -- I don't remember TeX very well, so I'm not sure this is right.
> command                     ::  Parser UnTeX
> command                      =  do
>   char '\\'
>   cmd                       <-  ident
>   p                         <-  orNot params []
>   b                         <-  orNot body ""
>   return $ Command cmd p b
>  where
>   params                     =  many1 $ between (char '[') (char ']') ident
>   body                       =  do
>     char '{'
>     text                    <-  many1 $ noneOf "}"
>     char '}'
>     return text
>   ident                      =  many1 $ letter <|> digit
>   orNot p n                  =  choice [try p , return n]
> 
> 
> paragraph                   ::  Parser UnTeX
> paragraph                    =  do
>   newline
>   newline
>   return $ Paragraph
> 
> text                        ::  Parser UnTeX
> text                         =  do
>   txt                       <-  many1 (alphaNum <|> space)
>   return $ Text txt



More information about the Beginners mailing list