[Haskell-cafe] Parsec operator with letter problem
Daniel Fischer
daniel.is.fischer at web.de
Fri Mar 31 08:24:51 EST 2006
Hi,
probably somebody else has already come up with something better, but still...
I surmise that you have two kinds of infix-operators,
1. dot-like operators, made up entirely of symbols (^!$%&/\,.:;#+-~* ...)
2. LaTeX-command-like operators, starting with a backslash and then followed
by a nonempty sequence of letters (or possibly alphanumeric characters).
Then the following helps:
import Data.Char (isAlpha)
lexer = lexer0{P.reservedOp = rOp}
where
lexer0 = P.makeTokenParser testdef
resOp0 = P.reservedOp lexer0
resOp1 name =
case name of
('\\':cs@(_:_))
| all isAlpha cs -> do string name
notFollowedBy letter <?>
("end of " ++ show name)
_ -> fail (show name ++ " no good reservedOp")
rOp name = lexeme $ try $ resOp0 name <|> resOp1 name
lexeme p = do { x <- p; P.whiteSpace lexer0; return x }
testdef = emptyDef
{ P.identStart = letter <|> char '_'
, P.identLetter = alphaNum <|> char '_'
, P.opStart = oneOf $ nub $
map (\s -> head s) $ P.reservedOpNames testdef
-- , P.opLetter = oneOf (concat (P.reservedOpNames testdef))
, P.opLetter = oneOf symbs
, P.reservedOpNames = [ ".", "\\in" ] }
where
symbs = filter (not . isAlpha) . concat $ P.reservedOpNames testdef
---------------------------------------------------------------------
dafis at linux:~/Documents/haskell/Reto> cat input
a.n
dafis at linux:~/Documents/haskell/Reto> reto input
InfixExpr OP_Dot (Ident "a") (Ident "n")
If you have more complicated infix operators (e.g. \foo#bar:, :ouch:),
it won't be so easy, anyway, you have to change the definition of reservedOp.
Cheers,
Daniel
Am Freitag, 31. März 2006 01:15 schrieb Reto Kramer:
> I ran into the following problem with parsec's handling of operators.
> I hope someone on the list can show me a trick that resolve my
> current issue. At the end of this message is the full code for the
> reproducer.
>
> The language I'm parsing has infix operators of two forms. Some are
> special characters (e.g. a dot) and some are LaTeX like (e.g. \in).
> The letters that appear in the e.g. \in lead to problems with
> expression parsing.
>
> The following is what I expect:
> $ cat input1
> a.b
> $ ./test input1
> InfixExpr OP_Dot (Ident "a") (Ident "b")
> Good!
>
> Then I try (change the b to an n)
> $ cat input2
> a.n
> $ ./test input2
> Ident "a"
> OUTSCH! Changing the name of an identifier changed the expression!
>
> Then I try (add a space right after the dot)
> $ cat input3
> a. n
> $ ./test input2
> InfixExpr OP_Dot (Ident "a") (Ident "n")
> Good!
>
> What is going on?
>
> The 'n' is part of the "\in" operator (see reservedOpNames in test.hs
> program at the end of message) and confuses the parsec expression
> parser's ability to determine the end of expressions/identifiers.
> Adding a space right after the '.' operator resolves the issue, but
> that's not a suitable option for my users.
>
> Unfortunately I cannot remove the "\in" operator from the
> reservedOpNames list since otherwise "\in" is not recognized as an
> infix operator itself anymore.
>
> QUESTION:
> how can I get parsec's expression parser to work with infix operators
> of the form "\in"?
>
> Thanks,
> - Reto
>
>
> -- complete reproducer (test.hs)
> -- compile with: ghc -o test -package parsec test.hs
>
> module Main where
>
> import List (nub)
> import System (getArgs)
> import Text.ParserCombinators.Parsec
> import Text.ParserCombinators.Parsec.Expr
> import Text.ParserCombinators.Parsec.Language( emptyDef )
> import qualified Text.ParserCombinators.Parsec.Token as P
>
> main :: IO ()
> main = do{ args <- getArgs
> ; let fname = args !! 0
> ; input <- readFile fname
> ; case parse spec fname input of
> Left err -> do{ putStr "parse error at "
> ; print err
> }
> Right x -> print x
> }
>
> data Op = OP_Dot | OP_In deriving (Show, Eq, Ord)
> data Expr = InfixExpr Op Expr Expr
>
> | Ident String
> | Number Integer deriving (Show, Eq, Ord)
>
> spec :: Parser Expr
> spec = do { whiteSpace
> ; e <- expression
> ; return e
> }
>
> expression :: Parser Expr
> expression = buildExpressionParser table basicExpr <?> "expression"
>
> op_infix :: Op -> Expr -> Expr -> Expr
> op_infix op a b = InfixExpr op a b
>
> table :: OperatorTable Char () Expr
> table = [ [binary "." (op_infix OP_Dot) AssocLeft]
> , [binary "\\in" (op_infix OP_In) AssocLeft] ]
> binary name fun assoc = Infix (do{ reservedOp name; return fun }) assoc
>
> basicExpr :: Parser Expr
> basicExpr = choice [
> do{ i <- identifier
> ; return $ Ident i
> }
> ]
>
> lexer = P.makeTokenParser testdef
>
> testdef = emptyDef {
> P.identStart = letter <|> char '_'
> , P.identLetter = alphaNum <|> char '_'
> , P.opStart = oneOf $ nub $
> map (\s -> head s) $ P.reservedOpNames testdef
> , P.opLetter = oneOf (concat (P.reservedOpNames testdef))
> , P.reservedOpNames = [ ".", "\\in" ] }
>
> dot = P.dot lexer
> parens = P.parens lexer
> braces = P.braces lexer
> squares = P.squares lexer
> semiSep = P.semiSep lexer
> semiSep1 = P.semiSep1 lexer
> commaSep = P.commaSep lexer
> commaSep1 = P.commaSep1 lexer
> brackets = P.brackets lexer
> whiteSpace = P.whiteSpace lexer
> symbol = P.symbol lexer
> identifier = P.identifier lexer
> reserved = P.reserved lexer
> reservedOp = P.reservedOp lexer
> integer = P.integer lexer
> natural = P.natural lexer
> charLiteral = P.charLiteral lexer
> stringLiteral = P.stringLiteral lexer
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
--
"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
-- Blair P. Houghton
More information about the Haskell-Cafe
mailing list