[Haskell-cafe] Parsec operator with letter problem
Reto Kramer
kramer at acm.org
Thu Mar 30 18:15:20 EST 2006
I ran into the following problem with parsec's handling of operators.
I hope someone on the list can show me a trick that resolve my
current issue. At the end of this message is the full code for the
reproducer.
The language I'm parsing has infix operators of two forms. Some are
special characters (e.g. a dot) and some are LaTeX like (e.g. \in).
The letters that appear in the e.g. \in lead to problems with
expression parsing.
The following is what I expect:
$ cat input1
a.b
$ ./test input1
InfixExpr OP_Dot (Ident "a") (Ident "b")
Good!
Then I try (change the b to an n)
$ cat input2
a.n
$ ./test input2
Ident "a"
OUTSCH! Changing the name of an identifier changed the expression!
Then I try (add a space right after the dot)
$ cat input3
a. n
$ ./test input2
InfixExpr OP_Dot (Ident "a") (Ident "n")
Good!
What is going on?
The 'n' is part of the "\in" operator (see reservedOpNames in test.hs
program at the end of message) and confuses the parsec expression
parser's ability to determine the end of expressions/identifiers.
Adding a space right after the '.' operator resolves the issue, but
that's not a suitable option for my users.
Unfortunately I cannot remove the "\in" operator from the
reservedOpNames list since otherwise "\in" is not recognized as an
infix operator itself anymore.
QUESTION:
how can I get parsec's expression parser to work with infix operators
of the form "\in"?
Thanks,
- Reto
-- complete reproducer (test.hs)
-- compile with: ghc -o test -package parsec test.hs
module Main where
import List (nub)
import System (getArgs)
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language( emptyDef )
import qualified Text.ParserCombinators.Parsec.Token as P
main :: IO ()
main = do{ args <- getArgs
; let fname = args !! 0
; input <- readFile fname
; case parse spec fname input of
Left err -> do{ putStr "parse error at "
; print err
}
Right x -> print x
}
data Op = OP_Dot | OP_In deriving (Show, Eq, Ord)
data Expr = InfixExpr Op Expr Expr
| Ident String
| Number Integer deriving (Show, Eq, Ord)
spec :: Parser Expr
spec = do { whiteSpace
; e <- expression
; return e
}
expression :: Parser Expr
expression = buildExpressionParser table basicExpr <?> "expression"
op_infix :: Op -> Expr -> Expr -> Expr
op_infix op a b = InfixExpr op a b
table :: OperatorTable Char () Expr
table = [ [binary "." (op_infix OP_Dot) AssocLeft]
, [binary "\\in" (op_infix OP_In) AssocLeft] ]
binary name fun assoc = Infix (do{ reservedOp name; return fun }) assoc
basicExpr :: Parser Expr
basicExpr = choice [
do{ i <- identifier
; return $ Ident i
}
]
lexer = P.makeTokenParser testdef
testdef = emptyDef {
P.identStart = letter <|> char '_'
, P.identLetter = alphaNum <|> char '_'
, P.opStart = oneOf $ nub $
map (\s -> head s) $ P.reservedOpNames testdef
, P.opLetter = oneOf (concat (P.reservedOpNames testdef))
, P.reservedOpNames = [ ".", "\\in" ] }
dot = P.dot lexer
parens = P.parens lexer
braces = P.braces lexer
squares = P.squares lexer
semiSep = P.semiSep lexer
semiSep1 = P.semiSep1 lexer
commaSep = P.commaSep lexer
commaSep1 = P.commaSep1 lexer
brackets = P.brackets lexer
whiteSpace = P.whiteSpace lexer
symbol = P.symbol lexer
identifier = P.identifier lexer
reserved = P.reserved lexer
reservedOp = P.reservedOp lexer
integer = P.integer lexer
natural = P.natural lexer
charLiteral = P.charLiteral lexer
stringLiteral = P.stringLiteral lexer
More information about the Haskell-Cafe
mailing list