[Haskell-cafe] Overlaping Parsec rules
Christian Maeder
Christian.Maeder at dfki.de
Tue Mar 8 12:55:45 CET 2011
Am 08.03.2011 12:30, schrieb Hauschild, Klaus (EXT):
> Hi Christian,
>
> my version of parseList works currently strange.
> The input "{2\n2}" will be parsed fine, but something like "{ }" or so fails with 'expecting space or "}"'.
You must skip (possible) spaces after "{", too. (Actually after every
lexeme.)
C.
>
> The redefinition of space is not necessary, it was copied from another tutorial code. How I write a version of parseGml that get gml token separated by any white space (space, tab, newline)?
>
> Thanks
>
> -----Ursprüngliche Nachricht-----
> Von: Christian Maeder [mailto:Christian.Maeder at dfki.de]
> Gesendet: Montag, 7. März 2011 14:18
> An: Hauschild, Klaus (EXT)
> Cc: haskell-cafe at haskell.org
> Betreff: Re: Overlaping Parsec rules
>
> Am 07.03.2011 13:48, schrieb Hauschild, Klaus (EXT):
>> Thanks Christian,
>>
>> I adapted the "keyword" parser and now "n" <-> "negi" does not occur.
>>
>> But there are still other problems to solve. If I activate the parseFunction the parser will answer to fact.gml unexpected end of input, expecting space or "}".
>
> Your redefinition of spaces (= skipMany1 space) is problematic and:
>
> parseList = sepBy parseGml' spaces
>
> Allow for the original (optional) spaces after parseGml':
>
> parseGml'' = liftM2 const parseGml' spaces
>
> parseList = many parseGml''
>
> C.
>
> P.S. why do you call? many (noneOf "")
>
> manyTill anyChar newline
>
> or just: many (noneOf "\n")
>
> (a trailing newline will be skipped by spaces)
>
>>
>> Any ideas?
>>
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Christian Maeder [mailto:Christian.Maeder at dfki.de]
>> Gesendet: Montag, 7. März 2011 12:23
>> An: Hauschild, Klaus (EXT)
>> Cc: haskell-cafe at haskell.org
>> Betreff: Re: Overlaping Parsec rules
>>
>> You should parse keywords using:
>>
>> keyword s = try (string s) >> notFollowedBy (letter <|> digit)
>>
>> C.
>>
>> Am 07.03.2011 11:34, schrieb Hauschild, Klaus (EXT):
>>> Hi,
>>>
>>> to solve this ICFP task _http://www.cs.cornell.edu/icfp/task.htm_ I'm
>>> currnetly working on the parser. With the hint from Thu (reading Phillip
>>> Wadlers monadic parser paper) and consulting
>>> _http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing_
>>> I produce a first working version of the parser.
>>> After this great moment I completed the token definition and near all
>>> parsing rules. For the complete code have a look at
>>> _http://code.google.com/p/hgmltracer/source/browse/#svn%2Ftrunk%2FhGmlTracer_
>>>
>>> data GmlToken =
>>>
>>> -- structures
>>>
>>> FunctionToken TokenSequence |
>>>
>>> ArrayToken TokenSequence |
>>>
>>> -- control operators
>>>
>>> ApplyToken |
>>>
>>> IfToken |
>>>
>>> -- number operators
>>>
>>> AddiToken |
>>>
>>> AddfToken |
>>>
>>> ACosToken |
>>>
>>> ASinToken |
>>>
>>> ClampfToken |
>>>
>>> CosToken |
>>>
>>> DiviToken |
>>>
>>> DivfToken |
>>>
>>> EqiToken |
>>>
>>> EqfToken |
>>>
>>> FloorToken |
>>>
>>> FracToken |
>>>
>>> LessiToken |
>>>
>>> LessfToken |
>>>
>>> ModiToken |
>>>
>>> MuliToken |
>>>
>>> MulfToken |
>>>
>>> NegiToken |
>>>
>>> NegfToken |
>>>
>>> ToRealToken |
>>>
>>> SinToken |
>>>
>>> SqrtToken |
>>>
>>> SubiToken |
>>>
>>> SubfToken |
>>>
>>> -- points
>>>
>>> GetXToken |
>>>
>>> GetYToken |
>>>
>>> GetZToken |
>>>
>>> PointToken |
>>>
>>> -- arrays
>>>
>>> GetToken |
>>>
>>> LengthToken |
>>>
>>> -- environment
>>>
>>> IdentifierToken String |
>>>
>>> BinderToken String |
>>>
>>> -- types
>>>
>>> BoolToken Bool |
>>>
>>> IntToken Int |
>>>
>>> RealToken Double |
>>>
>>> StringToken String deriving Show
>>>
>>> And
>>>
>>> parseGml :: String -> [GmlToken]
>>>
>>> parseGml input = case parse parseList "gml" input of
>>>
>>> Left err -> error ("Parse error: " ++ (show err))
>>>
>>> Right gml -> gml
>>>
>>>
>>>
>>> parseList = sepBy parseGml' spaces
>>>
>>>
>>>
>>> parseGml' =
>>>
>>> -- structures
>>>
>>> -- parseFunction
>>>
>>> -- <|> parseArray
>>>
>>> -- control operators
>>>
>>> parseControlOperators
>>>
>>> -- number operators
>>>
>>> <|> parseNumberOperators
>>>
>>> -- points
>>>
>>> <|> parsePointOperators
>>>
>>> -- arrays
>>>
>>> <|> parseArrayOperators
>>>
>>> -- types
>>>
>>> <|> parseBool
>>>
>>> <|> parseInteger
>>>
>>> <|> parseString
>>>
>>> -- environment
>>>
>>> <|> parseIdentifier
>>>
>>> <|> parseBinder
>>>
>>>
>>>
>>> parseArray = parseSequence '[' ']'
>>>
>>> parseFunction = parseSequence '{' '}'
>>>
>>> parseSequence start end = do char start
>>>
>>> array <- parseList
>>>
>>> char end
>>>
>>> return $ ArrayToken array
>>>
>>>
>>>
>>> parseControlOperators = parseApply <|> parseIf
>>>
>>> parseApply = do string "apply"
>>>
>>> return $ ApplyToken
>>>
>>> parseIf = do string "if"
>>>
>>> return $ IfToken
>>>
>>>
>>>
>>> parseNumberOperators = do string "addi"
>>>
>>> return $ AddiToken
>>>
>>> <|> do string "addf"
>>>
>>> return $ AddfToken
>>>
>>> <|> do string "acos"
>>>
>>> return $ ACosToken
>>>
>>> <|> do string "asind"
>>>
>>> return $ ASinToken
>>>
>>> <|> do string "clampf"
>>>
>>> return $ ClampfToken
>>>
>>> <|> do string "cos"
>>>
>>> return $ CosToken
>>>
>>> <|> do string "divi"
>>>
>>> return $ DiviToken
>>>
>>> <|> do string "divf"
>>>
>>> return $ DivfToken
>>>
>>> <|> do string "eqi"
>>>
>>> return $ EqiToken
>>>
>>> <|> do string "eqf"
>>>
>>> return $ EqfToken
>>>
>>> <|> do string "floor"
>>>
>>> return $ FloorToken
>>>
>>> <|> do string "frac"
>>>
>>> return $ FracToken
>>>
>>> <|> do string "lessi"
>>>
>>> return $ LessiToken
>>>
>>> <|> do string "lessf"
>>>
>>> return $ LessfToken
>>>
>>> <|> do string "modi"
>>>
>>> return $ ModiToken
>>>
>>> <|> do string "muli"
>>>
>>> return $ MuliToken
>>>
>>> <|> do string "mulf"
>>>
>>> return $ MulfToken
>>>
>>> <|> do string "negi"
>>>
>>> return $ NegiToken
>>>
>>> <|> do string "negf"
>>>
>>> return $ NegfToken
>>>
>>> <|> do string "real"
>>>
>>> return $ ToRealToken
>>>
>>> <|> do string "sin"
>>>
>>> return $ SinToken
>>>
>>> <|> do string "Sqrt"
>>>
>>> return $ SqrtToken
>>>
>>> <|> do string "subi"
>>>
>>> return $ SubiToken
>>>
>>> <|> do string "subf"
>>>
>>> return $ SubfToken
>>>
>>>
>>>
>>> parsePointOperators = do string "getx"
>>>
>>> return $ GetXToken
>>>
>>> <|> do string "gety"
>>>
>>> return $ GetYToken
>>>
>>> <|> do string "getz"
>>>
>>> return $ GetZToken
>>>
>>> <|> do string "point"
>>>
>>> return $ PointToken
>>>
>>>
>>>
>>> parseArrayOperators = do string "get"
>>>
>>> return $ GetToken
>>>
>>> <|> do string "length"
>>>
>>> return $ LengthToken
>>>
>>>
>>>
>>> parseInteger = liftM (IntToken . read) $ many1 digit
>>>
>>> parseBool = do string "true"
>>>
>>> return $ BoolToken True
>>>
>>> <|> do string "false"
>>>
>>> return $ BoolToken False
>>>
>>> parseString = do char '"'
>>>
>>> string <- many (noneOf "\"")
>>>
>>> char '"'
>>>
>>> return $ StringToken string
>>>
>>>
>>>
>>> parseIdentifier = do identifier <- many (noneOf " ")
>>>
>>> return $ IdentifierToken identifier
>>>
>>> parseBinder = do char '/'
>>>
>>> binder <- many (noneOf " ")
>>>
>>> return $ BinderToken binder
>>>
>>>
>>>
>>> parseComment = do char '%'
>>>
>>> many (noneOf "")
>>>
>>> newline
>>>
>>> return $ ()
>>>
>>>
>>>
>>> spaces = skipMany1 space
>>>
>>>
>>> After gluing all this together in my mind all worked well. But it
>>> doesn't. The test file for parsing looks like:
>>>
>>> { /self /n
>>> n 2 lessi
>>> { 1 }
>>> { n 1 subi self self apply n muli }
>>> if
>>> } /fact
>>>
>>> 12 fact fact apply
>>>
>>>
>>> * I think there is a problem with overlaping rules. There is a
>>> parser rule consuming "negi" and resulting in the NegiToken. A
>>> single "n" is a valid identifier. For the example file my parser
>>> says: unexpected " ", expecting "negi"
>>> * I think the same problem is present for "parseInteger" and
>>> "parseReal" (currently no in code but looks like "parseReal = do a
>>> <- many1 digit \n char '.' \n b <- many1 digit \n return $
>>> RealToken (read (a ++ "." ++ b)"
>>> * Something with "parseFunction" is going really wrong.
>>> * All parsig rules are designed with the condition that each
>>> "construct" is separated by whitespaces. For this before parsing
>>> the input will be preprocessed: insert spaces, removing
>>> whitespaces and so on. Now in the parsed result appears
>>> (IdentifierToken ""). I think my version is not the best way to
>>> parse a identifer:
>>>
>>> parseIdentifier = *do* identifier <- many (noneOf " ")
>>> return $ *IdentifierToken* identifier
>>>
>>> Please help me.
>>>
>>> Klaus
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list