[Haskell-cafe] Overlaping Parsec rules

Mon Mar 7 14:17:40 CET 2011

Am 07.03.2011 13:48, schrieb Hauschild, Klaus (EXT):
> Thanks Christian,
> 
> I adapted the "keyword" parser and now "n" <-> "negi" does not occur.
> 
> But there are still other problems to solve. If I activate the parseFunction the parser will answer to fact.gml unexpected end of input, expecting space or "}".

Your redefinition of spaces (= skipMany1 space) is problematic and:

 parseList = sepBy parseGml' spaces

Allow for the original (optional) spaces after parseGml':

  parseGml'' = liftM2 const parseGml' spaces

  parseList = many parseGml''

C.

P.S. why do you call? many (noneOf "")

   manyTill anyChar newline

   or just: many (noneOf "\n")

   (a trailing newline will be skipped by spaces)

> 
> Any ideas?
> 
> 
> -----Ursprüngliche Nachricht-----
> Von: Christian Maeder [mailto:Christian.Maeder at dfki.de] 
> Gesendet: Montag, 7. März 2011 12:23
> An: Hauschild, Klaus (EXT)
> Cc: haskell-cafe at haskell.org
> Betreff: Re: Overlaping Parsec rules
> 
> You should parse keywords using:
> 
> keyword s = try (string s) >> notFollowedBy (letter <|> digit)
> 
> C.
> 
> Am 07.03.2011 11:34, schrieb Hauschild, Klaus (EXT):
>> Hi,
>>  
>> to solve this ICFP task _http://www.cs.cornell.edu/icfp/task.htm_ I'm
>> currnetly working on the parser. With the hint from Thu (reading Phillip
>> Wadlers monadic parser paper) and consulting
>> _http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing_
>> I produce a first working version of the parser.
>> After this great moment I completed the token definition and near all
>> parsing rules. For the complete code have a look at
>> _http://code.google.com/p/hgmltracer/source/browse/#svn%2Ftrunk%2FhGmlTracer_
>>  
>> data GmlToken =
>>        
>>         -- structures
>>        
>>         FunctionToken TokenSequence     |
>>        
>>         ArrayToken TokenSequence        |
>>        
>>         -- control operators
>>        
>>         ApplyToken                      |
>>        
>>         IfToken                         |
>>        
>>         -- number operators
>>        
>>         AddiToken                       |
>>        
>>         AddfToken                       |
>>        
>>         ACosToken                       |
>>        
>>         ASinToken                       |
>>        
>>         ClampfToken                     |
>>        
>>         CosToken                        |
>>        
>>         DiviToken                       |
>>        
>>         DivfToken                       |
>>        
>>         EqiToken                        |
>>        
>>         EqfToken                        |
>>        
>>         FloorToken                      |
>>        
>>         FracToken                       |
>>        
>>         LessiToken                      |
>>        
>>         LessfToken                      |
>>        
>>         ModiToken                       |
>>        
>>         MuliToken                       |
>>        
>>         MulfToken                       |
>>        
>>         NegiToken                       |
>>        
>>         NegfToken                       |
>>        
>>         ToRealToken                     |
>>        
>>         SinToken                        |
>>        
>>         SqrtToken                       |
>>        
>>         SubiToken                       |
>>        
>>         SubfToken                       |
>>        
>>         -- points
>>        
>>         GetXToken                       |
>>        
>>         GetYToken                       |
>>        
>>         GetZToken                       |
>>        
>>         PointToken                      |
>>        
>>         -- arrays
>>        
>>         GetToken                        |
>>        
>>         LengthToken                     |
>>        
>>         -- environment
>>        
>>         IdentifierToken String          |
>>        
>>         BinderToken String              |
>>        
>>         -- types
>>        
>>         BoolToken Bool                  |
>>        
>>         IntToken Int                    |
>>        
>>         RealToken Double                |
>>        
>>         StringToken String              deriving Show  
>>  
>> And
>>  
>> parseGml :: String -> [GmlToken]
>>        
>> parseGml input = case parse parseList "gml" input of
>>        
>>     Left err -> error ("Parse error: " ++ (show err))
>>        
>>     Right gml -> gml
>>        
>>
>>        
>> parseList = sepBy parseGml' spaces
>>        
>>
>>        
>> parseGml' =
>>        
>>         -- structures
>>        
>> --        parseFunction
>>        
>> --    <|> parseArray
>>        
>>         -- control operators
>>        
>>         parseControlOperators
>>        
>>         -- number operators
>>        
>>     <|> parseNumberOperators
>>        
>>         -- points
>>        
>>     <|> parsePointOperators
>>        
>>         -- arrays
>>        
>>     <|> parseArrayOperators
>>        
>>         -- types
>>        
>>     <|> parseBool
>>        
>>     <|> parseInteger
>>        
>>     <|> parseString
>>        
>>         -- environment
>>        
>>     <|> parseIdentifier
>>        
>>     <|> parseBinder
>>        
>>
>>        
>> parseArray = parseSequence '[' ']'
>>        
>> parseFunction = parseSequence '{' '}'
>>        
>> parseSequence start end = do char start
>>        
>>                              array <- parseList
>>        
>>                              char end
>>        
>>                              return $ ArrayToken array
>>        
>>
>>        
>> parseControlOperators = parseApply <|> parseIf
>>        
>> parseApply = do string "apply"
>>        
>>                 return $ ApplyToken
>>        
>> parseIf = do string "if"
>>        
>>              return $ IfToken
>>        
>>
>>        
>> parseNumberOperators = do string "addi"
>>        
>>                           return $ AddiToken
>>        
>>                    <|> do string "addf"
>>        
>>                           return $ AddfToken
>>        
>>                    <|> do string "acos"
>>        
>>                           return $ ACosToken
>>        
>>                    <|> do string "asind"
>>        
>>                           return $ ASinToken
>>        
>>                    <|> do string "clampf"
>>        
>>                           return $ ClampfToken
>>        
>>                    <|> do string "cos"
>>        
>>                           return $ CosToken
>>        
>>                    <|> do string "divi"
>>        
>>                           return $ DiviToken
>>        
>>                    <|> do string "divf"
>>        
>>                           return $ DivfToken
>>        
>>                    <|> do string "eqi"
>>        
>>                           return $ EqiToken
>>        
>>                    <|> do string "eqf"
>>        
>>                           return $ EqfToken
>>        
>>                    <|> do string "floor"
>>        
>>                           return $ FloorToken
>>        
>>                    <|> do string "frac"
>>        
>>                           return $ FracToken
>>        
>>                    <|> do string "lessi"
>>        
>>                           return $ LessiToken
>>        
>>                    <|> do string "lessf"
>>        
>>                           return $ LessfToken
>>        
>>                    <|> do string "modi"
>>        
>>                           return $ ModiToken
>>        
>>                    <|> do string "muli"
>>        
>>                           return $ MuliToken
>>        
>>                    <|> do string "mulf"
>>        
>>                           return $ MulfToken
>>        
>>                    <|> do string "negi"
>>        
>>                           return $ NegiToken
>>        
>>                    <|> do string "negf"
>>        
>>                           return $ NegfToken
>>        
>>                    <|> do string "real"
>>        
>>                           return $ ToRealToken
>>        
>>                    <|> do string "sin"
>>        
>>                           return $ SinToken
>>        
>>                    <|> do string "Sqrt"
>>        
>>                           return $ SqrtToken
>>        
>>                    <|> do string "subi"
>>        
>>                           return $ SubiToken
>>        
>>                    <|> do string "subf"
>>        
>>                           return $ SubfToken
>>        
>>
>>        
>> parsePointOperators = do string "getx"
>>        
>>                          return $ GetXToken
>>        
>>                   <|> do string "gety"
>>        
>>                          return $ GetYToken
>>        
>>                   <|> do string "getz"
>>        
>>                          return $ GetZToken
>>        
>>                   <|> do string "point"
>>        
>>                          return $ PointToken
>>        
>>
>>        
>> parseArrayOperators = do string "get"
>>        
>>                          return $ GetToken
>>        
>>                   <|> do string "length"
>>        
>>                          return $ LengthToken
>>        
>>
>>        
>> parseInteger = liftM (IntToken . read) $ many1 digit
>>        
>> parseBool = do string "true"
>>        
>>                return $ BoolToken True
>>        
>>         <|> do string "false"
>>        
>>                return $ BoolToken False
>>        
>> parseString = do char '"'
>>        
>>                  string <- many (noneOf "\"")
>>        
>>                  char '"'
>>        
>>                  return $ StringToken string
>>        
>>
>>        
>> parseIdentifier = do identifier <- many (noneOf " ")
>>        
>>                      return $ IdentifierToken identifier
>>        
>> parseBinder = do char '/'
>>        
>>                  binder <- many (noneOf " ")
>>        
>>                  return $ BinderToken binder
>>        
>>
>>        
>> parseComment = do char '%'
>>        
>>                   many (noneOf "")
>>        
>>                   newline
>>        
>>                   return $ ()
>>        
>>
>>        
>> spaces = skipMany1 space       
>>  
>>  
>> After gluing all this together in my mind all worked well. But it
>> doesn't. The test file for parsing looks like:
>>  
>> { /self /n
>>   n 2 lessi
>>   { 1 }
>>   { n 1 subi self self apply n muli }
>>   if
>> } /fact
>>  
>> 12 fact fact apply
>>  
>>
>>     * I think there is a problem with overlaping rules. There is a
>>       parser rule consuming "negi" and resulting in the NegiToken. A
>>       single "n" is a valid identifier. For the example file my parser
>>       says: unexpected " ", expecting "negi"
>>     * I think the same problem is present for "parseInteger" and
>>       "parseReal" (currently no in code but looks like "parseReal = do a
>>       <- many1 digit \n char '.' \n b <- many1 digit \n return $
>>       RealToken (read (a ++ "." ++ b)"
>>     * Something with "parseFunction" is going really wrong.
>>     * All parsig rules are designed with the condition that each
>>       "construct" is separated by whitespaces. For this before parsing
>>       the input will be preprocessed: insert spaces, removing
>>       whitespaces and so on. Now in the parsed result appears
>>       (IdentifierToken ""). I think my version is not the best way to
>>       parse a identifer:
>>
>> parseIdentifier = *do* identifier <- many (noneOf " ")
>>                      return $ *IdentifierToken* identifier
>>  
>> Please help me.
>>  
>> Klaus
>>  
>>
>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe