[Haskell-cafe] Overlaping Parsec rules

Tue Mar 8 12:55:45 CET 2011

Am 08.03.2011 12:30, schrieb Hauschild, Klaus (EXT):
> Hi Christian,
> 
> my version of parseList works currently strange.
> The input "{2\n2}" will be parsed fine, but something like "{ }" or so fails with 'expecting space or "}"'.

You must skip (possible) spaces after "{", too. (Actually after every
lexeme.)

C.

> 
> The redefinition of space is not necessary, it was copied from another tutorial code. How I write a version of parseGml that get gml token separated by any white space (space, tab, newline)?
> 
> Thanks
> 
> -----Ursprüngliche Nachricht-----
> Von: Christian Maeder [mailto:Christian.Maeder at dfki.de] 
> Gesendet: Montag, 7. März 2011 14:18
> An: Hauschild, Klaus (EXT)
> Cc: haskell-cafe at haskell.org
> Betreff: Re: Overlaping Parsec rules
> 
> Am 07.03.2011 13:48, schrieb Hauschild, Klaus (EXT):
>> Thanks Christian,
>>
>> I adapted the "keyword" parser and now "n" <-> "negi" does not occur.
>>
>> But there are still other problems to solve. If I activate the parseFunction the parser will answer to fact.gml unexpected end of input, expecting space or "}".
> 
> Your redefinition of spaces (= skipMany1 space) is problematic and:
> 
>  parseList = sepBy parseGml' spaces
> 
> Allow for the original (optional) spaces after parseGml':
> 
>   parseGml'' = liftM2 const parseGml' spaces
> 
>   parseList = many parseGml''
> 
> C.
> 
> P.S. why do you call? many (noneOf "")
> 
>    manyTill anyChar newline
> 
>    or just: many (noneOf "\n")
> 
>    (a trailing newline will be skipped by spaces)
> 
>>
>> Any ideas?
>>
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Christian Maeder [mailto:Christian.Maeder at dfki.de] 
>> Gesendet: Montag, 7. März 2011 12:23
>> An: Hauschild, Klaus (EXT)
>> Cc: haskell-cafe at haskell.org
>> Betreff: Re: Overlaping Parsec rules
>>
>> You should parse keywords using:
>>
>> keyword s = try (string s) >> notFollowedBy (letter <|> digit)
>>
>> C.
>>
>> Am 07.03.2011 11:34, schrieb Hauschild, Klaus (EXT):
>>> Hi,
>>>  
>>> to solve this ICFP task _http://www.cs.cornell.edu/icfp/task.htm_ I'm
>>> currnetly working on the parser. With the hint from Thu (reading Phillip
>>> Wadlers monadic parser paper) and consulting
>>> _http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing_
>>> I produce a first working version of the parser.
>>> After this great moment I completed the token definition and near all
>>> parsing rules. For the complete code have a look at
>>> _http://code.google.com/p/hgmltracer/source/browse/#svn%2Ftrunk%2FhGmlTracer_
>>>  
>>> data GmlToken =
>>>        
>>>         -- structures
>>>        
>>>         FunctionToken TokenSequence     |
>>>        
>>>         ArrayToken TokenSequence        |
>>>        
>>>         -- control operators
>>>        
>>>         ApplyToken                      |
>>>        
>>>         IfToken                         |
>>>        
>>>         -- number operators
>>>        
>>>         AddiToken                       |
>>>        
>>>         AddfToken                       |
>>>        
>>>         ACosToken                       |
>>>        
>>>         ASinToken                       |
>>>        
>>>         ClampfToken                     |
>>>        
>>>         CosToken                        |
>>>        
>>>         DiviToken                       |
>>>        
>>>         DivfToken                       |
>>>        
>>>         EqiToken                        |
>>>        
>>>         EqfToken                        |
>>>        
>>>         FloorToken                      |
>>>        
>>>         FracToken                       |
>>>        
>>>         LessiToken                      |
>>>        
>>>         LessfToken                      |
>>>        
>>>         ModiToken                       |
>>>        
>>>         MuliToken                       |
>>>        
>>>         MulfToken                       |
>>>        
>>>         NegiToken                       |
>>>        
>>>         NegfToken                       |
>>>        
>>>         ToRealToken                     |
>>>        
>>>         SinToken                        |
>>>        
>>>         SqrtToken                       |
>>>        
>>>         SubiToken                       |
>>>        
>>>         SubfToken                       |
>>>        
>>>         -- points
>>>        
>>>         GetXToken                       |
>>>        
>>>         GetYToken                       |
>>>        
>>>         GetZToken                       |
>>>        
>>>         PointToken                      |
>>>        
>>>         -- arrays
>>>        
>>>         GetToken                        |
>>>        
>>>         LengthToken                     |
>>>        
>>>         -- environment
>>>        
>>>         IdentifierToken String          |
>>>        
>>>         BinderToken String              |
>>>        
>>>         -- types
>>>        
>>>         BoolToken Bool                  |
>>>        
>>>         IntToken Int                    |
>>>        
>>>         RealToken Double                |
>>>        
>>>         StringToken String              deriving Show  
>>>  
>>> And
>>>  
>>> parseGml :: String -> [GmlToken]
>>>        
>>> parseGml input = case parse parseList "gml" input of
>>>        
>>>     Left err -> error ("Parse error: " ++ (show err))
>>>        
>>>     Right gml -> gml
>>>        
>>>
>>>        
>>> parseList = sepBy parseGml' spaces
>>>        
>>>
>>>        
>>> parseGml' =
>>>        
>>>         -- structures
>>>        
>>> --        parseFunction
>>>        
>>> --    <|> parseArray
>>>        
>>>         -- control operators
>>>        
>>>         parseControlOperators
>>>        
>>>         -- number operators
>>>        
>>>     <|> parseNumberOperators
>>>        
>>>         -- points
>>>        
>>>     <|> parsePointOperators
>>>        
>>>         -- arrays
>>>        
>>>     <|> parseArrayOperators
>>>        
>>>         -- types
>>>        
>>>     <|> parseBool
>>>        
>>>     <|> parseInteger
>>>        
>>>     <|> parseString
>>>        
>>>         -- environment
>>>        
>>>     <|> parseIdentifier
>>>        
>>>     <|> parseBinder
>>>        
>>>
>>>        
>>> parseArray = parseSequence '[' ']'
>>>        
>>> parseFunction = parseSequence '{' '}'
>>>        
>>> parseSequence start end = do char start
>>>        
>>>                              array <- parseList
>>>        
>>>                              char end
>>>        
>>>                              return $ ArrayToken array
>>>        
>>>
>>>        
>>> parseControlOperators = parseApply <|> parseIf
>>>        
>>> parseApply = do string "apply"
>>>        
>>>                 return $ ApplyToken
>>>        
>>> parseIf = do string "if"
>>>        
>>>              return $ IfToken
>>>        
>>>
>>>        
>>> parseNumberOperators = do string "addi"
>>>        
>>>                           return $ AddiToken
>>>        
>>>                    <|> do string "addf"
>>>        
>>>                           return $ AddfToken
>>>        
>>>                    <|> do string "acos"
>>>        
>>>                           return $ ACosToken
>>>        
>>>                    <|> do string "asind"
>>>        
>>>                           return $ ASinToken
>>>        
>>>                    <|> do string "clampf"
>>>        
>>>                           return $ ClampfToken
>>>        
>>>                    <|> do string "cos"
>>>        
>>>                           return $ CosToken
>>>        
>>>                    <|> do string "divi"
>>>        
>>>                           return $ DiviToken
>>>        
>>>                    <|> do string "divf"
>>>        
>>>                           return $ DivfToken
>>>        
>>>                    <|> do string "eqi"
>>>        
>>>                           return $ EqiToken
>>>        
>>>                    <|> do string "eqf"
>>>        
>>>                           return $ EqfToken
>>>        
>>>                    <|> do string "floor"
>>>        
>>>                           return $ FloorToken
>>>        
>>>                    <|> do string "frac"
>>>        
>>>                           return $ FracToken
>>>        
>>>                    <|> do string "lessi"
>>>        
>>>                           return $ LessiToken
>>>        
>>>                    <|> do string "lessf"
>>>        
>>>                           return $ LessfToken
>>>        
>>>                    <|> do string "modi"
>>>        
>>>                           return $ ModiToken
>>>        
>>>                    <|> do string "muli"
>>>        
>>>                           return $ MuliToken
>>>        
>>>                    <|> do string "mulf"
>>>        
>>>                           return $ MulfToken
>>>        
>>>                    <|> do string "negi"
>>>        
>>>                           return $ NegiToken
>>>        
>>>                    <|> do string "negf"
>>>        
>>>                           return $ NegfToken
>>>        
>>>                    <|> do string "real"
>>>        
>>>                           return $ ToRealToken
>>>        
>>>                    <|> do string "sin"
>>>        
>>>                           return $ SinToken
>>>        
>>>                    <|> do string "Sqrt"
>>>        
>>>                           return $ SqrtToken
>>>        
>>>                    <|> do string "subi"
>>>        
>>>                           return $ SubiToken
>>>        
>>>                    <|> do string "subf"
>>>        
>>>                           return $ SubfToken
>>>        
>>>
>>>        
>>> parsePointOperators = do string "getx"
>>>        
>>>                          return $ GetXToken
>>>        
>>>                   <|> do string "gety"
>>>        
>>>                          return $ GetYToken
>>>        
>>>                   <|> do string "getz"
>>>        
>>>                          return $ GetZToken
>>>        
>>>                   <|> do string "point"
>>>        
>>>                          return $ PointToken
>>>        
>>>
>>>        
>>> parseArrayOperators = do string "get"
>>>        
>>>                          return $ GetToken
>>>        
>>>                   <|> do string "length"
>>>        
>>>                          return $ LengthToken
>>>        
>>>
>>>        
>>> parseInteger = liftM (IntToken . read) $ many1 digit
>>>        
>>> parseBool = do string "true"
>>>        
>>>                return $ BoolToken True
>>>        
>>>         <|> do string "false"
>>>        
>>>                return $ BoolToken False
>>>        
>>> parseString = do char '"'
>>>        
>>>                  string <- many (noneOf "\"")
>>>        
>>>                  char '"'
>>>        
>>>                  return $ StringToken string
>>>        
>>>
>>>        
>>> parseIdentifier = do identifier <- many (noneOf " ")
>>>        
>>>                      return $ IdentifierToken identifier
>>>        
>>> parseBinder = do char '/'
>>>        
>>>                  binder <- many (noneOf " ")
>>>        
>>>                  return $ BinderToken binder
>>>        
>>>
>>>        
>>> parseComment = do char '%'
>>>        
>>>                   many (noneOf "")
>>>        
>>>                   newline
>>>        
>>>                   return $ ()
>>>        
>>>
>>>        
>>> spaces = skipMany1 space       
>>>  
>>>  
>>> After gluing all this together in my mind all worked well. But it
>>> doesn't. The test file for parsing looks like:
>>>  
>>> { /self /n
>>>   n 2 lessi
>>>   { 1 }
>>>   { n 1 subi self self apply n muli }
>>>   if
>>> } /fact
>>>  
>>> 12 fact fact apply
>>>  
>>>
>>>     * I think there is a problem with overlaping rules. There is a
>>>       parser rule consuming "negi" and resulting in the NegiToken. A
>>>       single "n" is a valid identifier. For the example file my parser
>>>       says: unexpected " ", expecting "negi"
>>>     * I think the same problem is present for "parseInteger" and
>>>       "parseReal" (currently no in code but looks like "parseReal = do a
>>>       <- many1 digit \n char '.' \n b <- many1 digit \n return $
>>>       RealToken (read (a ++ "." ++ b)"
>>>     * Something with "parseFunction" is going really wrong.
>>>     * All parsig rules are designed with the condition that each
>>>       "construct" is separated by whitespaces. For this before parsing
>>>       the input will be preprocessed: insert spaces, removing
>>>       whitespaces and so on. Now in the parsed result appears
>>>       (IdentifierToken ""). I think my version is not the best way to
>>>       parse a identifer:
>>>
>>> parseIdentifier = *do* identifier <- many (noneOf " ")
>>>                      return $ *IdentifierToken* identifier
>>>  
>>> Please help me.
>>>  
>>> Klaus
>>>  
>>>
>>>
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe