[Haskell-cafe] Overlaping Parsec rules

Christian Maeder Christian.Maeder at dfki.de
Tue Mar 8 13:50:25 CET 2011


Am 08.03.2011 13:35, schrieb Hauschild, Klaus (EXT):
> Hi Christian,
> 
> Thank you for your help. Now the current version of Parse.hs (http://code.google.com/p/hgmltracer/source/browse/trunk/hGmlTracer/src/Gml/Parse.hs) works well for the test file fact.gml.

                                   spaces
                                   char end

The idea is to call spaces _after_ every token. So the keyword parser
should be extended by ">> spaces" and also the number, char, string,
binder and identifier parsers. Rather than using 'noneOf " ]}\n\t"' I
would precisely define the identifier letters.

  isIdChar :: Char -> Bool
  isIdChar c = isLetter c || isDigit c || ....

and use "satisfy isIdChar" also in keyword's notFollowedBy.

> Now the last thing is parsing the different numbers (integer and float). If have a rule for parsing integers (parseInteger = liftM (IntToken . read) $ many1 digit) but if have currently no idea how to handle floats.

Look inside
http://hackage.haskell.org/packages/archive/parsec2/1.0.0/doc/html/src/Text-ParserCombinators-Parsec-Token.html
under "floating" and copy and adjust the code for you.

C.

> 
> 
> 
> -----Ursprüngliche Nachricht-----
> Von: Christian Maeder [mailto:Christian.Maeder at dfki.de] 
> Gesendet: Dienstag, 8. März 2011 13:26
> An: Hauschild, Klaus (EXT)
> Cc: haskell-cafe at haskell.org
> Betreff: Re: Overlaping Parsec rules
> 
> Am 08.03.2011 12:30, schrieb Hauschild, Klaus (EXT):
>> Hi Christian,
>>
>> my version of parseList works currently strange.
>> The input "{2\n2}" will be parsed fine, but something like "{ }" or so fails with 'expecting space or "}"'.
> 
> Also "}" (and "]", etc.) should be excluded as identifier letters.
> "}" is the second identifier in "{ }". (The first one is empty.)
> 
> C.
> 
>>
>> The redefinition of space is not necessary, it was copied from another tutorial code. How I write a version of parseGml that get gml token separated by any white space (space, tab, newline)?
>>
>> Thanks
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Christian Maeder [mailto:Christian.Maeder at dfki.de] 
>> Gesendet: Montag, 7. März 2011 14:18
>> An: Hauschild, Klaus (EXT)
>> Cc: haskell-cafe at haskell.org
>> Betreff: Re: Overlaping Parsec rules
>>
>> Am 07.03.2011 13:48, schrieb Hauschild, Klaus (EXT):
>>> Thanks Christian,
>>>
>>> I adapted the "keyword" parser and now "n" <-> "negi" does not occur.
>>>
>>> But there are still other problems to solve. If I activate the parseFunction the parser will answer to fact.gml unexpected end of input, expecting space or "}".
>>
>> Your redefinition of spaces (= skipMany1 space) is problematic and:
>>
>>  parseList = sepBy parseGml' spaces
>>
>> Allow for the original (optional) spaces after parseGml':
>>
>>   parseGml'' = liftM2 const parseGml' spaces
>>
>>   parseList = many parseGml''
>>
>> C.
>>
>> P.S. why do you call? many (noneOf "")
>>
>>    manyTill anyChar newline
>>
>>    or just: many (noneOf "\n")
>>
>>    (a trailing newline will be skipped by spaces)
>>
>>>
>>> Any ideas?
>>>
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: Christian Maeder [mailto:Christian.Maeder at dfki.de] 
>>> Gesendet: Montag, 7. März 2011 12:23
>>> An: Hauschild, Klaus (EXT)
>>> Cc: haskell-cafe at haskell.org
>>> Betreff: Re: Overlaping Parsec rules
>>>
>>> You should parse keywords using:
>>>
>>> keyword s = try (string s) >> notFollowedBy (letter <|> digit)
>>>
>>> C.
>>>
>>> Am 07.03.2011 11:34, schrieb Hauschild, Klaus (EXT):
>>>> Hi,
>>>>  
>>>> to solve this ICFP task _http://www.cs.cornell.edu/icfp/task.htm_ I'm
>>>> currnetly working on the parser. With the hint from Thu (reading Phillip
>>>> Wadlers monadic parser paper) and consulting
>>>> _http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing_
>>>> I produce a first working version of the parser.
>>>> After this great moment I completed the token definition and near all
>>>> parsing rules. For the complete code have a look at
>>>> _http://code.google.com/p/hgmltracer/source/browse/#svn%2Ftrunk%2FhGmlTracer_
>>>>  
>>>> data GmlToken =
>>>>        
>>>>         -- structures
>>>>        
>>>>         FunctionToken TokenSequence     |
>>>>        
>>>>         ArrayToken TokenSequence        |
>>>>        
>>>>         -- control operators
>>>>        
>>>>         ApplyToken                      |
>>>>        
>>>>         IfToken                         |
>>>>        
>>>>         -- number operators
>>>>        
>>>>         AddiToken                       |
>>>>        
>>>>         AddfToken                       |
>>>>        
>>>>         ACosToken                       |
>>>>        
>>>>         ASinToken                       |
>>>>        
>>>>         ClampfToken                     |
>>>>        
>>>>         CosToken                        |
>>>>        
>>>>         DiviToken                       |
>>>>        
>>>>         DivfToken                       |
>>>>        
>>>>         EqiToken                        |
>>>>        
>>>>         EqfToken                        |
>>>>        
>>>>         FloorToken                      |
>>>>        
>>>>         FracToken                       |
>>>>        
>>>>         LessiToken                      |
>>>>        
>>>>         LessfToken                      |
>>>>        
>>>>         ModiToken                       |
>>>>        
>>>>         MuliToken                       |
>>>>        
>>>>         MulfToken                       |
>>>>        
>>>>         NegiToken                       |
>>>>        
>>>>         NegfToken                       |
>>>>        
>>>>         ToRealToken                     |
>>>>        
>>>>         SinToken                        |
>>>>        
>>>>         SqrtToken                       |
>>>>        
>>>>         SubiToken                       |
>>>>        
>>>>         SubfToken                       |
>>>>        
>>>>         -- points
>>>>        
>>>>         GetXToken                       |
>>>>        
>>>>         GetYToken                       |
>>>>        
>>>>         GetZToken                       |
>>>>        
>>>>         PointToken                      |
>>>>        
>>>>         -- arrays
>>>>        
>>>>         GetToken                        |
>>>>        
>>>>         LengthToken                     |
>>>>        
>>>>         -- environment
>>>>        
>>>>         IdentifierToken String          |
>>>>        
>>>>         BinderToken String              |
>>>>        
>>>>         -- types
>>>>        
>>>>         BoolToken Bool                  |
>>>>        
>>>>         IntToken Int                    |
>>>>        
>>>>         RealToken Double                |
>>>>        
>>>>         StringToken String              deriving Show  
>>>>  
>>>> And
>>>>  
>>>> parseGml :: String -> [GmlToken]
>>>>        
>>>> parseGml input = case parse parseList "gml" input of
>>>>        
>>>>     Left err -> error ("Parse error: " ++ (show err))
>>>>        
>>>>     Right gml -> gml
>>>>        
>>>>
>>>>        
>>>> parseList = sepBy parseGml' spaces
>>>>        
>>>>
>>>>        
>>>> parseGml' =
>>>>        
>>>>         -- structures
>>>>        
>>>> --        parseFunction
>>>>        
>>>> --    <|> parseArray
>>>>        
>>>>         -- control operators
>>>>        
>>>>         parseControlOperators
>>>>        
>>>>         -- number operators
>>>>        
>>>>     <|> parseNumberOperators
>>>>        
>>>>         -- points
>>>>        
>>>>     <|> parsePointOperators
>>>>        
>>>>         -- arrays
>>>>        
>>>>     <|> parseArrayOperators
>>>>        
>>>>         -- types
>>>>        
>>>>     <|> parseBool
>>>>        
>>>>     <|> parseInteger
>>>>        
>>>>     <|> parseString
>>>>        
>>>>         -- environment
>>>>        
>>>>     <|> parseIdentifier
>>>>        
>>>>     <|> parseBinder
>>>>        
>>>>
>>>>        
>>>> parseArray = parseSequence '[' ']'
>>>>        
>>>> parseFunction = parseSequence '{' '}'
>>>>        
>>>> parseSequence start end = do char start
>>>>        
>>>>                              array <- parseList
>>>>        
>>>>                              char end
>>>>        
>>>>                              return $ ArrayToken array
>>>>        
>>>>
>>>>        
>>>> parseControlOperators = parseApply <|> parseIf
>>>>        
>>>> parseApply = do string "apply"
>>>>        
>>>>                 return $ ApplyToken
>>>>        
>>>> parseIf = do string "if"
>>>>        
>>>>              return $ IfToken
>>>>        
>>>>
>>>>        
>>>> parseNumberOperators = do string "addi"
>>>>        
>>>>                           return $ AddiToken
>>>>        
>>>>                    <|> do string "addf"
>>>>        
>>>>                           return $ AddfToken
>>>>        
>>>>                    <|> do string "acos"
>>>>        
>>>>                           return $ ACosToken
>>>>        
>>>>                    <|> do string "asind"
>>>>        
>>>>                           return $ ASinToken
>>>>        
>>>>                    <|> do string "clampf"
>>>>        
>>>>                           return $ ClampfToken
>>>>        
>>>>                    <|> do string "cos"
>>>>        
>>>>                           return $ CosToken
>>>>        
>>>>                    <|> do string "divi"
>>>>        
>>>>                           return $ DiviToken
>>>>        
>>>>                    <|> do string "divf"
>>>>        
>>>>                           return $ DivfToken
>>>>        
>>>>                    <|> do string "eqi"
>>>>        
>>>>                           return $ EqiToken
>>>>        
>>>>                    <|> do string "eqf"
>>>>        
>>>>                           return $ EqfToken
>>>>        
>>>>                    <|> do string "floor"
>>>>        
>>>>                           return $ FloorToken
>>>>        
>>>>                    <|> do string "frac"
>>>>        
>>>>                           return $ FracToken
>>>>        
>>>>                    <|> do string "lessi"
>>>>        
>>>>                           return $ LessiToken
>>>>        
>>>>                    <|> do string "lessf"
>>>>        
>>>>                           return $ LessfToken
>>>>        
>>>>                    <|> do string "modi"
>>>>        
>>>>                           return $ ModiToken
>>>>        
>>>>                    <|> do string "muli"
>>>>        
>>>>                           return $ MuliToken
>>>>        
>>>>                    <|> do string "mulf"
>>>>        
>>>>                           return $ MulfToken
>>>>        
>>>>                    <|> do string "negi"
>>>>        
>>>>                           return $ NegiToken
>>>>        
>>>>                    <|> do string "negf"
>>>>        
>>>>                           return $ NegfToken
>>>>        
>>>>                    <|> do string "real"
>>>>        
>>>>                           return $ ToRealToken
>>>>        
>>>>                    <|> do string "sin"
>>>>        
>>>>                           return $ SinToken
>>>>        
>>>>                    <|> do string "Sqrt"
>>>>        
>>>>                           return $ SqrtToken
>>>>        
>>>>                    <|> do string "subi"
>>>>        
>>>>                           return $ SubiToken
>>>>        
>>>>                    <|> do string "subf"
>>>>        
>>>>                           return $ SubfToken
>>>>        
>>>>
>>>>        
>>>> parsePointOperators = do string "getx"
>>>>        
>>>>                          return $ GetXToken
>>>>        
>>>>                   <|> do string "gety"
>>>>        
>>>>                          return $ GetYToken
>>>>        
>>>>                   <|> do string "getz"
>>>>        
>>>>                          return $ GetZToken
>>>>        
>>>>                   <|> do string "point"
>>>>        
>>>>                          return $ PointToken
>>>>        
>>>>
>>>>        
>>>> parseArrayOperators = do string "get"
>>>>        
>>>>                          return $ GetToken
>>>>        
>>>>                   <|> do string "length"
>>>>        
>>>>                          return $ LengthToken
>>>>        
>>>>
>>>>        
>>>> parseInteger = liftM (IntToken . read) $ many1 digit
>>>>        
>>>> parseBool = do string "true"
>>>>        
>>>>                return $ BoolToken True
>>>>        
>>>>         <|> do string "false"
>>>>        
>>>>                return $ BoolToken False
>>>>        
>>>> parseString = do char '"'
>>>>        
>>>>                  string <- many (noneOf "\"")
>>>>        
>>>>                  char '"'
>>>>        
>>>>                  return $ StringToken string
>>>>        
>>>>
>>>>        
>>>> parseIdentifier = do identifier <- many (noneOf " ")
>>>>        
>>>>                      return $ IdentifierToken identifier
>>>>        
>>>> parseBinder = do char '/'
>>>>        
>>>>                  binder <- many (noneOf " ")
>>>>        
>>>>                  return $ BinderToken binder
>>>>        
>>>>
>>>>        
>>>> parseComment = do char '%'
>>>>        
>>>>                   many (noneOf "")
>>>>        
>>>>                   newline
>>>>        
>>>>                   return $ ()
>>>>        
>>>>
>>>>        
>>>> spaces = skipMany1 space       
>>>>  
>>>>  
>>>> After gluing all this together in my mind all worked well. But it
>>>> doesn't. The test file for parsing looks like:
>>>>  
>>>> { /self /n
>>>>   n 2 lessi
>>>>   { 1 }
>>>>   { n 1 subi self self apply n muli }
>>>>   if
>>>> } /fact
>>>>  
>>>> 12 fact fact apply
>>>>  
>>>>
>>>>     * I think there is a problem with overlaping rules. There is a
>>>>       parser rule consuming "negi" and resulting in the NegiToken. A
>>>>       single "n" is a valid identifier. For the example file my parser
>>>>       says: unexpected " ", expecting "negi"
>>>>     * I think the same problem is present for "parseInteger" and
>>>>       "parseReal" (currently no in code but looks like "parseReal = do a
>>>>       <- many1 digit \n char '.' \n b <- many1 digit \n return $
>>>>       RealToken (read (a ++ "." ++ b)"
>>>>     * Something with "parseFunction" is going really wrong.
>>>>     * All parsig rules are designed with the condition that each
>>>>       "construct" is separated by whitespaces. For this before parsing
>>>>       the input will be preprocessed: insert spaces, removing
>>>>       whitespaces and so on. Now in the parsed result appears
>>>>       (IdentifierToken ""). I think my version is not the best way to
>>>>       parse a identifer:
>>>>
>>>> parseIdentifier = *do* identifier <- many (noneOf " ")
>>>>                      return $ *IdentifierToken* identifier
>>>>  
>>>> Please help me.
>>>>  
>>>> Klaus
>>>>  
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Haskell-Cafe mailing list
>>>> Haskell-Cafe at haskell.org
>>>> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list