[Haskell-cafe] Parsec: using two different parser for the same string

Dan Weston westondan at imageworks.com
Wed Aug 5 19:33:34 EDT 2009


Of course, since ParsecT s u m is a functor, feel free to use fmap 
instead of parsecMap. Then you don't need to import from Text.Parsec.Prim.
And in hindsight, I might prefer the name (<:>) or cons to (<>) for the 
first function, but now I'm just obsessing. :)

Dan

Dan Weston wrote:
> I think parsecMap does the job here:
> 
> -----------------------
> import Text.ParserCombinators.Parsec hiding ((<|>))
> import Text.Parsec.Prim(parsecMap)
> import Control.Applicative((<|>))
> import Control.Arrow((|||),(&&&))
> 
> -- Tagged (:)
> (<>) :: Either Char Char -> Either String String -> Either String String
> Left  a <> Left  b = Left  (a:b)
> Left  a <> Right b = Left  (a:b)
> Right a <> Left  b = Left  (a:b)
> Right a <> Right b = Right (a:b)
> 
> -- Tagged concat
> stringParser :: [Either Char Char] -> Either String String
> stringParser = foldr (<>) (Right "")
> 
> -- Parse Integer if properly tagged, keeping unparsed string
> maybeToInteger :: Either String String -> (Maybe Integer, String)
> maybeToInteger = (const Nothing ||| Just . read) &&& (id ||| id)
> 
> -- Tagged-choice parser
> intOrStringParser = parsecMap (maybeToInteger . stringParser)
>    $ many1 (parsecMap Right digit <|> parsecMap Left (noneOf ";)"))
> 
> -- Parse between parentheses
> intOrStringListParser = between (char '(')
>                                  (char ')')
>                                  (sepBy1 intOrStringParser (char ';'))
> -----------------------
> 
> Then you get a tagged version of each string, along with the string itself:
> 
> *P> parseTest intOrStringListParser $ "(1;2w4;8;85)"
> [(Just 1,"1"),(Nothing,"2w4"),(Just 8,"8"),(Just 85,"85")]
> 
> There may be some parsecMap-fold fusion optimization possible, though I 
> haven't looked into that.
> 
> Dan
> 
> Paul Sujkov wrote:
>> Hi everybody,
>>
>> suppose I have two different parsers: one just reads the string, and 
>> another one parses some values from it. E.g.:
>>
>> parseIntList :: Parser [Integer]
>> parseIntList = do
>>   char '('
>>   res <- liftM (map read) (sepBy1 (many1 digit) (char ';'))
>>   char ')'
>>   return res
>>
>> parseIntString :: Parser String
>> parseIntString = manyTill anyChar eof
>>
>> so for some input like this - "(1;2;3;4)" - I will have two different 
>> result:
>>
>> *Parlog> parseTest parseIntList "(1;2;3;4)"
>> [1,2,3,4]
>> *Parlog> parseTest parseIntString "(1;2;3;4)"
>> "(1;2;3;4)"
>>
>> but the thing that I actually want is something like Parser ([Integer], 
>> String) - results from both parsers at a time, no matter whether one of 
>> them fails or not:
>>
>> *Parlog> parseTest parseIntListAndString "(1;2;3;4)"
>> ([1,2,3,4], "(1;2;3;4)")
>>
>> it is impossible at first sight, because first parser to use will 
>> consume all the input, and there will be nothing to parse for the second one
>>
>> Parsec contains "choice" function, but it is implemented via <|> and 
>> that is mplus - so it tries second alternative only if the first one 
>> fails. Is it possible to use two parsers for the same string (with 
>> try-like backtracking, no input actually consumed till the second parser 
>> finishes)? I can assume only dirty hacks with the GenParser internals - 
>> manual position storing and backtracking - but that is obviously not good
>>
>> however, my first attempt to solve the problem was kind a like that: to 
>> parse string to String, and then to use it as an input for the next 
>> level parse call:
>>
>> parseIntListAndString :: Parser ([Integer], String)
>> parseIntListAndString = do
>>   str <- parseIntString
>>   return (res str, str)
>>       where res str = case (parse parseIntList "" str) of
>>                         Left  err -> []
>>                         Right val -> val
>>
>> but the problems with such a method began when I switched from Parser to 
>> GenParser with user state: function parseIntList have to update the 
>> state, but it can't have the same state as the parseIntListAndString any 
>> more: it has it's own. I can explicitly pass the state from 
>> parseIntListAndString to parseIntList, but I see no suitable way for the 
>> parseIntList to update it. I can return the updated state value from the 
>> parseIntList function, and call setState on a result - but it seems 
>> rather ugly to mee. However, if nothing else will do, that is an alternative
>>
>> it is of course possible to use two different parsers sequentially, but 
>> it is also very ineffective: I need to use such multiple parsing on a 
>> relatively small substring of the actual input, so little backtracking 
>> would be a much nicier approach. Any suggestions?
>>
>> -- 
>> Regards, Paul Sujkov
>>
> 



More information about the Haskell-Cafe mailing list