[Haskell-cafe] Parsec: using two different parser for the same string

Dan Weston westondan at imageworks.com
Wed Aug 5 17:33:49 EDT 2009


I think parsecMap does the job here:

-----------------------
import Text.ParserCombinators.Parsec hiding ((<|>))
import Text.Parsec.Prim(parsecMap)
import Control.Applicative((<|>))
import Control.Arrow((|||),(&&&))

-- Tagged (:)
(<>) :: Either Char Char -> Either String String -> Either String String
Left  a <> Left  b = Left  (a:b)
Left  a <> Right b = Left  (a:b)
Right a <> Left  b = Left  (a:b)
Right a <> Right b = Right (a:b)

-- Tagged concat
stringParser :: [Either Char Char] -> Either String String
stringParser = foldr (<>) (Right "")

-- Parse Integer if properly tagged, keeping unparsed string
maybeToInteger :: Either String String -> (Maybe Integer, String)
maybeToInteger = (const Nothing ||| Just . read) &&& (id ||| id)

-- Tagged-choice parser
intOrStringParser = parsecMap (maybeToInteger . stringParser)
   $ many1 (parsecMap Right digit <|> parsecMap Left (noneOf ";)"))

-- Parse between parentheses
intOrStringListParser = between (char '(')
                                 (char ')')
                                 (sepBy1 intOrStringParser (char ';'))
-----------------------

Then you get a tagged version of each string, along with the string itself:

*P> parseTest intOrStringListParser $ "(1;2w4;8;85)"
[(Just 1,"1"),(Nothing,"2w4"),(Just 8,"8"),(Just 85,"85")]

There may be some parsecMap-fold fusion optimization possible, though I 
haven't looked into that.

Dan

Paul Sujkov wrote:
> Hi everybody,
> 
> suppose I have two different parsers: one just reads the string, and 
> another one parses some values from it. E.g.:
> 
> parseIntList :: Parser [Integer]
> parseIntList = do
>   char '('
>   res <- liftM (map read) (sepBy1 (many1 digit) (char ';'))
>   char ')'
>   return res
> 
> parseIntString :: Parser String
> parseIntString = manyTill anyChar eof
> 
> so for some input like this - "(1;2;3;4)" - I will have two different 
> result:
> 
> *Parlog> parseTest parseIntList "(1;2;3;4)"
> [1,2,3,4]
> *Parlog> parseTest parseIntString "(1;2;3;4)"
> "(1;2;3;4)"
> 
> but the thing that I actually want is something like Parser ([Integer], 
> String) - results from both parsers at a time, no matter whether one of 
> them fails or not:
> 
> *Parlog> parseTest parseIntListAndString "(1;2;3;4)"
> ([1,2,3,4], "(1;2;3;4)")
> 
> it is impossible at first sight, because first parser to use will 
> consume all the input, and there will be nothing to parse for the second one
> 
> Parsec contains "choice" function, but it is implemented via <|> and 
> that is mplus - so it tries second alternative only if the first one 
> fails. Is it possible to use two parsers for the same string (with 
> try-like backtracking, no input actually consumed till the second parser 
> finishes)? I can assume only dirty hacks with the GenParser internals - 
> manual position storing and backtracking - but that is obviously not good
> 
> however, my first attempt to solve the problem was kind a like that: to 
> parse string to String, and then to use it as an input for the next 
> level parse call:
> 
> parseIntListAndString :: Parser ([Integer], String)
> parseIntListAndString = do
>   str <- parseIntString
>   return (res str, str)
>       where res str = case (parse parseIntList "" str) of
>                         Left  err -> []
>                         Right val -> val
> 
> but the problems with such a method began when I switched from Parser to 
> GenParser with user state: function parseIntList have to update the 
> state, but it can't have the same state as the parseIntListAndString any 
> more: it has it's own. I can explicitly pass the state from 
> parseIntListAndString to parseIntList, but I see no suitable way for the 
> parseIntList to update it. I can return the updated state value from the 
> parseIntList function, and call setState on a result - but it seems 
> rather ugly to mee. However, if nothing else will do, that is an alternative
> 
> it is of course possible to use two different parsers sequentially, but 
> it is also very ineffective: I need to use such multiple parsing on a 
> relatively small substring of the actual input, so little backtracking 
> would be a much nicier approach. Any suggestions?
> 
> -- 
> Regards, Paul Sujkov
> 



More information about the Haskell-Cafe mailing list