[Haskell-cafe] Parsec: using two different parser for the same string

Paul Sujkov psujkov at gmail.com
Thu Aug 6 11:59:31 EDT 2009


Hi Dan,

thank you for the solution. It looks pretty interesting and usable, however
I'll have to spend some time understanding arrows: I never had an
opportunity to use them before. Anyway, it looks very close to what I
actually need, and in any case much less ugly than breaking the GenParser
encapsulation

2009/8/6 Dan Weston <westondan at imageworks.com>

> Of course, since ParsecT s u m is a functor, feel free to use fmap instead
> of parsecMap. Then you don't need to import from Text.Parsec.Prim.
> And in hindsight, I might prefer the name (<:>) or cons to (<>) for the
> first function, but now I'm just obsessing. :)
>
> Dan
>
>
> Dan Weston wrote:
>
>> I think parsecMap does the job here:
>>
>> -----------------------
>> import Text.ParserCombinators.Parsec hiding ((<|>))
>> import Text.Parsec.Prim(parsecMap)
>> import Control.Applicative((<|>))
>> import Control.Arrow((|||),(&&&))
>>
>> -- Tagged (:)
>> (<>) :: Either Char Char -> Either String String -> Either String String
>> Left  a <> Left  b = Left  (a:b)
>> Left  a <> Right b = Left  (a:b)
>> Right a <> Left  b = Left  (a:b)
>> Right a <> Right b = Right (a:b)
>>
>> -- Tagged concat
>> stringParser :: [Either Char Char] -> Either String String
>> stringParser = foldr (<>) (Right "")
>>
>> -- Parse Integer if properly tagged, keeping unparsed string
>> maybeToInteger :: Either String String -> (Maybe Integer, String)
>> maybeToInteger = (const Nothing ||| Just . read) &&& (id ||| id)
>>
>> -- Tagged-choice parser
>> intOrStringParser = parsecMap (maybeToInteger . stringParser)
>>   $ many1 (parsecMap Right digit <|> parsecMap Left (noneOf ";)"))
>>
>> -- Parse between parentheses
>> intOrStringListParser = between (char '(')
>>                                 (char ')')
>>                                 (sepBy1 intOrStringParser (char ';'))
>> -----------------------
>>
>> Then you get a tagged version of each string, along with the string
>> itself:
>>
>> *P> parseTest intOrStringListParser $ "(1;2w4;8;85)"
>> [(Just 1,"1"),(Nothing,"2w4"),(Just 8,"8"),(Just 85,"85")]
>>
>> There may be some parsecMap-fold fusion optimization possible, though I
>> haven't looked into that.
>>
>> Dan
>>
>> Paul Sujkov wrote:
>>
>>> Hi everybody,
>>>
>>> suppose I have two different parsers: one just reads the string, and
>>> another one parses some values from it. E.g.:
>>>
>>> parseIntList :: Parser [Integer]
>>> parseIntList = do
>>>  char '('
>>>  res <- liftM (map read) (sepBy1 (many1 digit) (char ';'))
>>>  char ')'
>>>  return res
>>>
>>> parseIntString :: Parser String
>>> parseIntString = manyTill anyChar eof
>>>
>>> so for some input like this - "(1;2;3;4)" - I will have two different
>>> result:
>>>
>>> *Parlog> parseTest parseIntList "(1;2;3;4)"
>>> [1,2,3,4]
>>> *Parlog> parseTest parseIntString "(1;2;3;4)"
>>> "(1;2;3;4)"
>>>
>>> but the thing that I actually want is something like Parser ([Integer],
>>> String) - results from both parsers at a time, no matter whether one of them
>>> fails or not:
>>>
>>> *Parlog> parseTest parseIntListAndString "(1;2;3;4)"
>>> ([1,2,3,4], "(1;2;3;4)")
>>>
>>> it is impossible at first sight, because first parser to use will consume
>>> all the input, and there will be nothing to parse for the second one
>>>
>>> Parsec contains "choice" function, but it is implemented via <|> and that
>>> is mplus - so it tries second alternative only if the first one fails. Is it
>>> possible to use two parsers for the same string (with try-like backtracking,
>>> no input actually consumed till the second parser finishes)? I can assume
>>> only dirty hacks with the GenParser internals - manual position storing and
>>> backtracking - but that is obviously not good
>>>
>>> however, my first attempt to solve the problem was kind a like that: to
>>> parse string to String, and then to use it as an input for the next level
>>> parse call:
>>>
>>> parseIntListAndString :: Parser ([Integer], String)
>>> parseIntListAndString = do
>>>  str <- parseIntString
>>>  return (res str, str)
>>>      where res str = case (parse parseIntList "" str) of
>>>                        Left  err -> []
>>>                        Right val -> val
>>>
>>> but the problems with such a method began when I switched from Parser to
>>> GenParser with user state: function parseIntList have to update the state,
>>> but it can't have the same state as the parseIntListAndString any more: it
>>> has it's own. I can explicitly pass the state from parseIntListAndString to
>>> parseIntList, but I see no suitable way for the parseIntList to update it. I
>>> can return the updated state value from the parseIntList function, and call
>>> setState on a result - but it seems rather ugly to mee. However, if nothing
>>> else will do, that is an alternative
>>>
>>> it is of course possible to use two different parsers sequentially, but
>>> it is also very ineffective: I need to use such multiple parsing on a
>>> relatively small substring of the actual input, so little backtracking would
>>> be a much nicier approach. Any suggestions?
>>>
>>> --
>>> Regards, Paul Sujkov
>>>
>>>
>>
>


-- 
Regards, Paul Sujkov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090806/7475105a/attachment.html


More information about the Haskell-Cafe mailing list