[Haskell-cafe] Parsec: using two different parser for the same
string
Paul Sujkov
psujkov at gmail.com
Sun Aug 9 18:30:56 EDT 2009
Hi Doaitse,
that is very interesting, and I'll take a precise look at the uu-parsinglib.
Regarding my original question there exist (I believe) one serious problem:
existing code is written exclusively using Parsec and it's already quite
complex. At first glimpse I don't see an obvious way to use both libraries
in one parsing module simulatiously. However, these are a very good news
indeed, thank you
2009/8/9 S. Doaitse Swierstra <doaitse at swierstra.net>
> The uu-parsinglib:
>
>
> http://hackage.haskell.org/packages/archive/uu-parsinglib/2.2.0/doc/html/Text-ParserCombinators-UU-Core.html
>
> contains a combinator to achieve just this:
>
> -- parsing two alternatives and returning both rsults
> pAscii = pSym ('\000', '\254')
> pIntList = pParens ((pSym ';') `pListSep` (read <$> pList (pSym
> ('0', '9'))))
> parseIntString = pList (pAscii)
>
> parseBoth = pPair pIntList parseIntString
>
> pPair p q = amb (Left <$> p <|> Right <$> q)
>
>
> The amb combinator tells you that it's parser parameter is ambiguous, and
> returns you all the possible results. Amazingly it still maintains its
> online behaviour. The only problem is that if either one of the parsers
> fails then you will get only a single result.
>
> I have added the code above to the Examples.hs contained in the
> uu-parsinglib (so it will show up in due time when I release a new version)
> which I am attaching. Just load this file, and call the function main to see
> what are the results of the different parsers and correction strategies. The
> only problem is that if either one of the parsers fails you will only get
> one of the results. If both fail you will get the result which fails latest
> and if both fail at the same place, the one which fails with the least
> repair costs.
>
> If you really want both results, even if the input is erroneaous, things
> become more complicated, especially if you want to embed this parser in a
> larger one, since then we have to check whether both parse the same prefix.
> If needed I could put some work into this, by making a slightly different
> version of the amb combinator.
>
> Doaitse
>
>
>
>
> On 6 aug 2009, at 21:03, Dan Weston wrote:
>
> Paul,
>>
>> Arrows (and category theory in general) are interesting, but you certainly
>> don't need to understand them for this.
>> The only arrow in this code is the lowly function arrow (->). (&&&) and
>> (|||) are duals of each other and mean, respectively, "both" and "either"
>> (though for some bizarre reason, "both" is usually called "fanout"!)
>>
>> This style of pointfree (or "pointless") code is clearer to me because I
>> don't have a bunch of variable names to invent and have lying around.
>>
>> Anyway, if you prefer, don't import Control.Arrow at all, and just use:
>>
>> -- |Both: Apply two functions to same argument and tuple the results
>> infixr 3 &&&
>> (&&&) :: (a -> b) -> (a -> c) -> a -> (b,c)
>> (f &&& g) x = (f x, g x)
>>
>> -- |Either: If argument is Left, apply Left function, else apply Right
>> function
>> infixr 2 |||
>> (|||) :: (a -> c) -> (b -> c) -> Either a b -> c
>> (|||) = either
>>
>> either is implicitly imported from the Prelude and is defined as:
>>
>> -- | Case analysis for the 'Either' type.
>> -- If the value is @'Left' a@, apply the first function to @a@;
>> -- if it is @'Right' b@, apply the second function to @b at .
>> either :: (a -> c) -> (b -> c) -> Either a b -> c
>> either f _ (Left x) = f x
>> either _ g (Right y) = g y
>>
>> Dan
>>
>> Paul Sujkov wrote:
>>
>>> Hi Dan,
>>> thank you for the solution. It looks pretty interesting and usable,
>>> however I'll have to spend some time understanding arrows: I never had an
>>> opportunity to use them before. Anyway, it looks very close to what I
>>> actually need, and in any case much less ugly than breaking the GenParser
>>> encapsulation
>>> 2009/8/6 Dan Weston <westondan at imageworks.com <mailto:
>>> westondan at imageworks.com>>
>>> Of course, since ParsecT s u m is a functor, feel free to use fmap
>>> instead of parsecMap. Then you don't need to import from
>>> Text.Parsec.Prim.
>>> And in hindsight, I might prefer the name (<:>) or cons to (<>) for
>>> the first function, but now I'm just obsessing. :)
>>> Dan
>>> Dan Weston wrote:
>>> I think parsecMap does the job here:
>>> -----------------------
>>> import Text.ParserCombinators.Parsec hiding ((<|>))
>>> import Text.Parsec.Prim(parsecMap)
>>> import Control.Applicative((<|>))
>>> import Control.Arrow((|||),(&&&))
>>> -- Tagged (:)
>>> (<>) :: Either Char Char -> Either String String -> Either
>>> String String
>>> Left a <> Left b = Left (a:b)
>>> Left a <> Right b = Left (a:b)
>>> Right a <> Left b = Left (a:b)
>>> Right a <> Right b = Right (a:b)
>>> -- Tagged concat
>>> stringParser :: [Either Char Char] -> Either String String
>>> stringParser = foldr (<>) (Right "")
>>> -- Parse Integer if properly tagged, keeping unparsed string
>>> maybeToInteger :: Either String String -> (Maybe Integer, String)
>>> maybeToInteger = (const Nothing ||| Just . read) &&& (id ||| id)
>>> -- Tagged-choice parser
>>> intOrStringParser = parsecMap (maybeToInteger . stringParser)
>>> $ many1 (parsecMap Right digit <|> parsecMap Left (noneOf ";)"))
>>> -- Parse between parentheses
>>> intOrStringListParser = between (char '(')
>>> (char ')')
>>> (sepBy1 intOrStringParser (char
>>> ';'))
>>> -----------------------
>>> Then you get a tagged version of each string, along with the
>>> string itself:
>>> *P> parseTest intOrStringListParser $ "(1;2w4;8;85)"
>>> [(Just 1,"1"),(Nothing,"2w4"),(Just 8,"8"),(Just 85,"85")]
>>> There may be some parsecMap-fold fusion optimization possible,
>>> though I haven't looked into that.
>>> Dan
>>> Paul Sujkov wrote:
>>> Hi everybody,
>>> suppose I have two different parsers: one just reads the
>>> string, and another one parses some values from it. E.g.:
>>> parseIntList :: Parser [Integer]
>>> parseIntList = do
>>> char '('
>>> res <- liftM (map read) (sepBy1 (many1 digit) (char ';'))
>>> char ')'
>>> return res
>>> parseIntString :: Parser String
>>> parseIntString = manyTill anyChar eof
>>> so for some input like this - "(1;2;3;4)" - I will have two
>>> different result:
>>> *Parlog> parseTest parseIntList "(1;2;3;4)"
>>> [1,2,3,4]
>>> *Parlog> parseTest parseIntString "(1;2;3;4)"
>>> "(1;2;3;4)"
>>> but the thing that I actually want is something like Parser
>>> ([Integer], String) - results from both parsers at a time,
>>> no matter whether one of them fails or not:
>>> *Parlog> parseTest parseIntListAndString "(1;2;3;4)"
>>> ([1,2,3,4], "(1;2;3;4)")
>>> it is impossible at first sight, because first parser to use
>>> will consume all the input, and there will be nothing to
>>> parse for the second one
>>> Parsec contains "choice" function, but it is implemented via
>>> <|> and that is mplus - so it tries second alternative only
>>> if the first one fails. Is it possible to use two parsers
>>> for the same string (with try-like backtracking, no input
>>> actually consumed till the second parser finishes)? I can
>>> assume only dirty hacks with the GenParser internals -
>>> manual position storing and backtracking - but that is
>>> obviously not good
>>> however, my first attempt to solve the problem was kind a
>>> like that: to parse string to String, and then to use it as
>>> an input for the next level parse call:
>>> parseIntListAndString :: Parser ([Integer], String)
>>> parseIntListAndString = do
>>> str <- parseIntString
>>> return (res str, str)
>>> where res str = case (parse parseIntList "" str) of
>>> Left err -> []
>>> Right val -> val
>>> but the problems with such a method began when I switched
>>> from Parser to GenParser with user state: function
>>> parseIntList have to update the state, but it can't have the
>>> same state as the parseIntListAndString any more: it has
>>> it's own. I can explicitly pass the state from
>>> parseIntListAndString to parseIntList, but I see no suitable
>>> way for the parseIntList to update it. I can return the
>>> updated state value from the parseIntList function, and call
>>> setState on a result - but it seems rather ugly to mee.
>>> However, if nothing else will do, that is an alternative
>>> it is of course possible to use two different parsers
>>> sequentially, but it is also very ineffective: I need to use
>>> such multiple parsing on a relatively small substring of the
>>> actual input, so little backtracking would be a much nicier
>>> approach. Any suggestions?
>>> -- Regards, Paul Sujkov
>>> --
>>> Regards, Paul Sujkov
>>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>
--
Regards, Paul Sujkov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090809/47e9a392/attachment-0001.html
More information about the Haskell-Cafe
mailing list