[Haskell-cafe] Parsec: using two different parser for the same string

Paul Sujkov psujkov at gmail.com
Sun Aug 9 18:30:56 EDT 2009


Hi Doaitse,

that is very interesting, and I'll take a precise look at the uu-parsinglib.
Regarding my original question there exist (I believe) one serious problem:
existing code is written exclusively using Parsec and it's already quite
complex. At first glimpse I don't see an obvious way to use both libraries
in one parsing module simulatiously. However, these are a very good news
indeed, thank you

2009/8/9 S. Doaitse Swierstra <doaitse at swierstra.net>

> The uu-parsinglib:
>
>
> http://hackage.haskell.org/packages/archive/uu-parsinglib/2.2.0/doc/html/Text-ParserCombinators-UU-Core.html
>
> contains a combinator to achieve just this:
>
> -- parsing two alternatives and returning both rsults
> pAscii         =  pSym ('\000', '\254')
> pIntList       =  pParens ((pSym ';') `pListSep` (read <$> pList (pSym
> ('0', '9'))))
> parseIntString =  pList (pAscii)
>
> parseBoth = pPair pIntList parseIntString
>
> pPair p q =  amb (Left <$> p <|> Right <$> q)
>
>
> The amb combinator tells you that it's parser parameter is ambiguous, and
> returns you all the possible results. Amazingly it still maintains its
> online behaviour. The only problem is that if either one of the parsers
> fails then you will get only a single result.
>
> I have added the code above to the Examples.hs contained in the
> uu-parsinglib (so it will show up in due time when I release a new version)
> which I am attaching. Just load this file, and call the function main to see
> what are the results of the different parsers and correction strategies. The
> only problem is that if either one of the parsers  fails you will only get
> one of the results. If both fail you will get the result which fails latest
> and if both fail at the same place, the one which fails with the least
> repair costs.
>
> If you really want both results, even if the input is erroneaous, things
> become more complicated, especially if you want to embed this parser in a
> larger one, since then we have to check whether both parse the same prefix.
> If needed I could put some work into this, by making a slightly different
> version of the amb combinator.
>
>  Doaitse
>
>
>
>
> On 6 aug 2009, at 21:03, Dan Weston wrote:
>
>  Paul,
>>
>> Arrows (and category theory in general) are interesting, but you certainly
>> don't need to understand them for this.
>> The only arrow in this code is the lowly function arrow (->). (&&&) and
>> (|||) are duals of each other and mean, respectively, "both" and "either"
>> (though for some bizarre reason, "both" is usually called "fanout"!)
>>
>> This style of pointfree (or "pointless") code is clearer to me because I
>> don't have a bunch of variable names to invent and have lying around.
>>
>> Anyway, if you prefer, don't import Control.Arrow at all, and just use:
>>
>> -- |Both: Apply two functions to same argument and tuple the results
>> infixr 3 &&&
>> (&&&) :: (a -> b) -> (a -> c) -> a -> (b,c)
>> (f &&& g) x = (f x, g x)
>>
>> -- |Either: If argument is Left, apply Left function, else apply Right
>> function
>> infixr 2 |||
>> (|||) :: (a -> c) -> (b -> c) -> Either a b -> c
>> (|||) = either
>>
>> either is implicitly imported from the Prelude and is defined as:
>>
>> -- | Case analysis for the 'Either' type.
>> -- If the value is @'Left' a@, apply the first function to @a@;
>> -- if it is @'Right' b@, apply the second function to @b at .
>> either                  :: (a -> c) -> (b -> c) -> Either a b -> c
>> either f _ (Left x)     =  f x
>> either _ g (Right y)    =  g y
>>
>> Dan
>>
>> Paul Sujkov wrote:
>>
>>> Hi Dan,
>>> thank you for the solution. It looks pretty interesting and usable,
>>> however I'll have to spend some time understanding arrows: I never had an
>>> opportunity to use them before. Anyway, it looks very close to what I
>>> actually need, and in any case much less ugly than breaking the GenParser
>>> encapsulation
>>> 2009/8/6 Dan Weston <westondan at imageworks.com <mailto:
>>> westondan at imageworks.com>>
>>>   Of course, since ParsecT s u m is a functor, feel free to use fmap
>>>   instead of parsecMap. Then you don't need to import from
>>>   Text.Parsec.Prim.
>>>   And in hindsight, I might prefer the name (<:>) or cons to (<>) for
>>>   the first function, but now I'm just obsessing. :)
>>>   Dan
>>>   Dan Weston wrote:
>>>       I think parsecMap does the job here:
>>>       -----------------------
>>>       import Text.ParserCombinators.Parsec hiding ((<|>))
>>>       import Text.Parsec.Prim(parsecMap)
>>>       import Control.Applicative((<|>))
>>>       import Control.Arrow((|||),(&&&))
>>>       -- Tagged (:)
>>>       (<>) :: Either Char Char -> Either String String -> Either
>>>       String String
>>>       Left  a <> Left  b = Left  (a:b)
>>>       Left  a <> Right b = Left  (a:b)
>>>       Right a <> Left  b = Left  (a:b)
>>>       Right a <> Right b = Right (a:b)
>>>       -- Tagged concat
>>>       stringParser :: [Either Char Char] -> Either String String
>>>       stringParser = foldr (<>) (Right "")
>>>       -- Parse Integer if properly tagged, keeping unparsed string
>>>       maybeToInteger :: Either String String -> (Maybe Integer, String)
>>>       maybeToInteger = (const Nothing ||| Just . read) &&& (id ||| id)
>>>       -- Tagged-choice parser
>>>       intOrStringParser = parsecMap (maybeToInteger . stringParser)
>>>         $ many1 (parsecMap Right digit <|> parsecMap Left (noneOf ";)"))
>>>       -- Parse between parentheses
>>>       intOrStringListParser = between (char '(')
>>>                                       (char ')')
>>>                                       (sepBy1 intOrStringParser (char
>>>       ';'))
>>>       -----------------------
>>>       Then you get a tagged version of each string, along with the
>>>       string itself:
>>>       *P> parseTest intOrStringListParser $ "(1;2w4;8;85)"
>>>       [(Just 1,"1"),(Nothing,"2w4"),(Just 8,"8"),(Just 85,"85")]
>>>       There may be some parsecMap-fold fusion optimization possible,
>>>       though I haven't looked into that.
>>>       Dan
>>>       Paul Sujkov wrote:
>>>           Hi everybody,
>>>           suppose I have two different parsers: one just reads the
>>>           string, and another one parses some values from it. E.g.:
>>>           parseIntList :: Parser [Integer]
>>>           parseIntList = do
>>>            char '('
>>>            res <- liftM (map read) (sepBy1 (many1 digit) (char ';'))
>>>            char ')'
>>>            return res
>>>           parseIntString :: Parser String
>>>           parseIntString = manyTill anyChar eof
>>>           so for some input like this - "(1;2;3;4)" - I will have two
>>>           different result:
>>>           *Parlog> parseTest parseIntList "(1;2;3;4)"
>>>           [1,2,3,4]
>>>           *Parlog> parseTest parseIntString "(1;2;3;4)"
>>>           "(1;2;3;4)"
>>>           but the thing that I actually want is something like Parser
>>>           ([Integer], String) - results from both parsers at a time,
>>>           no matter whether one of them fails or not:
>>>           *Parlog> parseTest parseIntListAndString "(1;2;3;4)"
>>>           ([1,2,3,4], "(1;2;3;4)")
>>>           it is impossible at first sight, because first parser to use
>>>           will consume all the input, and there will be nothing to
>>>           parse for the second one
>>>           Parsec contains "choice" function, but it is implemented via
>>>           <|> and that is mplus - so it tries second alternative only
>>>           if the first one fails. Is it possible to use two parsers
>>>           for the same string (with try-like backtracking, no input
>>>           actually consumed till the second parser finishes)? I can
>>>           assume only dirty hacks with the GenParser internals -
>>>           manual position storing and backtracking - but that is
>>>           obviously not good
>>>           however, my first attempt to solve the problem was kind a
>>>           like that: to parse string to String, and then to use it as
>>>           an input for the next level parse call:
>>>           parseIntListAndString :: Parser ([Integer], String)
>>>           parseIntListAndString = do
>>>            str <- parseIntString
>>>            return (res str, str)
>>>                where res str = case (parse parseIntList "" str) of
>>>                                  Left  err -> []
>>>                                  Right val -> val
>>>           but the problems with such a method began when I switched
>>>           from Parser to GenParser with user state: function
>>>           parseIntList have to update the state, but it can't have the
>>>           same state as the parseIntListAndString any more: it has
>>>           it's own. I can explicitly pass the state from
>>>           parseIntListAndString to parseIntList, but I see no suitable
>>>           way for the parseIntList to update it. I can return the
>>>           updated state value from the parseIntList function, and call
>>>           setState on a result - but it seems rather ugly to mee.
>>>           However, if nothing else will do, that is an alternative
>>>           it is of course possible to use two different parsers
>>>           sequentially, but it is also very ineffective: I need to use
>>>           such multiple parsing on a relatively small substring of the
>>>           actual input, so little backtracking would be a much nicier
>>>           approach. Any suggestions?
>>>           --             Regards, Paul Sujkov
>>> --
>>> Regards, Paul Sujkov
>>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>


-- 
Regards, Paul Sujkov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090809/47e9a392/attachment-0001.html


More information about the Haskell-Cafe mailing list