[Haskell-beginners] Attoparsec and multiline comments
alexander.vladislav.popov at gmail.com
Thu Sep 15 11:48:39 CEST 2011
Very, very interesting. Thank you, Yitzchak. I'll try.
2011/9/15 Yitzchak Gale <gale at sefer.org>
> Alexander.Vladislav.Popov wrote:
> >> Help me to parse multiline comments (like /* ... */ or even like /*- ...
> >> -*/) using attoparsec.
> Daniel Fischer wrote:
> > The docs for manyTill at
> > Attoparsec-Combinator.html
> > have the example
> > simpleComment = string "<!--" *> manyTill anyChar (try (string "-->"))
> Unfortunately, many of the examples in the documentation for Attoparsec
> are just lifted directly from the Parsec documentation, so they
> need to be modified somewhat to work in Attoparsed.
> This is one of those cases.
> First of all, the string function in Attoparsec has type
> string :: ByteString -> Parser ByteString
> So you can't use a String as its argument directly; you need to
> wrap the String with Data.ByteString.Char8.pack.
> If you are parsing source code text, you anyway would be better
> off using Text instead of String and attoparsec-text instead of
> attoparsec. That, together with the OverloadedStrings extension,
> will solve that problem and more. Then you don't even need
> the "string" function. Just use string literals to parse strings,
> and the compiler will automatically insert calls to "string".
> The next issue is that you don't need "try" - unlike in Parsec,
> the string combinator in Attoparsec does *not* consume any
> input when it fails.
> Finally, if you are bothering to use attoparsec, it is presumably
> because you want the super speed it is able to achieve using
> fusion. But the documentation points out that you don't get that
> fusion for combinators like manyTill, only for "byte-oriented"
> combinators like takeTill. So you should factor the parser to
> parse the comment (quickly!) into chunks beginning with
> '*', then use manyTill only on the chunks.
> Here is an untested example with the above modifications:
> comment = "/*" .*> (emptyComment <|> T.concat <$> commentChunks)
> emptyComment = "*/" .*> pure T.empty
> commentChunks = manyTill (takeWhile1 (/= '*')) (string "*/")
> I did need to use the string keyword with manyTill, because
> manyTill is too polymorphic for the compiler to be able to
> deduce the type of the string literal. Since manyTill is so
> common, I often define a type-specialized version of it
> for convenience:
> manyTillS :: Parser a -> Parser Text -> Parser [a]
> manyTillS = manyTill
> That is analogous to ".*>", attoparsec-text's type-specialized
> version of "*>" from Control.Applicative.
> Then you can write:
> commentChunks = takeWhile1 (/= '*') `manyTillS` "*/"
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Beginners