[Haskell-beginners] Attoparsec and multiline comments
Alexander.Vladislav.Popov
alexander.vladislav.popov at gmail.com
Thu Sep 15 11:48:39 CEST 2011
Very, very interesting. Thank you, Yitzchak. I'll try.
2011/9/15 Yitzchak Gale <gale at sefer.org>
> Alexander.Vladislav.Popov wrote:
> >> Help me to parse multiline comments (like /* ... */ or even like /*- ...
> >> -*/) using attoparsec.
>
> Daniel Fischer wrote:
> > The docs for manyTill at
> >
> http://hackage.haskell.org/packages/archive/attoparsec/0.9.1.2/doc/html/Data-
> > Attoparsec-Combinator.html
> > have the example
> >
> > simpleComment = string "<!--" *> manyTill anyChar (try (string "-->"))
>
> Unfortunately, many of the examples in the documentation for Attoparsec
> are just lifted directly from the Parsec documentation, so they
> need to be modified somewhat to work in Attoparsed.
> This is one of those cases.
>
> First of all, the string function in Attoparsec has type
>
> string :: ByteString -> Parser ByteString
>
> So you can't use a String as its argument directly; you need to
> wrap the String with Data.ByteString.Char8.pack.
>
> If you are parsing source code text, you anyway would be better
> off using Text instead of String and attoparsec-text instead of
> attoparsec. That, together with the OverloadedStrings extension,
> will solve that problem and more. Then you don't even need
> the "string" function. Just use string literals to parse strings,
> and the compiler will automatically insert calls to "string".
>
> The next issue is that you don't need "try" - unlike in Parsec,
> the string combinator in Attoparsec does *not* consume any
> input when it fails.
>
> Finally, if you are bothering to use attoparsec, it is presumably
> because you want the super speed it is able to achieve using
> fusion. But the documentation points out that you don't get that
> fusion for combinators like manyTill, only for "byte-oriented"
> combinators like takeTill. So you should factor the parser to
> parse the comment (quickly!) into chunks beginning with
> '*', then use manyTill only on the chunks.
>
> Here is an untested example with the above modifications:
>
> comment = "/*" .*> (emptyComment <|> T.concat <$> commentChunks)
> emptyComment = "*/" .*> pure T.empty
> commentChunks = manyTill (takeWhile1 (/= '*')) (string "*/")
>
> I did need to use the string keyword with manyTill, because
> manyTill is too polymorphic for the compiler to be able to
> deduce the type of the string literal. Since manyTill is so
> common, I often define a type-specialized version of it
> for convenience:
>
> manyTillS :: Parser a -> Parser Text -> Parser [a]
> manyTillS = manyTill
>
> That is analogous to ".*>", attoparsec-text's type-specialized
> version of "*>" from Control.Applicative.
>
> Then you can write:
>
> commentChunks = takeWhile1 (/= '*') `manyTillS` "*/"
>
> Regards,
> Yitz
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20110915/028e2507/attachment.htm>
More information about the Beginners
mailing list