[Haskell-beginners] remove XML tags using Text.Regex.Posix
Tom Tobin
korpios at korpios.com
Wed Sep 30 13:30:31 EDT 2009
On Wed, Sep 30, 2009 at 11:11 AM, Jan Jakubuv <jakubuv at gmail.com> wrote:
> This is so simple that I would not recommend anything other than regular
> expressions. Use the following pattern:
>
> pat = "<tag>(.*)</tag>"
Don't use this; the * operator is greedy by default, meaning that on
input like "<tag>foo</tag>bar<tag>baz</tag>" the pattern will match the
whole string, and your captured data will end up being
"foo</tag>bar<tag>baz". In other words, a greedy operator tries to
consume as much of the string as it possibly can while still matching.
If that regex module supports non-greedy operators, you want something
like this:
pat = "<tag>(.*?)</tag>"
A "?" after a greedy operator makes it non-greedy, meaning it will try
to match while consuming as little of the string as it can. If the
posix regex module doesn't support this, the PCRE-based one should.
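Here is a rough sketch of the problem and of one POSIX-friendly
workaround, in case it helps. POSIX regex syntax itself has no
non-greedy operators, so instead of ".*?" this swaps in a negated
character class, "[^<]*", which stops at the next "<". That is a
different technique from the non-greedy pattern above, and it assumes
the text inside the tag never contains a "<" (no nested tags). The names
greedyCapture, innerTexts and sample are made up for illustration.

```haskell
import Text.Regex.Posix ((=~))

-- Sample input from the discussion above.
sample :: String
sample = "<tag>foo</tag>bar<tag>baz</tag>"

-- Greedy pattern: (.*) runs from the first <tag> to the LAST </tag>.
-- The 4-tuple result is (before, matched, after, captured groups).
greedyCapture :: String -> [String]
greedyCapture s = groups
  where
    (_, _, _, groups) = s =~ "<tag>(.*)</tag>" :: (String, String, String, [String])

-- Workaround (assumption: no '<' inside the tag body): [^<]* cannot run
-- past the next '<', so each <tag>...</tag> pair is matched on its own.
-- The [[String]] result holds one list per match: whole match, then groups.
innerTexts :: String -> [String]
innerTexts s = [ inner | [_, inner] <- (s =~ "<tag>([^<]*)</tag>" :: [[String]]) ]

main :: IO ()
main = do
  print (greedyCapture sample)  -- ["foo</tag>bar<tag>baz"]
  print (innerTexts sample)     -- ["foo","baz"]
```

If the tag bodies can contain "<", the negated class won't do, and the
PCRE-based module with the non-greedy pattern is the better bet.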