[Haskell-beginners] remove XML tags using Text.Regex.Posix

Wed Sep 30 01:58:35 EDT 2009

On Tue, Sep 29, 2009 at 12:25:07PM -0700, Robert Ziemba wrote:
> I have been working with the regular expression package (Text.Regex.Posix).
>  My hope was to find a simple way to remove a pair of XML tags from a short
> string.
> 
> I have something like this "<tag>Data</tag>" and would like to extract
> 'Data'.  There is only one tag pair, no nesting, and I know exactly what the
> tag is.
> 
> My first attempt was this:
> 
>   "<tag>123</tag>" =~ "[^<tag>].+[^</tag>]"::String
> 
> result:  "123"
> 
> Upon further experimenting I realized that it only works with more than 2
> digits in 'Data'.  It occurred to me that my thinking on how this regular
> expression works was not correct - but I don't understand why it works at
> all for 3 or more digits.
> 
> Can anyone help me understand this result and perhaps suggest another
> strategy?  Thank you.

Personally I would have used tagsoup for this sort of thing.  Keep in mind the
eternal words

  Some people, when confronted with a problem, think 'I know, I'll use
  regular expressions.' Now they have two problems.
       -- Jamie Zawinski

As you so nicely demonstrated yourself ;-)
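
As for why the original pattern behaves the way it does: in POSIX regex
syntax square brackets form a bracket expression, so [^<tag>] matches any
single character other than '<', 't', 'a', 'g' or '>', not the literal text
<tag>.  The pattern therefore asks for one such character, then at least one
arbitrary character (.+), then one character that is none of '<', '/', 't',
'a', 'g', '>'.  That takes at least three characters, which is why "123"
matches and "12" does not, and data containing the letters t, a or g would
also give wrong results.

For the fixed, un-nested case described in the question, one possible
alternative is to skip regular expressions entirely.  The following is only a
minimal sketch using Data.List from base; it is neither the tagsoup approach
nor a general XML parser, and the names stripAround and extractTag are made up
for illustration.

```haskell
import Data.List (stripPrefix)

-- | Strip a known prefix and a known suffix from a string.
-- Returns Nothing when either of them is missing.
stripAround :: String -> String -> String -> Maybe String
stripAround open close s = do
  rest <- stripPrefix open s                          -- drop the opening tag
  body <- stripPrefix (reverse close) (reverse rest)  -- drop the closing tag
  return (reverse body)

-- | Hypothetical helper for the literal tag name "tag" used in the question.
extractTag :: String -> Maybe String
extractTag = stripAround "<tag>" "</tag>"
```

In GHCi, extractTag "<tag>12</tag>" should then give Just "12", and a string
without the surrounding tags gives Nothing.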

/M

-- 
Magnus Therning                        (OpenPGP: 0xAB4DFBA4)
magnus＠therning．org          Jabber: magnus＠therning．org
http://therning.org/magnus         identi.ca|twitter: magthe