[Haskell-beginners] Re: remove XML tags using Text.Regex.Posix

Christian Maeder Christian.Maeder at dfki.de
Wed Sep 30 08:48:58 EDT 2009


I think regexes are a pain and would suggest the xml-light package for
your purpose, which is the smallest XML library. (Or use take, drop,
isPrefixOf and isSuffixOf to chop off your tags manually.)

http://hackage.haskell.org/package/xml

Cheers Christian

Prelude Text.XML.Light> concatMap strContent . onlyElems $ parseXML "<tag>123</tag>"
"123"
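
The manual alternative mentioned above (take, drop, isPrefixOf and
isSuffixOf) might look roughly like the sketch below. It assumes a single,
known, non-nested tag pair wrapping the whole string, as in the original
question. The name stripTag is made up for illustration and is not part of
any library; only functions from base are used.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- | Strip one known tag pair from a string,
-- e.g. stripTag "tag" "<tag>Data</tag>" gives Just "Data".
-- Returns Nothing when the string is not wrapped in exactly that pair.
-- 'stripTag' is a hypothetical helper, not a library function.
stripTag :: String -> String -> Maybe String
stripTag name s
  | open `isPrefixOf` s
  , close `isSuffixOf` s
  , length s >= length open + length close
  = Just (take (length s - length open - length close) (drop (length open) s))
  | otherwise = Nothing
  where
    open  = "<" ++ name ++ ">"   -- opening tag, e.g. "<tag>"
    close = "</" ++ name ++ ">"  -- closing tag, e.g. "</tag>"
```

Because this only compares the fixed prefix and suffix, the length of the
data in between does not matter: stripTag "tag" "<tag>12</tag>" gives
Just "12", and a string without the tags gives Nothing.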



Robert Ziemba wrote:
> I have been working with the regular expression package
> (Text.Regex.Posix).  My hope was to find a simple way to remove a pair
> of XML tags from a short string.  
> 
> I have something like this "<tag>Data</tag>" and would like to extract
> 'Data'.  There is only one tag pair, no nesting, and I know exactly what
> the tag is.  
> 
> My first attempt was this:  
> 
>   "<tag>123</tag>" =~ "[^<tag>].+[^</tag>]"::String
> 
> result:  "123"
> 
> Upon further experimenting I realized that it only works with more than
> 2 digits in 'Data'.  It occurred to me that my thinking on how this
> regular expression works was not correct - but I don't understand why it
> works at all for 3 or more digits. 
> 
> Can anyone help me understand this result and perhaps suggest another
> strategy?  Thank you.
> 
> 

