[Haskell-beginners] remove XML tags using Text.Regex.Posix
aditya siram
aditya.siram at gmail.com
Wed Sep 30 03:06:05 EDT 2009
This is how I did it using the HXT library:
Prelude Text.XML.HXT.Parser.XmlParsec Text.XML.HXT.Arrow.XmlIOStateArrow
Text.XML.HXT.Arrow> runX (readString [] "<tag>123</tag>" >>> getXPathTrees
"tag" >>> getChildren >>> getText)
["123"]
Everything after "Prelude" up to the first ">" is what you have to import to
make this work.
- "readString" converts the input string into an internal representation of
  an XML tree,
- "getXPathTrees" selects every <tag> element matching the given XPath,
- "getChildren" narrows it down to the child nodes between <tag> and </tag>,
- "getText" extracts the text from those child nodes,
- "runX" fires up the whole process and returns the results as a list in the
  IO monad.
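
If you would rather have this as a function in a source file than as a GHCi
one-liner, a minimal sketch might look like the following. It assumes a later
HXT release (9.x) than the one used in the session above: there the arrows are
re-exported from Text.XML.HXT.Core, "getXPathTrees" lives in the separate
hxt-xpath package, and "readString" takes configuration values such as
"withValidate no" instead of an empty option list. The name "extractTagText"
is made up for illustration.

```haskell
module ExtractTag (extractTagText) where

-- Assumes the hxt and hxt-xpath packages (HXT 9.x module layout).
import Text.XML.HXT.Core
import Text.XML.HXT.XPath.Arrows (getXPathTrees)

-- | Hypothetical helper: return the text inside every <tag> element.
--   Validation is switched off because the input carries no DTD.
extractTagText :: String -> IO [String]
extractTagText xml =
  runX $
    readString [withValidate no] xml  -- parse the string into an XML tree
      >>> getXPathTrees "tag"         -- select the <tag> elements
      >>> getChildren                 -- step down to their child nodes
      >>> getText                     -- keep only the text nodes
```

Loading this in GHCi and running extractTagText "<tag>123</tag>" should give
the same list as the session above. Malformed input yields an empty list, with
HXT reporting the parse error on stderr, so check for [] if that matters.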
hth,
deech
On Tue, Sep 29, 2009 at 2:25 PM, Robert Ziemba <rziemba at gmail.com> wrote:
> I have been working with the regular expression package (Text.Regex.Posix).
> My hope was to find a simple way to remove a pair of XML tags from a short
> string.
>
> I have something like this "<tag>Data</tag>" and would like to extract
> 'Data'. There is only one tag pair, no nesting, and I know exactly what the
> tag is.
>
> My first attempt was this:
>
> "<tag>123</tag>" =~ "[^<tag>].+[^</tag>]"::String
>
> result: "123"
>
> Upon further experimenting I realized that it only works with more than 2
> digits in 'Data'. It occurred to me that my thinking on how this regular
> expression works was not correct - but I don't understand why it works at
> all for 3 or more digits.
>
> Can anyone help me understand this result and perhaps suggest another
> strategy? Thank you.