[Haskell-cafe] HXT: Replace an element with its text

Uwe Schmidt si at fh-wedel.de
Tue Jun 26 16:39:17 CEST 2012


Michael Orlitzky wrote:

> I would like to replace,
>
>   <body><a href="#">foo</a></body>
>
> with,
>
>   <body>foo</body>
>
> using HXT. So far, the closest I've come is to parse the HTML and apply
> the following stuff:
>
>   is_link :: (ArrowXml a) => a XmlTree XmlTree
>   is_link =
>     hasName "a"
>
>   replace_links_with_their_text :: (ArrowXml a) => a XmlTree XmlTree
>   replace_links_with_their_text =
>     processTopDown $ (getText >>> mkText) `when` is_link

processTopDown $ (deep getText >>> mkText) `when` is_link

should do it. The "deep getText" finds all text nodes, regardless of
how deeply the elements inside the <a>...</a> element are nested. If
you then write the result into a document, everything is fine.
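
For anyone who wants to try this without reading a file, here is a
minimal sketch. It assumes the input is a well-formed fragment that
xread can parse, and it runs the arrow purely with runLA. The name
"render" is only made up for this example; is_link and
replace_links_with_their_text are the ones from the question.

```haskell
import Text.XML.HXT.Core

-- Matches nodes named "a", as in the original question.
is_link :: ArrowXml a => a XmlTree XmlTree
is_link = hasName "a"

-- Replace every <a> element by the text nodes found anywhere inside it.
replace_links_with_their_text :: ArrowXml a => a XmlTree XmlTree
replace_links_with_their_text =
  processTopDown $ (deep getText >>> mkText) `when` is_link

-- Hypothetical helper: parse a fragment, transform it, serialise it again.
render :: String -> String
render = concat . runLA (xshow (xread >>> replace_links_with_their_text))

main :: IO ()
main = putStrLn (render "<body><a href=\"#\">foo</a></body>")
```

For the example from the question this should print <body>foo</body>.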

One small problem can occur when the content of the <a> element
has, e.g., the form

<body><a href="#">foo<b>bar</b></a></body>

The resulting DOM then still contains two adjacent text nodes, one for
"foo" and one for "bar". If you later search for a text node "foobar",
you won't find one. Adjacent text nodes can be merged with

... (xshow (deep isText) >>> mkText) ...
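
To make the difference visible, the sketch below lists the text
children of <body> for both variants. It is only an illustration under
a few assumptions: the fragment is parsed with xread, the names
"separate", "merged" and "textChildren" are made up here, and the text
nodes are selected with isText, since xshow takes an arrow that
produces XmlTree values rather than Strings.

```haskell
import Text.XML.HXT.Core

is_link :: ArrowXml a => a XmlTree XmlTree
is_link = hasName "a"

-- One text node per text fragment found inside the link.
separate :: ArrowXml a => a XmlTree XmlTree
separate = processTopDown $ (deep getText >>> mkText) `when` is_link

-- A single text node: xshow serialises all selected text nodes
-- into one String, which mkText then wraps again.
merged :: ArrowXml a => a XmlTree XmlTree
merged = processTopDown $ (xshow (deep isText) >>> mkText) `when` is_link

-- Hypothetical helper: the text children of the root element
-- after applying a transformation to the parsed fragment.
textChildren :: LA XmlTree XmlTree -> String -> [String]
textChildren f = runLA (xread >>> f >>> getChildren >>> getText)

main :: IO ()
main = do
  let doc = "<body><a href=\"#\">foo<b>bar</b></a></body>"
  print (textChildren separate doc)  -- two nodes: ["foo","bar"]
  print (textChildren merged doc)    -- one node:  ["foobar"]
```

Both variants serialise to the same document; they differ only in the
number of text nodes in the tree.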

Cheers,

  Uwe

-- 

Uwe Schmidt
FH Wedel
Web: http://www.fh-wedel.de/~si/



