[Haskell-cafe] Transforming XML using HXT, I'm confused.

P Orrifolius porrifolius at gmail.com
Sun Mar 22 16:28:07 EDT 2009


Hello,

I'm trying to transform a fragment of XML using HXT, specifically I'm
trying to strip out any disallowed content while retaining as much as
possible.

For example I might be given "<bar><evil>blah</evil></bar>".  I want
to get rid of the 'evil' element but retain it's content resulting in
"<bar>blah</bar>", rather than just "<bar></bar>".


My basic theory was to do a bottom up traversal replacing any invalid
elements with their children.  Unfortunately I've got myself
thoroughly confused.  The code below shows what I've tried so far
(except the many random permutations of collapseInvalids I've tried!).

I think one problem is that validElements is choosing based on the
parent element but processing will just be given the child.  Also I
think some inputs will require multiple passes over the children
during the traversal, eg given "...<baz>s6<bar>s7</bar></baz>..." I'd
like "...<baz>s6s7</baz>..." but when processing the bar it is valid,
it's only once you're processing the baz you see that you need to pull
the s7 up.


I'd really appreciate any advice, and I'm not wedded to the current
code so feel free to throw it all out! :)

Thanks
P.


ps: Is it possible to render an XmlTree as String without invoking IO?
I've just been doing runLA (hread >>> collapseInvalids) frag1 and
looking at the nodes.



module Test where

import Text.XML.HXT.Arrow

--collapseInvalids = processBottomUp (processChildren ((getChildren
`whenNot` validElements) `orElse` this))

-- <foo> can contain <foo>, <bar> or <baz>, not text
validInFoo :: (ArrowXml a) => a XmlTree XmlTree
validInFoo = foldr1 orElse (
  map hasName ["foo", "bar", "baz"]
  )

-- <bar> can contain <foo>, <baz> or text, not <foo>
validInBar :: (ArrowXml a) => a XmlTree XmlTree
validInBar = foldr1 orElse (
  isText :
  map hasName ["foo", "baz"]
  )

-- <baz> can only contain text
validInBaz :: (ArrowXml a) => a XmlTree XmlTree
validInBaz = foldr1 orElse (
  isText :
  [])

validElements :: (ArrowXml a) => a XmlTree XmlTree
validElements = choiceA (
  hasName "foo" :-> validInFoo :
  hasName "bar" :-> validInBar :
  hasName "baz" :-> validInBaz :
  isText        :-> this :
  [])

frag1 = concat (
  "s1" :
  "<foo>" :
    "<bar>" :
      "s2" :
      "<evil>" :
        "s3" :
        "<baz>" :
          "s4" :
        "</baz>" :
        "s5" :
      "</evil>" :
      "s6" :
    "</bar>" :
  "</foo>" :
  "s7" :
  [])

result1 = concat (
  "s1" :
  "<foo>" :
    "<bar>" :
      "s2s3" :
      "<baz>" :
        "s4" :
      "</baz>" :
      "s5s6" :
    "</bar>" :
  "</foo>" :
  "s7" :
  [])

frag2 = concat (
  "s1" :
  "<foo>" :
    "s2" :
    "<evil>" :
      "s3" :
    "</evil>" :
    "<bar>" :
      "s4" :
      "<evil>" :
        "s5" :
        "<baz>" :
          "s6" :
          "<bar>" :
            "s7" :
          "</bar>" :
        "</baz>" :
        "s8" :
        "<bar>" :
          "s9" :
        "</bar>" :
      "</evil>" :
      "s10" :
    "</bar>" :
  "</foo>" :
  "s11" :
  [])

result2 = concat (
  "s1" :
  "<foo>" :
    "<bar>" :
      "s4" :
      "<evil>" :
        "s5" :
        "<baz>" :
          "s6" :
          "<bar>" :
            "s7" :
          "</bar>" :
        "</baz>" :
        "s8" :
        "<bar>" :
          "s9" :
        "</bar>" :
      "</evil>" :
      "s10" :
    "</bar>" :
  "</foo>" :
  "s11" :
  [])


More information about the Haskell-Cafe mailing list