[Haskell-cafe] Incremental XML parsing with namespaces?

Malcolm Wallace malcolm.wallace at cs.york.ac.uk
Mon Jun 8 16:44:35 EDT 2009


On 8 Jun 2009, at 19:39, John Millikin wrote:
> + HaXml and hexpat seem to disregard namespaces entirely -- that is,
> the root element is parsed to "doc" instead of
> ("org:myproject:mainns", "doc"), and the second child is "x:ref"
> instead of ("org:myproject:otherns", "ref").

Yes, HaXml makes no special effort to deal with namespaces.  However,
that does not mean that dealing with namespaces is "impossible": it
just requires a small amount of post-processing.

For instance, it would not be difficult to start from the SAX-like  
parser
     http://hackage.haskell.org/packages/archive/HaXml/1.19.7/doc/html/Text-XML-HaXml-SAX.html

taking e.g. a constructor value
     SaxElementOpen Name [Attribute]

and converting it to your corresponding constructor value
     EventElementBegin Namespace LocalName [Attribute]

Just filter the [Attribute] list of the SaxElementOpen value for the
attribute name "xmlns", and pull that attribute's value out to become
your new Namespace value.
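A minimal sketch of that naive conversion follows. It does not use HaXml itself: the Name, Attribute, SaxElement, Namespace and XmlEvent types are hypothetical stand-ins shaped like the constructors above (the real HaXml types may differ, e.g. attribute values are not plain Strings there), and the element's namespace is taken only from its own "xmlns" attribute, ignoring prefixes and inherited declarations.

```haskell
-- Hypothetical stand-ins for the SAX-like event and the desired event;
-- these are NOT HaXml's actual type definitions.
type Name      = String
type Attribute = (Name, String)

data SaxElement = SaxElementOpen Name [Attribute]
  deriving Show

-- Assumption: Nothing means "no default namespace declared here".
type Namespace = Maybe String
type LocalName = String

data XmlEvent = EventElementBegin Namespace LocalName [Attribute]
  deriving (Show, Eq)

-- Naive conversion: the namespace comes only from this element's own
-- "xmlns" attribute, which is then dropped from the attribute list.
-- Prefixed names such as "x:ref" and declarations inherited from
-- enclosing elements are not handled.
convertOpen :: SaxElement -> XmlEvent
convertOpen (SaxElementOpen name attrs) =
  EventElementBegin (lookup "xmlns" attrs)
                    name
                    (filter ((/= "xmlns") . fst) attrs)
```

For example, convertOpen (SaxElementOpen "doc" [("xmlns", "org:myproject:mainns")]) gives EventElementBegin (Just "org:myproject:mainns") "doc" [].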

Obviously there is a bit more to it than that, since namespace
*defining* attributes, like your example xmlns:x="...", have a
lexical scope: a declaration applies to the element that carries it
and to all of that element's descendants, unless an inner element
redeclares the same prefix.  You will need some kind of state to track
the scope, possibly in the parser itself, or again possibly in a
post-processing step over the list of output XMLEvents.
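One way to keep that state in a post-processing step is sketched below, again under assumptions: the SaxEvent and XmlEvent types are hypothetical stand-ins rather than HaXml's own, only element open/close events are modelled, and the scope is a stack of prefix-to-URI maps (Data.Map from the containers package) pushed on each open tag and popped on each close tag.  The empty prefix "" stands for the default namespace declared by a bare "xmlns".

```haskell
import qualified Data.Map.Strict as Map
import Data.List (isPrefixOf)

-- Hypothetical stand-in types, not HaXml's definitions.
type Name      = String
type Attribute = (Name, String)

data SaxEvent
  = SaxElementOpen Name [Attribute]
  | SaxElementClose Name
  deriving Show

-- Resolved events carry (namespace URI, local name).
data XmlEvent
  = EventElementBegin (Maybe String) String [Attribute]
  | EventElementEnd (Maybe String) String
  deriving (Show, Eq)

-- Prefix -> URI bindings in scope; the key "" is the default namespace.
type Scope = Map.Map String String

isDecl :: Name -> Bool
isDecl a = a == "xmlns" || "xmlns:" `isPrefixOf` a

-- The namespace declarations carried by one element's attributes.
declarations :: [Attribute] -> Scope
declarations attrs =
  Map.fromList [ (p, uri) | (name, uri) <- attrs, Just p <- [declPrefix name] ]
  where
    declPrefix "xmlns" = Just ""
    declPrefix n
      | "xmlns:" `isPrefixOf` n = Just (drop 6 n)
      | otherwise               = Nothing

-- Split "x:ref" into ("x", "ref"); an unprefixed name gets prefix "".
splitQName :: Name -> (String, String)
splitQName n = case break (== ':') n of
  (p, ':' : local) -> (p, local)
  _                -> ("", n)

resolve :: Scope -> Name -> (Maybe String, String)
resolve scope n = let (p, local) = splitQName n in (Map.lookup p scope, local)

-- Walk the events with a stack of scopes: push on open, pop on close.
-- Sketch only: prefixed attribute names are left unresolved, and
-- xmlns="" and the predefined "xml" prefix are not special-cased.
resolveEvents :: [SaxEvent] -> [XmlEvent]
resolveEvents = go [Map.empty]
  where
    go :: [Scope] -> [SaxEvent] -> [XmlEvent]
    go _ [] = []
    go scopes@(current : _) (SaxElementOpen n attrs : rest) =
      let scope'      = Map.union (declarations attrs) current -- inner wins
          (ns, local) = resolve scope' n
          plain       = filter (not . isDecl . fst) attrs
      in EventElementBegin ns local plain : go (scope' : scopes) rest
    go (current : outer) (SaxElementClose n : rest) =
      let (ns, local) = resolve current n
      in EventElementEnd ns local : go outer rest
    go [] _ = [] -- more closes than opens; stop here
```

Fed the open/close events for your example (a doc element declaring both xmlns and xmlns:x, containing an x:ref child), this resolves the child to (Just "org:myproject:otherns", "ref") because the x binding is inherited from the enclosing scope, and a sibling that follows an inner redeclaration sees the outer binding again once the inner element closes.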

Regards,
     Malcolm


