HaXml and HXml toolbox; namespace support

Graham Klyne gk at ninebynine.org
Thu Mar 18 20:55:02 EST 2004


I'm currently looking at the innards of HXml Toolbox and HaXml, with a view 
to adopting an XML parser with XML namespace support.

Based on that requirement alone, HXml Toolbox would be the obvious choice, 
since it already has namespace support, but I have some concerns.  These 
may simply be my own ignorance, so I'm airing my views here so that any 
misconceptions can be corrected.  I present my thoughts in terms of pros 
and cons for each.

HXml Toolbox
------------
+ XML Namespace support
+ DTD Entity handling
+ Good degree of conformance to W3C test suite
- difficult to find my way around the documentation;  no obvious high-level 
description, other than Martin Schmidt's thesis, which is out of date with 
respect to the current software.
- can't find a simple String -> XML tree parsing function (dealing with 
Internal DTD Entity components)
- errors seem to be reported to stderr rather than handed back to the 
calling program
- complex and non-portable distribution:  I'm concerned that any attempt to 
distribute my applications based on this library may prove difficult, short 
of copying (and effectively branching) the complete source code.
- not developed with Hugs/Windows as an intended target
? efficiency:  some problems parsing large XML files with Hugs 98 are noted.
? still actively supported ?

HaXml
-----
+ Already part of the common hierarchical library
+ XML handling is cleanly separated from other functions
+ separate, hand-coded lexer which I assume will give better performance
+ appears to be actively supported
- no namespace support
? DTD Entity handling ?
- errors are not returned to the caller.  As far as I can tell, errors are 
raised using the 'error' function... [which I see results in program 
termination when evaluated].  Ouch!  (Why not 'fail' instead of 'error'?)
- source code needs CPP preprocessing
* no external DTD support [this is not a problem for me, and I'd certainly 
prefer it to be optional, or at least separated from the XML parsing, to 
avoid dependency on an HTTP library].
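
To illustrate the 'error' concern above, here is a minimal sketch.  It is 
not HaXml code:  parseWithError, parseWithEither and recover are 
hypothetical names, and the "parser" is a toy that accepts only a run of 
digits.  As far as I understand it, a value built with 'error' can only be 
intercepted in the IO monad, whereas an Either result can be handled by 
pure calling code.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical toy check: the input must be a non-empty run of digits.
isNumber :: String -> Bool
isNumber s = not (null s) && all (`elem` ['0' .. '9']) s

-- Style 1: failure is raised with 'error'; the type hides the failure case.
parseWithError :: String -> Int
parseWithError s
  | isNumber s = read s
  | otherwise  = error ("not a number: " ++ show s)

-- Style 2: failure is part of the result, so pure callers can inspect it.
parseWithEither :: String -> Either String Int
parseWithEither s
  | isNumber s = Right (read s)
  | otherwise  = Left ("not a number: " ++ show s)

-- Recovering from style 1 forces the caller into IO.
recover :: String -> IO (Either String Int)
recover s = do
  r <- try (evaluate (parseWithError s)) :: IO (Either ErrorCall Int)
  return (either (Left . show) Right r)
```

With this sketch, parseWithEither "12x" yields a Left value carrying the 
message, while parseWithError "12x" only fails when the result is forced.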

...

A weakness of both packages seems to be the handling of syntax errors in 
the input.

HaXml uses HuttonMeijerWallace combinators - could these be extended in the 
style of Parsec to return an error description, thus making it possible to 
provide an interface that allows the calling program to handle any errors?
E.g.
[[
newtype Parser s t a   = P (s -> [t] -> [(a,s,[t])])
]]
becomes, say:
[[
newtype Parser s t a   = P (s -> [t] -> Either String [(a,s,[t])])
]]
and define fail accordingly.  Or, even, just use Parsec?
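
To make the suggestion a little more concrete, here is a rough sketch of 
how the Either-based type might be wired up.  Only the newtype comes from 
the text above;  runParser, failP, item and expect are hypothetical names 
of my own, not HaXml or HuttonMeijerWallace functions.  I am also assuming 
that a Left in any branch aborts the whole parse, while an empty list still 
means an ordinary "no parse" that alternatives could recover from.

```haskell
-- The modified type from the message: Left carries an error description.
newtype Parser s t a = P (s -> [t] -> Either String [(a, s, [t])])

-- Hypothetical helper: unwrap and run a parser.
runParser :: Parser s t a -> s -> [t] -> Either String [(a, s, [t])]
runParser (P p) = p

instance Functor (Parser s t) where
  fmap f (P p) = P $ \s ts ->
    fmap (map (\(a, s', ts') -> (f a, s', ts'))) (p s ts)

instance Applicative (Parser s t) where
  pure a    = P $ \s ts -> Right [(a, s, ts)]
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad (Parser s t) where
  P p >>= f = P $ \s ts -> do
    results <- p s ts                       -- a Left here stops everything
    fmap concat (mapM (\(a, s', ts') -> runParser (f a) s' ts') results)

-- Hypothetical stand-in for 'fail': report an error description.
failP :: String -> Parser s t a
failP msg = P $ \_ _ -> Left msg

-- Consume one token; an empty list signals an ordinary "no parse".
item :: Parser s t t
item = P $ \s ts -> case ts of
  []       -> Right []
  (t : tr) -> Right [(t, s, tr)]

-- Demand a specific token, describing the mismatch otherwise.
expect :: (Eq t, Show t) => t -> Parser s t t
expect want = do
  got <- item
  if got == want
    then return got
    else failP ("expected " ++ show want ++ " but found " ++ show got)
```

Under these assumptions, runParser (expect 'a') () "xbc" returns a Left 
with a description that the calling program can inspect, instead of 
terminating.  A real 'fail' definition could simply delegate to failP.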

HXml Toolbox makes mention of reporting errors to stderr, I think [lost 
reference].  It appears that I can isolate the XML parser, which uses 
Parsec, but I'm not sure if I can isolate the DTD processing logic that 
deals with entity substitutions....  This looks problematic:  it seems that 
entity substitution is done in an XmlStateFilter Monad.  I'm finding it 
really hard to tease apart the various strands of processing here, which is 
indicative of my concerns about using this package.

...

So, any pointers that help me decide which way to jump would be appreciated...

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact


