HaXml and HXml toolbox; namespace support
Graham Klyne
gk at ninebynine.org
Thu Mar 18 20:55:02 EST 2004
I'm currently looking at the innards of HXml Toolbox and HaXml, with a view
to adopting an XML parser with XML namespace support.
Based on that requirement alone, HXml Toolbox would be the obvious choice,
since it already has namespace support, but I have some concerns. These
may simply be my own ignorance, so I'm airing my views here so that any
misconceptions can be corrected. I present my thoughts in terms of pros
and cons for each.
HXml Toolbox
------------
+ XML Namespace support
+ DTD Entity handling
+ Good degree of conformance to W3C test suite
- difficult to find my way around the documentation; no obvious high-level
description, other than Martin Schmidt's thesis, which is out of date with
respect to the current software.
- can't find a simple String -> XML tree parsing function (dealing with
Internal DTD Entity components)
- errors seem to be reported to stderr rather than handed back to the
calling program
- complex and non-portable distribution: I'm concerned that any attempt to
distribute my applications based on this library may prove difficult, short
of copying (and effectively branching) the complete source code.
- not developed with Hugs/Windows as an intended target
? efficiency: some problems parsing large XML files with Hugs 98 are noted.
? still actively supported ?
HaXml
-----
+ Already part of the common hierarchical library
+ XML handling is cleanly separated from other functions
+ separate, hand-coded lexer which I assume will give better performance
+ appears to be actively supported
- no namespace support
? DTD Entity handling ?
- errors are not returned to the caller. As far as I can tell, errors are
raised using the 'error' function... [which I see results in program
termination when evaluated]. Ouch! (Why not 'fail' instead of 'error'?)
- source code needs CPP preprocessing
* no external DTD support [this is not a problem for me, and I'd certainly
prefer it to be optional, or at least separated from the XML parsing, to
avoid dependency on an HTTP library].
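
To illustrate the 'fail' versus 'error' point, here is a minimal sketch.
It assumes a recent GHC in which MonadFail is exported from the Prelude;
readDecimal and asMaybe are hypothetical names for illustration only and
are not part of HaXml. A function that reports problems through 'fail'
leaves the choice of error handling to the caller, whereas 'error'
terminates the program when evaluated.

```haskell
import Data.Char (isDigit)

-- Hypothetical helper (not part of HaXml): read a decimal value.
-- Failure goes through 'fail', so the caller's choice of monad decides
-- what happens: Nothing in Maybe, a thrown IOError in IO.
readDecimal :: MonadFail m => String -> m Int
readDecimal s
  | not (null s) && all isDigit s = return (read s)
  | otherwise                     = fail ("not a decimal number: " ++ show s)

-- In Maybe the failure is an ordinary value the caller can inspect.
asMaybe :: String -> Maybe Int
asMaybe = readDecimal
```

With this shape, a caller that wants to recover simply pattern-matches on
the result instead of having the whole program stop.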
...
A weakness of both packages seems to be the handling of syntax errors in
the input.
HaXml uses HuttonMeijerWallace combinators - could these be extended in the
style of Parsec to return an error description, thus making it possible to
provide an interface that allows the calling program to handle any errors?
E.g.
[[
newtype Parser s t a = P (s -> [t] -> [(a,s,[t])])
]]
becomes, say:
[[
newtype Parser s t a = P (s -> [t] -> Either String [(a,s,[t])])
]]
and define fail accordingly. Or, even, just use Parsec?
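
To make the suggestion concrete, here is a minimal sketch of how the
modified type might be given Monad and MonadFail instances. It is not
HaXml's code: unP, satisfy, runParser and tagStart are names invented for
this illustration, and it assumes a recent GHC where MonadFail is in the
Prelude. One design choice is also an assumption: when several alternative
parses are in play, an error description is reported only if every
alternative fails.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isAlpha)

-- The modified parser type: a Left carries an error description.
newtype Parser s t a = P (s -> [t] -> Either String [(a, s, [t])])

-- Illustrative helper: unwrap the parsing function.
unP :: Parser s t a -> s -> [t] -> Either String [(a, s, [t])]
unP (P p) = p

instance Functor (Parser s t) where
  fmap = liftM

instance Applicative (Parser s t) where
  pure x = P (\st ts -> Right [(x, st, ts)])
  (<*>)  = ap

instance Monad (Parser s t) where
  P p >>= f = P $ \st ts ->
    case p st ts of
      Left err -> Left err
      Right rs -> collect [ unP (f a) st' ts' | (a, st', ts') <- rs ]
    where
      -- Keep every alternative that succeeded; if none did, report the
      -- first error description encountered.
      collect results =
        case [ xs | Right xs <- results ] of
          [] -> case [ e | Left e <- results ] of
                  (e:_) -> Left e
                  []    -> Right []
          oks -> Right (concat oks)

-- 'fail' hands back a description instead of terminating the program.
instance MonadFail (Parser s t) where
  fail msg = P (\_ _ -> Left msg)

-- Consume one token satisfying a predicate, or describe what was expected.
satisfy :: Show t => String -> (t -> Bool) -> Parser s t t
satisfy what ok = P $ \st ts -> case ts of
  (t:rest) | ok t -> Right [(t, st, rest)]
  (t:_)           -> Left ("expected " ++ what ++ ", found " ++ show t)
  []              -> Left ("expected " ++ what ++ ", found end of input")

-- Run a parser and give the first result, or the error, to the caller.
runParser :: Parser s t a -> s -> [t] -> Either String a
runParser p st ts = case unP p st ts of
  Left err            -> Left err
  Right ((a, _, _):_) -> Right a
  Right []            -> Left "no parse"

-- Example: an opening angle bracket followed by a letter.
tagStart :: Parser () Char Char
tagStart = do
  _ <- satisfy "'<'" (== '<')
  satisfy "a letter" isAlpha
```

With these definitions, runParser tagStart () "<a>" yields a Right value,
while malformed input yields a Left with a description that the calling
program can inspect or report as it sees fit.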
HXml Toolbox makes mention of reporting errors to stderr, I think [lost
reference]. It appears that I can isolate the XML parser, which uses
Parsec, but I'm not sure if I can isolate the DTD processing logic that
deals with entity substitutions.... This looks problematic: it seems that
entity substitution is done in an XmlStateFilter Monad. I'm finding it
really hard to tease apart the various strands of processing here, which is
indicative of my concerns about using this package.
...
So, any pointers that help me decide which way to jump would be appreciated...
#g
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact