[Haskell] HaXML incorrect interpretation of XML spec!
Malcolm.Wallace at cs.york.ac.uk
Thu Oct 28 05:42:57 EDT 2004
"S. Alexander Jacobson" <alex at alexjacobson.com> writes:
> I modified the Prolog type to be
> data Prolog = Prolog (Maybe XMLDecl) [Misc] (Maybe DocTypeDecl) [Misc]
> and then modified the Prolog parser
Thanks for spotting this bug and providing a fix. I also note that
the XML spec allows "Misc*" to follow the document's top-level element:
   document ::= prolog element Misc*
and HaXml gets this wrong too. There may well be other places with
the same omission.
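As a rough illustration, here is a minimal sketch of how the productions `prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?` and `document ::= prolog element Misc*` might map onto Haskell types and a parser. Only the `Prolog` shape follows the fix quoted above; `XMLDecl`, `DocTypeDecl`, `Misc`, `Element`, `Document` and the pre-tokenised input type `Tok` are hypothetical, heavily simplified stand-ins, not HaXml's real definitions, and whitespace (which the spec also counts as Misc) is ignored for brevity.

```haskell
-- Hypothetical, simplified stand-ins; these are NOT HaXml's real types.
newtype XMLDecl     = XMLDecl String     deriving (Show, Eq)  -- version only
newtype DocTypeDecl = DocTypeDecl String deriving (Show, Eq)  -- root name only
data Misc = Comment String | PI String String                 -- PI target and data
  deriving (Show, Eq)
newtype Element = Elem String deriving (Show, Eq)             -- element name only

-- Prolog shape as proposed in the quoted fix.
data Prolog = Prolog (Maybe XMLDecl) [Misc] (Maybe DocTypeDecl) [Misc]
  deriving (Show, Eq)

-- document ::= prolog element Misc*
data Document = Document Prolog Element [Misc]
  deriving (Show, Eq)

-- Hypothetical pre-tokenised input, so the sketch needs no lexer.
data Tok = TXMLDecl String | TDocType String | TComment String
         | TPI String String | TElem String
  deriving (Show, Eq)

-- Misc*: consume comments and PIs until some other token appears.
miscs :: [Tok] -> ([Misc], [Tok])
miscs (TComment c : ts) = let (ms, rest) = miscs ts in (Comment c : ms, rest)
miscs (TPI t d : ts)    = let (ms, rest) = miscs ts in (PI t d : ms, rest)
miscs ts                = ([], ts)

-- XMLDecl?
optXMLDecl :: [Tok] -> (Maybe XMLDecl, [Tok])
optXMLDecl (TXMLDecl v : ts) = (Just (XMLDecl v), ts)
optXMLDecl ts                = (Nothing, ts)

-- (doctypedecl Misc*)?
optDocType :: [Tok] -> (Maybe DocTypeDecl, [Misc], [Tok])
optDocType (TDocType d : ts) =
  let (ms, rest) = miscs ts in (Just (DocTypeDecl d), ms, rest)
optDocType ts = (Nothing, [], ts)

-- prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
prolog :: [Tok] -> (Prolog, [Tok])
prolog ts0 = (Prolog xd m1 dt m2, ts3)
  where
    (xd, ts1)     = optXMLDecl ts0
    (m1, ts2)     = miscs ts1
    (dt, m2, ts3) = optDocType ts2

-- document ::= prolog element Misc*, keeping the trailing Misc items.
document :: [Tok] -> Either String Document
document ts0 = case prolog ts0 of
  (p, TElem e : rest) -> case miscs rest of
    (trailing, []) -> Right (Document p (Elem e) trailing)
    (_, extra)     -> Left ("unexpected input after root element: " ++ show extra)
  _ -> Left "expected a root element"
```

For instance, `document [TXMLDecl "1.0", TElem "doc", TPI "app" "x"]` keeps the trailing PI in the final list of the `Document` instead of silently discarding it.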
> Given that this fix was so very easy and given
> that the parser was already spec consistent, I now
> have to assume that there was good reason for the
> Prolog to be spec inconsistent, but I don't know
> what it is...
I originally assumed that Misc items (comments, processing instructions
and whitespace) were unimportant and could be discarded, just as a
compiler discards comments. I failed to notice that PIs must be passed
through to the application.
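The XML 1.0 spec says PIs are not part of the document's character data but must be passed through to the application, whereas a processor may, but need not, let the application see comments. A minimal sketch of that distinction, over a hypothetical simplified `Misc` type (not HaXml's definition): comments can be dropped, PIs are kept and handed on as (target, data) pairs.

```haskell
-- Hypothetical simplified Misc type; NOT HaXml's real definition.
data Misc = Comment String | PI String String  -- PI target and data
  deriving (Show, Eq)

-- Drop comments (allowed by the spec) but keep every processing instruction.
keepPIs :: [Misc] -> [Misc]
keepPIs ms = [m | m@(PI _ _) <- ms]

-- Extract (target, data) pairs to hand to the application.
piPairs :: [Misc] -> [(String, String)]
piPairs ms = [(t, d) | PI t d <- ms]
```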
> Implementation question: Why is there so much
> replicated code in HaXML/Html (parse.hs and
The HTML parser does some correction of malformed input, which the
XML spec does not otherwise permit. Likewise, the HTML pretty-printer
makes some wild and unjustified assumptions about the way that humans
like to format their documents, whereas the XML pretty-printer conforms
more strictly to the spec. Once XHTML becomes common, the HTML parser
and pretty-printer will be obsolete.
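To give a flavour of the kind of input correction an XML parser must not do, here is a hypothetical sketch of one common repair: balancing a flat stream of tags by inserting missing end tags and dropping stray ones. It is not HaXml's HTML parser; the `Tag` type and `balance` function are invented for illustration, and real HTML error recovery relies on element-specific rules rather than this generic stack-based approach.

```haskell
-- Hypothetical flat tag stream; not HaXml's HTML types.
data Tag = Open String | Close String | Text String
  deriving (Show, Eq)

-- Repair a tag stream so that every Open has a matching Close.
-- The stack holds the names of currently open elements, innermost first.
balance :: [Tag] -> [Tag]
balance = go []
  where
    -- End of input: close everything still open.
    go stack [] = map Close stack
    go stack (t@(Open n) : ts) = t : go (n : stack) ts
    go stack (t@(Text _) : ts) = t : go stack ts
    go stack (Close n : ts)
      -- Matching element is open: implicitly close anything nested inside it.
      | n `elem` stack =
          let (inner, rest) = break (== n) stack
          in map Close inner ++ [Close n] ++ go (drop 1 rest) ts
      -- Stray end tag with no matching open element: drop it.
      | otherwise = go stack ts
```

For example, `<p>a<b>x</p></i>` comes out as `<p>a<b>x</b></p>`: the unclosed `<b>` is closed before `</p>` and the stray `</i>` is discarded.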