[Haskell] HaXML incorrect interpretation of XML spec!

Malcolm Wallace Malcolm.Wallace at cs.york.ac.uk
Thu Oct 28 05:42:57 EDT 2004


"S. Alexander Jacobson" <alex at alexjacobson.com> writes:

> I modified the Prolog type to be
>    data Prolog = Prolog (Maybe XMLDecl) [Misc] (Maybe DocTypeDecl) [Misc]
> and then modified the Prolog parser

Thanks for spotting this bug and providing a fix.  I also note that
the XML spec allows "Misc*" to follow the document's top-level element:

    document ::= prolog element Misc*

and this too is incorrect in HaXml.  There may well be other
occurrences of the same omission.
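
To make the three positions concrete, here is a minimal sketch of types
that follow the grammar.  The Prolog shape is the one quoted above; the
remaining types (XMLDecl, DocTypeDecl, Element as plain strings, and the
helper allMisc) are simplified placeholders assumed for illustration, not
HaXml's actual definitions.

```haskell
-- Placeholder types for illustration only; NOT HaXml's real definitions.
type XMLDecl     = String
type DocTypeDecl = String
type Element     = String

-- Misc ::= Comment | PI | S   (whitespace is not represented here)
data Misc
  = Comment String        -- text of a <!-- ... --> comment
  | PI String String      -- target and content of a <?target content?>
  deriving (Eq, Show)

-- prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
data Prolog = Prolog (Maybe XMLDecl) [Misc] (Maybe DocTypeDecl) [Misc]
  deriving (Eq, Show)

-- document ::= prolog element Misc*
data Document = Document Prolog Element [Misc]
  deriving (Eq, Show)

-- Gather the Misc items from all three positions, in document order.
allMisc :: Document -> [Misc]
allMisc (Document (Prolog _ beforeDtd _ afterDtd) _ trailing) =
  beforeDtd ++ afterDtd ++ trailing
```

Keeping the three lists separate preserves where each item appeared, so
a pretty-printer could put it back in the same place.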

> Given that this fix was so very easy and given
> that the parser was already spec consistent, I now
> have to assume that there was good reason for the
> Prolog to be spec inconsistent, but I don't know
> what it is...

I originally assumed that Misc items were unimportant and could be
discarded, just as a compiler discards comments.  I failed to notice
that processing instructions (PIs) should be passed through to the
application.
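
As a rough sketch of that distinction, assuming a simplified Misc type
(a stand-in, not HaXml's own) and a hypothetical helper named
forApplication, comments can be dropped while PIs are kept:

```haskell
-- Simplified stand-in for a Misc item; not HaXml's actual type.
data Misc
  = Comment String        -- <!-- ... -->
  | PI String String      -- <?target content?>
  deriving (Eq, Show)

-- Hypothetical helper: keep processing instructions as
-- (target, content) pairs and drop comments.  A failed pattern
-- match in the generator simply skips that item.
forApplication :: [Misc] -> [(String, String)]
forApplication miscs = [ (target, content) | PI target content <- miscs ]
```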

> Implementation question: Why is there so much
> replicated code in HaXML/Html (parse.hs and
> pretty.hs)

The HTML parser does some correction of malformed input, which the
XML spec does not otherwise permit.  Likewise, the HTML
pretty-printer makes some wild and unjustified assumptions about the
way that humans like to format their documents, whereas the XML
pretty-printer conforms more strictly.  Once XHTML becomes common, the
HTML parser and pretty-printer will be obsolete.

Regards,
    Malcolm

