[Haskell] understanding HaXml and escaping

Thu Oct 28 08:48:07 EDT 2004

Is there a good entry point into HaXml?
I've now spent some time trying to understand it
and feel like I've gotten nowhere.

The Haddock documentation enumerates what each
function does, but I still don't know how to
produce a valid XML document.

For example, this is obviously the wrong way to
go:

  simp2 = document $ Document (Prolog Nothing [] Nothing []) [] $
            Elem "root" [("attr",AttValue [Left "v\"al"])]
              [CString False "<<<<<>>&&&"]

It produces output that is clearly not well-formed:

  <root attr="v"al"><<<<<>>&&&</root>

I assume/hope that the combinators properly
encode/escape attribute values and character
data, but I can't figure out how to generate even
a correctly escaped version of the simple XML
above.
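
For reference, the escaping expected for the example above can be sketched by hand. This is a minimal, self-contained illustration that does not use HaXml at all; the names escapeText, escapeAttr and renderRoot are hypothetical and are not part of the HaXml API. It only covers the predefined entities needed for element content and double-quoted attribute values.

```haskell
-- Hand-rolled sketch of XML escaping; none of these names come from HaXml.

-- Escape character data (element content): '<', '>' and '&'.
escapeText :: String -> String
escapeText = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- Escape a double-quoted attribute value: same as above, plus '"'.
escapeAttr :: String -> String
escapeAttr = concatMap esc
  where
    esc '"' = "&quot;"
    esc c   = escapeText [c]

-- Hypothetical renderer for the single <root> element of the example.
renderRoot :: String -> String -> String
renderRoot attr body =
  "<root attr=\"" ++ escapeAttr attr ++ "\">" ++ escapeText body ++ "</root>"
```

With these definitions, renderRoot "v\"al" "<<<<<>>&&&" yields

  <root attr="v&quot;al">&lt;&lt;&lt;&lt;&lt;&gt;&gt;&amp;&amp;&amp;</root>

which is the kind of output one would hope the library produces.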

And once I've done so, is there a way to put PIs
in via the combinators, or do I have to import
the Types module and risk having unescaped
content in my document?
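
To make the concern about unescaped PIs concrete: a processing instruction cannot be escaped the way character data can, so about all a generator can do is reject content containing "?>". The following is a rough sketch under that assumption; renderPI is a hypothetical helper, not a HaXml function, and it does not check that the target is a valid XML Name.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Hypothetical helper (not HaXml API): render a processing instruction,
-- or return Nothing when it could not be written as well-formed XML.
renderPI :: String -> String -> Maybe String
renderPI target content
  | null target                 = Nothing  -- a PI needs a target
  | map toLower target == "xml" = Nothing  -- "xml" is reserved in any case mix
  | "?>" `isInfixOf` content    = Nothing  -- would terminate the PI early
  | otherwise                   = Just ("<?" ++ target ++ " " ++ content ++ "?>")
```

For instance, renderPI "xml-stylesheet" "href=\"s.css\"" gives Just "<?xml-stylesheet href=\"s.css\"?>", while any content containing "?>" is rejected rather than silently emitted.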

-Alex-

On Thu, 28 Oct 2004, Malcolm Wallace wrote:

> "S. Alexander Jacobson" <alex at alexjacobson.com> writes:
>
> > I modified the Prolog type to be
> >    data Prolog = Prolog (Maybe XMLDecl) [Misc] (Maybe DocTypeDecl) [Misc]
> > and then modified the Prolog parser
>
> Thanks for spotting this bug and providing a fix.  I also note that
> the XML spec allows "misc*" to follow the document top-level element:
>
>     document ::= prolog element Misc*
>
> and this too is incorrect in HaXml.  There may well be other
> occurrences of the same omission.
>
> > Given that this fix was so very easy and given
> > that the parser was already spec consistent, I now
> > have to assume that there was good reason for the
> > Prolog to be spec inconsistent, but I don't know
> > what it is...
>
> I originally assumed that Misc's were unimportant and could be
> discarded, like comments are discarded by a compiler.  I failed to
> notice that PI's should be passed through to the application.
>
> > Implementation question: Why is there so much
> > replicated code in HaXML/Html (parse.hs and
> > pretty.hs)
>
> The HTML parser does some correction of mal-formed input, which
> is not otherwise permitted by the XML spec.  Likewise, the HTML
> pretty-printer makes some wild and unjustified assumptions about the
> way that humans like to format their documents, whereas the XML pp
> is more strictly-conforming.  Once XHTML becomes common, the HTML
> parser/pp will be obsolete.
>
> Regards,
>     Malcolm
>

______________________________________________________________
S. Alexander Jacobson tel:917-770-6565 http://alexjacobson.com