[Haskell] understanding HaXml and escaping
S. Alexander Jacobson
alex at alexjacobson.com
Thu Oct 28 08:48:07 EDT 2004
Is there a good entry point into HaXml?
I've now spent some time trying to understand it
and feel like I've gotten nowhere.
The Haddock documentation enumerates what each
function does, but I still don't know how to
produce a valid XML document.
For example, this is obviously the wrong way to
go:
simp2 = document $ Document (Prolog Nothing [] Nothing []) [] $
          Elem "root" [("attr",AttValue [Left "v\"al"])]
               [CString False "<<<<<>>&&&"]
because it produces the obviously wrong output:
<root attr="v"al"><<<<<>>&&&</root>
I assume/hope that the combinators properly
encode/escape attribute values and CDATA, but
can't figure out how to generate even the
simple XML above.
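For reference, the escaping that the output above lacks can be sketched by hand. This is only a minimal sketch that does not use HaXml at all: `escapeText`, `escapeAttr` and `renderElem` are hypothetical helpers written for illustration, not HaXml functions, and they cover only a single element with one text child.

```haskell
-- Hand-rolled XML escaping; none of these names come from HaXml.

-- | Escape the characters that are special in XML character data.
escapeText :: String -> String
escapeText = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- | Escape an attribute value that will be written inside double quotes.
escapeAttr :: String -> String
escapeAttr = concatMap esc
  where
    esc '"' = "&quot;"
    esc c   = escapeText [c]

-- | Render one element with attributes and a single text child.
renderElem :: String -> [(String, String)] -> String -> String
renderElem name attrs body =
  "<" ++ name ++ concatMap attr attrs ++ ">"
      ++ escapeText body
      ++ "</" ++ name ++ ">"
  where
    attr (k, v) = " " ++ k ++ "=\"" ++ escapeAttr v ++ "\""
```

With these definitions, `renderElem "root" [("attr", "v\"al")] "<<<<<>>&&&"` yields `<root attr="v&quot;al">&lt;&lt;&lt;&lt;&lt;&gt;&gt;&amp;&amp;&amp;</root>`, which is the kind of output I was hoping the combinators would give me.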
And once I've done so, is there a way to put PIs
in via the combinators, or do I have to import
Types and risk having unescaped stuff in my
document?
-Alex-
On Thu, 28 Oct 2004, Malcolm Wallace wrote:
> "S. Alexander Jacobson" <alex at alexjacobson.com> writes:
>
> > I modified the Prolog type to be
> > data Prolog = Prolog (Maybe XMLDecl) [Misc] (Maybe DocTypeDecl) [Misc]
> > and then modified the Prolog parser
>
> Thanks for spotting this bug and providing a fix. I also note that
> the XML spec allows "misc*" to follow the document top-level element:
>
> document ::= prolog element Misc*
>
> and this too is incorrect in HaXml. There may well be other
> occurrences of the same omission.
>
> > Given that this fix was so very easy and given
> > that the parser was already spec consistent, I now
> > have to assume that there was good reason for the
> > Prolog to be spec inconsistent, but I don't know
> > what it is...
>
> I originally assumed that Misc's were unimportant and could be
> discarded, like comments are discarded by a compiler. I failed to
> notice that PI's should be passed through to the application.
>
> > Implementation question: Why is there so much
> > replicated code in HaXML/Html (parse.hs and
> > pretty.hs)
>
> The HTML parser does some correction of mal-formed input, which
> is not otherwise permitted by the XML spec. Likewise, the HTML
> pretty-printer makes some wild and unjustified assumptions about the
> way that humans like to format their documents, whereas the XML pp
> is more strictly-conforming. Once XHTML becomes common, the HTML
> parser/pp will be obsolete.
>
> Regards,
> Malcolm
>
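The production quoted above, `document ::= prolog element Misc*`, could be modelled along the following lines. This is a rough sketch under stated assumptions: only the shape of `Prolog` is taken from the thread, while `Misc`, `XMLDecl`, `DocTypeDecl`, `Element`, `Document` and `documentPIs` are simplified stand-ins invented for illustration, not HaXml's actual definitions.

```haskell
-- Simplified stand-in types; these are NOT HaXml's real definitions.
data Misc
  = Comment String
  | PI String String          -- processing instruction: target and content
  deriving (Show, Eq)

newtype XMLDecl     = XMLDecl String     deriving (Show, Eq)
newtype DocTypeDecl = DocTypeDecl String deriving (Show, Eq)
newtype Element     = Element String     deriving (Show, Eq)

-- Prolog shape as given in the thread: Misc may appear before and
-- after the optional document type declaration.
data Prolog = Prolog (Maybe XMLDecl) [Misc] (Maybe DocTypeDecl) [Misc]
  deriving (Show, Eq)

-- document ::= prolog element Misc*
data Document = Document Prolog Element [Misc]
  deriving (Show, Eq)

-- | Collect every processing instruction outside the root element,
-- in document order, so an application can still see them.
documentPIs :: Document -> [(String, String)]
documentPIs (Document (Prolog _ before _ after) _ trailing) =
  [ (target, content) | PI target content <- before ++ after ++ trailing ]
```

Keeping the three `[Misc]` lists in the tree means PIs survive parsing rather than being discarded along with comments.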
______________________________________________________________
S. Alexander Jacobson tel:917-770-6565 http://alexjacobson.com