[Haskell-cafe] XML parser recommendation?

Uwe Schmidt uwe at fh-wedel.de
Tue Oct 23 11:17:46 EDT 2007


Yitzchak Gale wrote:

> Another question about HaXML and HXT -
> what is the level of XML spec. compliance?

Implementing the XML 1.0 standard was
one of the goals of HXT when we started the project.
This includes full support for DTD processing,
which turned out to be the hardest part of the
whole parsing and validation work.
Another goal was to stay as close as possible
to the XML spec. That meant separating the tasks
cleanly and doing them step by step: reading, decoding, parsing,
substituting entities, implementing the include
mechanism with external references, validating, and normalizing.
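
To illustrate the step-by-step separation described above, here is a
toy sketch in which each stage is an ordinary function and the whole
pipeline is their composition. All names (decodeLatin1, chunks,
substEntities, process) are made up for this example and are not
HXT's API. The "parser" only splits markup from text, only the five
predefined entities are substituted, and DTD processing, includes,
validation, and normalization are left out.

```haskell
module Main where

import Data.Char (chr)
import Data.List (stripPrefix)
import Data.Word (Word8)

-- Output of the reading stage: raw bytes (hypothetical type alias).
type Bytes = [Word8]

-- Decoding stage. Assumption: Latin-1 input, one byte per code point.
decodeLatin1 :: Bytes -> String
decodeLatin1 = map (chr . fromIntegral)

-- Stand-in for the parsing stage: split input into markup and text.
data Chunk = Markup String | Text String
  deriving (Eq, Show)

chunks :: String -> [Chunk]
chunks [] = []
chunks ('<' : rest) =
  let (inside, after) = break (== '>') rest
  in Markup inside : chunks (drop 1 after)
chunks s =
  let (txt, after) = break (== '<') s
  in Text txt : chunks after

-- Entity substitution stage: only the five predefined XML entities.
predefined :: [(String, Char)]
predefined =
  [ ("&lt;", '<'), ("&gt;", '>'), ("&amp;", '&')
  , ("&quot;", '"'), ("&apos;", '\'') ]

substEntities :: String -> String
substEntities [] = []
substEntities s@(c : cs) =
  case [ (ch, rest) | (name, ch) <- predefined
                    , Just rest <- [stripPrefix name s] ] of
    ((ch, rest) : _) -> ch : substEntities rest
    []               -> c : substEntities cs

substChunk :: Chunk -> Chunk
substChunk (Text t)   = Text (substEntities t)
substChunk (Markup m) = Markup m

-- The pipeline is the composition of the separate stages.
process :: Bytes -> [Chunk]
process = map substChunk . chunks . decodeLatin1

main :: IO ()
main = print (process (map (fromIntegral . fromEnum) "<a>1 &lt; 2</a>"))
```

Keeping each stage a separate function means that one stage can be
replaced (for example, a different decoder) without touching the others.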

In a second step we added an HTML parser, again
with Parsec, to be able to process non-standard XML.
Again, we did not have the processing of "very large" XML
documents in mind.

There is no technical reason not to add a third parser (or a fourth)
that accepts something like XML, perhaps without the DTD stuff,
and works lazily. The only reason we have not done this yet
is lack of time and manpower.
So, dear Haskellers, feel free to contribute to HXT
by developing a lazy parser, and we will integrate
it into HXT.

This still does not solve the processing of "very very large"
XML documents. I doubt that we can do this with a DOM-like
approach, as in HXT or HaXml. Lazy input does not solve all problems.
A SAX-like parser could be a more useful choice for very large documents.
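
As a rough sketch of what "SAX-like" could mean in Haskell: the
parser produces a lazy list of events, and the consumer folds over
them with a small, strict state instead of building a tree. The Event
type, events, and maxDepth are hypothetical names for this example,
not part of HXT or HaXml. The tokenizer is deliberately naive: no
attributes, comments, CDATA, entities, or empty-element tags.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical event type, loosely modelled on SAX callbacks.
data Event = Open String | Close String | Chars String
  deriving (Eq, Show)

-- Naive lazy tokenizer: events are produced on demand.
events :: String -> [Event]
events [] = []
events ('<' : '/' : rest) =
  let (name, after) = break (== '>') rest
  in Close name : events (drop 1 after)
events ('<' : rest) =
  let (name, after) = break (== '>') rest
  in Open name : events (drop 1 after)
events s =
  let (txt, after) = break (== '<') s
  in Chars txt : events after

-- Consumer state: current nesting depth and maximum depth seen so far.
data Depth = Depth !Int !Int

-- Strict left fold: only the two counters are kept, never a tree.
maxDepth :: [Event] -> Int
maxDepth = result . foldl' step (Depth 0 0)
  where
    step (Depth d m) (Open _)  = let d' = d + 1 in Depth d' (max m d')
    step (Depth d m) (Close _) = Depth (d - 1) m
    step acc         (Chars _) = acc
    result (Depth _ m) = m

main :: IO ()
main = do
  -- A large document, generated lazily and consumed exactly once.
  let doc = "<root>"
            ++ concat (replicate 100000 "<item><v>x</v></item>")
            ++ "</root>"
  print (maxDepth (events doc))
```

The limitation is the usual one for event-based processing: anything
that needs random access to the document (for example, checking ID
references against elements seen much earlier) has to carry that
information in the fold state itself.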

Uwe

