[Haskell-cafe] Is XHT a good tool for parsing web pages?
Malcolm Wallace
malcolm.wallace at cs.york.ac.uk
Tue Apr 27 16:58:16 EDT 2010
> Is XHT a good tool for parsing web pages?
> I read that it fails if the XML isn't strict and I know a lot of web
> pages don't use strict XHTML.
Do you mean HXT rather than XHT?
I know that the HaXml library has a separate error-correcting HTML
parser that works around most of the common non-well-formedness bugs
in HTML:
Text.XML.HaXml.Html.Parse
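A minimal sketch of calling that parser on deliberately sloppy markup. It assumes a reasonably recent HaXml (1.2x), where htmlParse takes a file name (used only in error messages) plus the input text and returns a Document Posn, and where Text.XML.HaXml.Pretty exports document. The names sloppyHtml and repaired, and the input string, are hypothetical and exist only for illustration.

```haskell
-- Sketch (assumes HaXml 1.2x): feed non-well-formed HTML to HaXml's
-- error-correcting HTML parser and pretty-print the repaired tree.
import Text.PrettyPrint (render)
import Text.XML.HaXml.Html.Parse (htmlParse)
import Text.XML.HaXml.Pretty (document)

-- Hypothetical input: unclosed <p> elements and a bare <br>,
-- which a strict XML parser would reject.
sloppyHtml :: String
sloppyHtml = "<html><body><p>First<p>Second<br></body></html>"

-- Hypothetical helper: parse leniently, then render as text.
-- The "example.html" argument is only a label for error messages.
repaired :: String -> String
repaired = render . document . htmlParse "example.html"

main :: IO ()
main = putStrLn (repaired sloppyHtml)
```

The point of the sketch is only that the HTML parser accepts input the XML parser would not; the exact shape of the repaired tree depends on HaXml's tag-closing heuristics.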
I believe HXT has a similar parser:
Text.XML.HXT.Parser.HtmlParsec
Indeed, some of the similarities suggest this parser was originally
lifted directly out of HaXml (as permitted by HaXml's licence),
although the two modules have now diverged significantly.
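For HXT, a minimal sketch of lenient HTML reading follows. It assumes the HXT 9.x Text.XML.HXT.Core API (newer than this thread), where, as far as I know, the withParseHTML yes option sends input through the Parsec-based HTML parser instead of the strict XML parser, and withWarnings no silences complaints about unclosed tags. The names sloppyHtml and paragraphTexts are hypothetical.

```haskell
-- Sketch (assumes the HXT 9.x Core API): read sloppy HTML leniently
-- and extract the text inside <p> elements.
import Text.XML.HXT.Core

-- Hypothetical input: an unclosed <p> and a bare <br>.
sloppyHtml :: String
sloppyHtml = "<html><body><p>Hello<br>world</body></html>"

-- Hypothetical helper: collect the text nodes that are direct
-- children of each <p> element found anywhere in the document.
paragraphTexts :: String -> IO [String]
paragraphTexts html =
  runX $
    readString [withParseHTML yes, withWarnings no] html
      >>> deep (isElem >>> hasName "p")
      >>> getChildren
      >>> getText

main :: IO ()
main = paragraphTexts sloppyHtml >>= print
```

Dropping withParseHTML yes from the option list should make the same input go through the XML parser, which is the strict behaviour the original question was worried about.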
Regards,
Malcolm