[Haskell-cafe] Is XHT a good tool for parsing web pages?

Malcolm Wallace malcolm.wallace at cs.york.ac.uk
Tue Apr 27 16:58:16 EDT 2010


> Is XHT a good tool for parsing web pages?
> I read that it fails if the XML isn't strict and I know a lot of web  
> pages don't use strict XHTML.

Do you mean HXT rather than XHT?

I know that the HaXml library has a separate error-correcting HTML  
parser that works around most of the common non-well-formedness bugs  
in HTML:
     Text.XML.HaXml.Html.Parse

I believe HXT has a similar parser:
     Text.XML.HXT.Parser.HtmlParsec

Indeed, some of the similarities suggest this parser was originally  
lifted directly out of HaXml (as permitted by HaXml's licence),  
although the two modules have now diverged significantly.
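
For what it is worth, here is a minimal sketch of how one might run HXT in its lenient HTML mode. It assumes the hxt package is installed, and that the withParseHTML option is what selects the HTML parser mentioned above rather than the strict XML one. The sample input and the helper name paragraphTexts are made up for illustration.

```haskell
import Text.XML.HXT.Core

-- Hypothetical helper: collect the text nodes found inside <p> elements.
-- withParseHTML yes asks HXT for its error-tolerant HTML parser;
-- withWarnings no silences the warnings it emits about sloppy markup.
paragraphTexts :: String -> IO [String]
paragraphTexts html =
  runX $ readString [withParseHTML yes, withWarnings no] html
         >>> deep (hasName "p")
         //> getText

main :: IO ()
main = do
  -- Deliberately not well-formed XML: unclosed <p> and a bare <br>.
  texts <- paragraphTexts "<html><body><p>Hello<br>world</body></html>"
  mapM_ putStrLn texts
```

The same document passed through the default XML parser would be rejected as not well-formed, which is presumably the failure described in the original question.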

Regards,
     Malcolm


