[Haskell-cafe] Is HXT a good tool for parsing web pages?

Ivan Lazar Miljenovic ivan.miljenovic at gmail.com
Wed Apr 28 05:18:47 EDT 2010


Uwe Schmidt <uwe at fh-wedel.de> writes:
> The HTML parser in HXT is based on tagsoup. It's a lazy parser
> (it does not use parsec) and it tries to parse everything as HTML.
> But garbage in, garbage out, there is no approach to repair illegal HTML
> as e.g. the Tidy parsers do. The parser uses tagsoup as a scanner.
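
To make the "scanner" part concrete, here is a minimal sketch that uses
tagsoup on its own. It is not HXT's internal code: `scan` is just a
hypothetical name for this example, and I'm assuming the tagsoup package
is installed. It only shows that the scanner hands back a flat list of
tags and does not add the missing closing tags for malformed input.

```haskell
-- Requires the tagsoup package (an assumption; not part of base).
import Text.HTML.TagSoup (Tag (..), parseTags)

-- Scan a string into a flat, lazily produced list of tags.
-- No tree is built and no repair is attempted at this stage.
scan :: String -> [Tag String]
scan = parseTags

main :: IO ()
main =
  -- Deliberately malformed input: neither <p> nor <b> is closed.
  mapM_ print (scan "<p>unclosed <b>bold")
```

Whatever turns this tag list into a tree has to cope with the unclosed
elements itself, which is where "garbage in, garbage out" applies.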

So what is parsec used for in HXT then?

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic at gmail.com
IvanMiljenovic.wordpress.com

