[Haskell-cafe] HTML library with DOM?
michael at snoyman.com
Thu Oct 7 08:37:53 EDT 2010
2010/10/7 Gregory Collins <greg at gregorycollins.net>:
> "Edward Z. Yang" <ezyang at MIT.EDU> writes:
>> Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:
>>> I've got the month of October off, and one of the things I've been
>>> planning on working on is a compliant HTML5 parser for Haskell --
>>> something which is sorely needed! I will ping the list back if/when I
>>> get it finished.
>> I've heard that some of the existing HTML parsers in Haskell were
>> already HTML5 compliant (this topic came up when I was complaining
>> that there were some algorithms that you absolutely had to have
>> state for, because that was how they were specified.) I never
>> verified this assertion though.
> If there's already a library which *correctly* parses html5 documents
> into DOM trees, could someone please let me know so I can use it instead
> of wasting a bunch of time writing one?
As far as I know, Neil Mitchell's tagsoup parses according to the
HTML5 parsing rules, but it only generates a flat list of Tags, so
you'd have to build the DOM tree up from there yourself. I personally
have had great experience with tagsoup. It's even the core of the
HTML-scraping technology powering searchonce.
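
To give a rough idea of what "building the tree up from there" could
look like, here is a minimal sketch. It assumes a simplified Tag type
that only loosely mirrors tagsoup's (the Tag, Node, voidElements,
siblings and buildForest names below are all made up for illustration,
not tagsoup's API), and it nests tags naively rather than following the
HTML5 tree construction algorithm, so treat it as a starting point only.

```haskell
-- Simplified stand-in for a tag stream (hypothetical; not tagsoup's real type).
data Tag
  = TagOpen String [(String, String)]
  | TagClose String
  | TagText String
  deriving (Eq, Show)

-- A minimal DOM-like tree (hypothetical).
data Node
  = Element String [(String, String)] [Node]
  | Text String
  deriving (Eq, Show)

-- A partial list of elements that never have children (assumption:
-- extend as needed; the HTML5 spec defines the full set).
voidElements :: [String]
voidElements = ["br", "hr", "img", "input", "meta", "link"]

-- Parse sibling nodes until an unmatched close tag or the end of input.
-- Returns the nodes plus whatever input is left over.
siblings :: [Tag] -> ([Node], [Tag])
siblings [] = ([], [])
siblings ts@(TagClose _ : _) = ([], ts)
siblings (TagText s : rest) =
  let (ns, rest') = siblings rest
  in (Text s : ns, rest')
siblings (TagOpen name attrs : rest)
  | name `elem` voidElements =
      let (ns, rest') = siblings rest
      in (Element name attrs [] : ns, rest')
  | otherwise =
      let (children, afterChildren) = siblings rest
          -- Consume the matching close tag if present; otherwise treat
          -- the element as implicitly closed and let a parent handle it.
          afterClose = case afterChildren of
            (TagClose n : more) | n == name -> more
            other -> other
          (ns, rest') = siblings afterClose
      in (Element name attrs children : ns, rest')

-- Build a forest from a flat tag list, dropping stray close tags
-- that appear at the top level.
buildForest :: [Tag] -> [Node]
buildForest ts = case siblings ts of
  (ns, [])       -> ns
  (ns, _ : rest) -> ns ++ buildForest rest
```

Loaded into GHCi, buildForest applied to the tags for
<p>Hello <b>world</b></p> yields a single "p" element containing a
text node and a nested "b" element. A real HTML5 parser has to do a
good deal more (implied end tags, the adoption agency algorithm, and
so on), which this sketch does not attempt.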