[Haskell-cafe] Stripping text of xml tags and special symbols

Benja Fallenstein benja.fallenstein at gmail.com
Tue Aug 5 17:48:28 EDT 2008


Hi Pieter,

2008/8/5 Pieter Laeremans <pieter at laeremans.org>:
> But the Sphinx indexer complains that the XML isn't valid.  When I look at
> the errors, this seems to be due to some documents containing HTML that is
> not well formed.

If you need to cope with non-well-formed HTML, try HTML Tidy:

http://tidy.sourceforge.net/
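
In case it helps, here is a minimal sketch of how one might call it from
Haskell. It assumes the tidy executable is installed and on your PATH, that
your locale is UTF-8, and that the process package is available. The names
tidyArgs, tidyResult and tidyHtml are made up for this example and are not
part of any library.

```haskell
module Main where

import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Flags passed to the tidy command-line tool (assumed to be on the PATH).
tidyArgs :: [String]
tidyArgs =
  [ "-asxml"                 -- convert the input to well-formed XHTML
  , "-quiet"                 -- suppress the summary banner
  , "-utf8"                  -- read and write UTF-8 (assumes a UTF-8 locale)
  , "--show-warnings", "no"  -- keep stderr down to real errors
  ]

-- | Interpret tidy's exit status: 0 = clean, 1 = warnings, 2 = errors.
-- Warnings still come with usable output, so both 0 and 1 count as success.
tidyResult :: ExitCode -> String -> String -> Either String String
tidyResult ExitSuccess     out _   = Right out
tidyResult (ExitFailure 1) out _   = Right out
tidyResult (ExitFailure _) _   err = Left err

-- | Hypothetical helper: pipe one HTML document through tidy via stdin.
tidyHtml :: String -> IO (Either String String)
tidyHtml html = do
  (code, out, err) <- readProcessWithExitCode "tidy" tidyArgs html
  return (tidyResult code out err)

main :: IO ()
main = do
  result <- tidyHtml "<p>unclosed <b>bold"
  either (putStrLn . ("tidy failed: " ++)) putStr result
```

When tidy reports real errors (exit status 2) it normally writes no output,
so the sketch returns the error text instead. If you would rather get
tidy's best effort in that case, "--force-output yes" is the option to look
at.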

All the best,
- Benja

