Since the document claims it is HTML, you should be parsing it with an HTML parser. Try hxt-tagsoup -- specifically, the "parseHtmlTagSoup" arrow.
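
For illustration, here is a minimal sketch of what that might look like. It assumes hxt 9.x, where the tagsoup-based parser is normally switched on through the `withTagSoup` option from the hxt-tagsoup package rather than by calling the arrow directly. The helper name `extractLinks` and the sample input are made up for this example; they are not from the original question.

```haskell
import Text.XML.HXT.Core
import Text.XML.HXT.TagSoup (withTagSoup)

-- Hypothetical helper (not from the original thread): collect the href
-- attribute of every <a> element found in an HTML string.
extractLinks :: String -> IO [String]
extractLinks html =
  runX $
    -- withParseHTML selects HTML parsing; withTagSoup swaps in the lenient
    -- tagsoup-based parser; withWarnings no silences parser warnings.
    readString [withParseHTML yes, withTagSoup, withWarnings no] html
      >>> deep (isElem >>> hasName "a")
      >>> getAttrValue "href"

main :: IO ()
main = do
  -- Deliberately sloppy markup: the <p> element is never closed.
  links <- extractLinks
    "<html><body><p>Unclosed paragraph<a href=\"http://www.haskell.org/\">Haskell</a></body></html>"
  mapM_ putStrLn links
```

The same option list should work with `readDocument` if the input comes from a file or URL instead of a string.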