[Haskell] ANN: TagSoup library 0.6

Neil Mitchell ndmitchell at gmail.com
Wed Apr 23 07:16:01 EDT 2008


Hi

I am pleased to announce the TagSoup 0.6 library, available from Hackage:

* http://hackage.haskell.org/cgi-bin/hackage-scripts/package/tagsoup

"TagSoup is a library for extracting information out of unstructured
HTML code, sometimes known as tag-soup. The HTML does not have to be
well formed, or render properly within any particular framework. This
library is for situations where the author of the HTML is not
cooperating with the person trying to extract the information, but is
also not trying to hide the information."

You may also be interested in:

* TagSoup home page: http://www-users.cs.york.ac.uk/~ndm/tagsoup/
* TagSoup manual: http://www.cs.york.ac.uk/fp/darcs/tagsoup/tagsoup.htm

New Bits
-------------

* The primary reason for this release is that the API has changed. If
you use parseTagsOptions (instead of the more standard parseTags), you
will need to pass a value of type ParseOptions (previously Options),
which can be created with parseOptions (previously options). This
change should be simple to make; anyone who has problems is welcome to
email me.

* I have updated the Google Tech News example, as Google changed the
HTML they generate.

* Included in this release are two experimental features: Render and
Tree. I encourage potential users to email me to check whether these
features meet their needs.

Render lets you render a TagSoup stream, so you can take some TagSoup,
modify it, and write it out. For example:

Text.HTML.TagSoup.Render> renderTags [TagOpen "b" [],TagText "Hello",TagClose "b"]
"<b>Hello</b>"
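
To make the idea of rendering concrete, here is a minimal, self-contained sketch. It is not TagSoup's implementation: the Tag type below is a hypothetical stand-in limited to the three constructors used in this message, renderTagsSketch is an invented name, and no escaping of special characters is attempted.

```haskell
-- Hypothetical stand-in for TagSoup's Tag type, reduced to the three
-- constructors that appear in the announcement.
data Tag
  = TagOpen String [(String, String)]  -- tag name and attribute pairs
  | TagClose String                    -- closing tag name
  | TagText String                     -- text between tags
  deriving (Eq, Show)

-- Render a single tag. No escaping is performed (a simplifying assumption).
renderTag :: Tag -> String
renderTag (TagOpen name attrs) = "<" ++ name ++ concatMap renderAttr attrs ++ ">"
  where
    renderAttr (key, value) = " " ++ key ++ "=\"" ++ value ++ "\""
renderTag (TagClose name) = "</" ++ name ++ ">"
renderTag (TagText str) = str

-- Render a whole stream by concatenating the rendered tags in order.
renderTagsSketch :: [Tag] -> String
renderTagsSketch = concatMap renderTag
```

With these definitions, renderTagsSketch [TagOpen "b" [],TagText "Hello",TagClose "b"] evaluates to "<b>Hello</b>", matching the session above. Because the stream is an ordinary list, a modification step such as map or filter can be applied before rendering.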

Tree lets you view a TagSoup stream as a tree of nested tags. For example:

Text.HTML.TagSoup.Tree> tagTree [TagOpen "b" [],TagText "Hello",TagClose "b"]
[TagBranch "b" [] [TagLeaf (TagText "Hello")]]
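
The nesting step can be sketched in a similar way. The code below is only an illustration of how a flat stream might be grouped into a tree; it is not how Text.HTML.TagSoup.Tree is implemented. The types are hypothetical stand-ins that mirror the constructors shown above, tagTreeSketch is an invented name, and the handling of unmatched tags (keeping them as leaves) is an assumption.

```haskell
-- Hypothetical stand-ins mirroring the constructors shown in the announcement.
data Tag
  = TagOpen String [(String, String)]
  | TagClose String
  | TagText String
  deriving (Eq, Show)

data TagTree
  = TagBranch String [(String, String)] [TagTree]
  | TagLeaf Tag
  deriving (Eq, Show)

-- Collect sibling trees until the input ends or a close tag is reached.
-- Returns the siblings and the unconsumed input (which starts at that
-- close tag, if there is one).
siblings :: [Tag] -> ([TagTree], [Tag])
siblings [] = ([], [])
siblings (TagOpen name attrs : rest) =
  case siblings rest of
    -- The next close tag matches: everything in between becomes children.
    (children, TagClose name' : rest')
      | name' == name ->
          let (more, rest'') = siblings rest'
          in (TagBranch name attrs children : more, rest'')
    -- No matching close (assumption): keep the open tag as a leaf.
    -- This re-scans the rest of the input, which is fine for a sketch
    -- but slow on long runs of unclosed tags.
    _ ->
      let (more, rest') = siblings rest
      in (TagLeaf (TagOpen name attrs) : more, rest')
siblings ts@(TagClose _ : _) = ([], ts)
siblings (t : rest) =
  let (more, rest') = siblings rest
  in (TagLeaf t : more, rest')

-- Build the forest; a stray close tag is kept as a leaf (assumption).
tagTreeSketch :: [Tag] -> [TagTree]
tagTreeSketch ts =
  case siblings ts of
    (trees, []) -> trees
    (trees, t : rest) -> trees ++ TagLeaf t : tagTreeSketch rest
```

Applied to [TagOpen "b" [],TagText "Hello",TagClose "b"], this sketch yields [TagBranch "b" [] [TagLeaf (TagText "Hello")]], the same shape as the result shown above.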

Thanks

Neil

