[Haskell-cafe] Get data from HTML pages

Mon Aug 31 11:23:41 EDT 2009

Hello José,

I did a similar task a few weeks ago, and I used the Haskell XML
Toolbox (hxt) [1] for it. Once I had learned how to program with arrows,
it was quite easy to write arrows that extract the relevant information
from XML data. hxt can also parse HTML.
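
As a rough sketch only, not the code I actually wrote: the example below
assumes a current hxt release (the 9.x API) and made-up markup in which the
"Direção:" label and the name share one text node. The sample HTML and the
helper names (allTexts, director, directors, trimSpaces) are hypothetical,
and the real AllCenter pages may be structured differently. Downloading the
page is left out.

```haskell
module Main where

import Control.Arrow ((>>>))
import Data.Char (isSpace)
import Data.List (dropWhileEnd, stripPrefix)
import Data.Maybe (mapMaybe)
import Text.XML.HXT.Core (deep, getText, hread, isText, runLA)

-- Hypothetical sample markup; the real page layout is an assumption.
sampleHtml :: String
sampleHtml = "<ul><li>Título: Exemplo</li><li>Direção: Jane Doe</li></ul>"

-- Parse an HTML fragment with hxt's lenient parser and collect the
-- contents of all text nodes, in document order.
allTexts :: String -> [String]
allTexts = runLA (hread >>> deep isText >>> getText)

-- Remove leading and trailing whitespace.
trimSpaces :: String -> String
trimSpaces = dropWhileEnd isSpace . dropWhile isSpace

-- Roughly mirrors the regular expression ^Direção:\s+(.+)$ on a single
-- text node: keep whatever follows the label, if anything is left.
-- Unlike the regex, it tolerates leading whitespace and does not
-- insist on whitespace after the colon.
director :: String -> Maybe String
director s = do
  rest <- stripPrefix "Direção:" (dropWhile isSpace s)
  let name = trimSpaces rest
  if null name then Nothing else Just name

-- All director entries found in a piece of HTML.
directors :: String -> [String]
directors = mapMaybe director . allTexts

main :: IO ()
main = mapM_ putStrLn (directors sampleHtml)
```

If the label and the name sit in separate elements, the string matching
would have to be replaced by arrows that select the right element, for
example by tag name, before calling getText.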

Regards,

Martin.

[1] http://hackage.haskell.org/package/hxt

José Romildo Malaquias wrote:
> Hello.
> 
> I am porting to Haskell a Java application I have written to manage
> collections of movies.
> 
> Currently the application can import movie data from web pages, but only
> indirectly. The user first opens the page in a web browser, copies the
> rendered text into an import window in my application, and clicks an
> "import" button. The application then parses the pasted text and collects
> any relevant data it recognizes, using regular expressions.
> 
> For instance, to get a movie's director from the AllCenter web site,
> I use the following regular expression:
> 
>    ^Direção:\s+(.+)$
> 
> I want to modify this scheme in order to eliminate the need to copy the
> rendered text from a web browser. Instead my application should download
> and parse the HTML page directly.
> 
> Which libraries are available in Haskell that would make it easy to get
> content information from an HTML document, in the way described above?
> 
> Regards,
> 
> Romildo