[Haskell-cafe] Get data from HTML pages
Martin Huschenbett
huschi at gmx.org
Mon Aug 31 11:23:41 EDT 2009
Hello José,
I did a similar task a few weeks ago and used the Haskell XML Toolbox
(hxt) [1] for it. Once I had learned how to program with arrows, it was
quite easy to write arrows that extract the relevant information from
XML data.
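
A minimal sketch of what such an extraction might look like with hxt, given here only as an illustration and not as the code used for the task mentioned above. It assumes the page has already been downloaded into a String, that each field sits on its own line of the page text, and that a plain prefix match is an acceptable stand-in for the regular expression in the question. The names pageText, fieldValue and fieldFromHtml are hypothetical helpers, not part of hxt.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)
import Text.XML.HXT.Core

-- Arrow that yields the content of every text node in the document,
-- in document order. (Hypothetical helper; note that it also picks up
-- text inside elements such as <title> or <script>.)
pageText :: ArrowXml a => a XmlTree String
pageText = deep isText >>> getText

-- Rough equivalent of a pattern like ^Label\s+(.+)$ applied to one line:
-- the label, at least one whitespace character, then a non-empty value.
-- Unlike the regular expression, leading whitespace is skipped, because
-- HTML source is often indented.
fieldValue :: String -> String -> Maybe String
fieldValue label line = do
  rest <- stripPrefix label (dropWhile isSpace line)
  case span isSpace rest of
    (_ : _, value@(_ : _)) -> Just value
    _                      -> Nothing

-- Parse an HTML string leniently, join all text nodes, split the result
-- into lines and keep the values of the lines that carry the label.
fieldFromHtml :: String -> String -> IO [String]
fieldFromHtml label html = do
  texts <- runX (readString [withParseHTML yes, withWarnings no] html
                 >>> pageText)
  return (mapMaybe (fieldValue label) (lines (concat texts)))
```

With this in place, a call such as fieldFromHtml "Direção:" html would return the candidate director strings found in html. Joined text nodes are not the same thing as the text a browser renders (block elements do not introduce line breaks here), so a real page may need a more specific arrow that selects the element holding the field.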
Regards,
Martin.
[1] http://hackage.haskell.org/package/hxt
José Romildo Malaquias wrote:
> Hello.
>
> I am porting to Haskell a Java application I have written to manage
> collections of movies.
>
> Currently the application has an option to indirectly import movie data
> from web pages. To do that, the user first opens the page in a web
> browser, then copies the rendered text from the browser into an import
> window in my application and clicks an "import" button. In response,
> the application parses the given text and collects any relevant data it
> knows about, using regular expressions.
>
> For instance, to get the director information from a movie in the
> AllCenter web site I use the following regular expression:
>
> ^Direção:\s+(.+)$
>
> I want to modify this scheme in order to eliminate the need to copy the
> rendered text from a web browser. Instead my application should download
> and parse the HTML page directly.
>
> Which libraries are available in Haskell that would make it easy to get
> content information from an HTML document, in the way described above?
>
> Regards,
>
> Romildo
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe