[Haskell-cafe] Downloading web page in Haskell

Yitzchak Gale gale at sefer.org
Sat Nov 20 17:10:25 EST 2010


José Romildo Malaquias wrote:
> Web browsers like Firefox and Opera do not seem to have the same
> problem with this web page.
> I would like to be able to download this page from Haskell.

Hi Romildo,

This web page serves the head, including a lot of JavaScript,
and the first few hundred bytes of the body, then pauses.
That causes web browsers to begin loading and executing
the JavaScript. Apparently, the site only continues serving
the rest of the page if the JavaScript is actually loaded and
executed. If not, it aborts.
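
A quick way to confirm from Haskell that you are getting only the
partial page is to check whether the body you received ever reaches
its closing tag. The following is a minimal sketch, not a definitive
test: it assumes you already have the response body as a String from
whatever HTTP library you use, and the name looksTruncated and the
sample strings are made up for illustration. HTML allows the closing
</html> tag to be omitted, so treat the result as a hint only.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- | Heuristic (an assumption, not a rule): a fully served page usually
-- contains a closing </html> tag. If it is missing, the server
-- probably stopped sending before the end of the body.
looksTruncated :: String -> Bool
looksTruncated body = not ("</html>" `isInfixOf` map toLower body)

main :: IO ()
main = do
  -- Hypothetical sample bodies standing in for real responses.
  let partial  = "<html><head><script src=\"a.js\"></script></head><body><p>Hel"
      complete = "<html><head></head><body><p>Hello</p></body></HTML>"
  print (looksTruncated partial)   -- True: no closing tag seen
  print (looksTruncated complete)  -- False: closing tag present
```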

Either intentionally or unintentionally, that effectively prevents
naive scripts from accessing the page. Cute technique.

So if you decide not to honor the site author's apparent intention
to keep scripts from loading the page, try reading through the
JavaScript to find out what you need to do to get the page to
continue loading. However, if the site author is very determined
to stop you, the JavaScript will probably be obfuscated or encrypted,
which would make this an annoying task.
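
As a starting point for reading through the JavaScript, you could
list which external scripts the head refers to. This is only a rough
sketch using plain string matching: the names scriptSources and
afterSrc and the sample HTML are hypothetical, it only understands
double-quoted src="..." attributes, and it would also match an
attribute such as data-src. A real HTML parser would be more robust.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf, tails)

-- | Collect the values of double-quoted src attributes found inside
-- opening <script ...> tags. Inline scripts without src are skipped.
scriptSources :: String -> [String]
scriptSources html =
  [ takeWhile (/= '"') rest
  | t <- tails html
  , "<script" `isPrefixOf` map toLower (take 7 t)
  , let tag = takeWhile (/= '>') t        -- text of the opening tag only
  , Just rest <- [afterSrc tag]
  ]

-- | Return the text following the first src=" in a tag, if any.
afterSrc :: String -> Maybe String
afterSrc tag =
  case [ drop 5 s | s <- tails tag, "src=\"" `isPrefixOf` map toLower (take 5 s) ] of
    (r:_) -> Just r
    []    -> Nothing

main :: IO ()
main = do
  -- Hypothetical page head, standing in for the real one.
  let sample = "<html><head><script src=\"a.js\"></script>"
            ++ "<script>var x = 1;</script>"
            ++ "<SCRIPT type=\"text/javascript\" SRC=\"/js/b.js\"></SCRIPT></head>"
  mapM_ putStrLn (scriptSources sample)
```

Each URL it prints is a script you would then fetch and read by hand.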

Good luck,
Yitz

