[Haskell-cafe] Downloading web page in Haskell

Sterling Clover s.clover at gmail.com
Sat Nov 20 18:51:24 EST 2010

On Nov 20, 2010, at 5:10 PM, Yitzchak Gale wrote:

> José Romildo Malaquias wrote:
>> Web browsers like Firefox and Opera do not seem to have the same
>> problem with this web page.
>> I would like to be able to download this page from Haskell.
> Hi Romildo,
> This web page serves the head, including a lot of JavaScript,
> and the first few hundred bytes of the body, then pauses.
> That causes web browsers to begin loading and executing
> the JavaScript. Apparently, the site only continues serving
> the rest of the page if the JavaScript is actually loaded and
> executed. If not, it aborts.

Actually, I think it's just a misconfigured proxy. The curl executable fails at the same point, but a curl --compressed call succeeds. The curl bindings don't automatically request and decompress gzip data, so you could either set the Accept-Encoding: gzip header yourself and then pipe the output through the appropriate decompression routine, or, more simply, fetch the page by using System.Process to drive the curl binary directly.
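
A minimal sketch of the second option, assuming a curl binary is on the PATH and the process package is available. The names curlArgs and fetchPage are made up for illustration, and the --silent and --show-error flags are only my choice to keep the progress meter out of the output; the important flag is --compressed.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Arguments handed to the curl binary (hypothetical helper).
-- --compressed asks for a compressed response and decompresses it;
-- --silent/--show-error hide the progress meter but keep error messages.
curlArgs :: String -> [String]
curlArgs url = ["--silent", "--show-error", "--compressed", url]

-- | Fetch a page by running curl as a child process (hypothetical helper).
-- Returns Left with curl's exit code and stderr on failure,
-- or Right with the response body on success.
fetchPage :: String -> IO (Either String String)
fetchPage url = do
  (code, out, err) <- readProcessWithExitCode "curl" (curlArgs url) ""
  pure $ case code of
    ExitSuccess   -> Right out
    ExitFailure n -> Left ("curl exited with code " ++ show n ++ ": " ++ err)
```

Calling fetchPage with the page's URL should then give back either the decompressed body or an error message. Note that readProcessWithExitCode decodes the output using the current locale encoding, so a page in a different encoding may need a handle-based variant instead.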


