[Haskell-cafe] Network.Curl cookie jar madness
iusty at k1024.org
Sun Aug 19 18:06:53 CEST 2012
On Sun, Aug 19, 2012 at 12:45:47AM -0400, Michael Orlitzky wrote:
> On 08/18/2012 08:52 PM, Michael Orlitzky wrote:
> > I'm one bug away from a working program and need some help. I wrote a
> > little utility that logs into LWN.net, retrieves an article, and creates
> > an epub out of it.
> I've created two pages where anyone can test this. The first just takes
> any username and password via post and sets a session variable. The
> second prints "Success." if the session variable is set, and "Failure."
> if it isn't. The bash script,
> The attached haskell program using Network.Curl, doesn't:
> $ runghc haskell-test.hs
> Logged in...
> Any help is appreciated =)
So, take this with a grain of salt: I've been bitten by curl (the
haskell bindings, I mean) before, and I don't hold the quality of the
library in great regard.
The libcurl documentation says: "When you set a file name with
CURLOPT_COOKIEJAR, that file name will be created and all received
cookies will be stored in it when curl_easy_cleanup(3) is called" (i.e.
at the end of a curl handle session). But even though the curl bindings
seem to run easy_cleanup on handles (initialize → mkCurl →
mkCurlWithCleanup), they don't do this correctly:
DEBUG: ALLOC: CURL
DEBUG: ALLOC: /tmp/network-curl-test-haskell20417.txt
DEBUG: ALLOC: username=foo&password=bar
DEBUG: ALLOC: http://michael.orlitzky.com/tmp/network-curl-test1.php
DEBUG: ALLOC: WRITER
DEBUG: ALLOC: WRITER
Note there's no "DEBUG: FREE: CURL" as the code seems to imply there
should be. Hence, the handle is never cleaned up (do the curl bindings
leak handles?), so the cookie file is never written.
Side note: by running the same program multiple times, sometimes you see
DEBUG: FREE: CURL, sometimes no FREE actions. I believe there's
something very wrong in the curl bindings with regard to cleanups.
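For comparison, here is a minimal sketch of scoped cleanup using only base. It does not touch Network.Curl at all: withResource and demo are made-up names, and the log strings merely imitate the DEBUG output above. The point is that bracket runs the release action as soon as the body finishes, instead of relying on a finalizer that the garbage collector may or may not get around to before the program exits.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical stand-in for a handle: record an ALLOC line on acquire
-- and a FREE line on release, with the release guaranteed by bracket.
withResource :: IORef [String] -> String -> (String -> IO a) -> IO a
withResource logRef name = bracket acquire release
  where
    acquire   = modifyIORef logRef (("ALLOC: " ++ name) :) >> return name
    release _ = modifyIORef logRef (("FREE: " ++ name) :)

-- Use the resource once and return the log in chronological order.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  _ <- withResource logRef "CURL" (\_ -> return ())
  reverse <$> readIORef logRef

main :: IO ()
main = demo >>= mapM_ putStrLn
```

With this pattern the FREE line is logged on every run, which is the behaviour I would expect from the bindings' cleanup path.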
If I modify curl to export a "force cleanup" function, I can make the
program work (but not always; my patch is a hack).
Alternatively, since the curl library doesn't need a cookie jar to reuse
cookies within a single handle, modifying your code to reuse the same
curl handle (returning it from log_in and passing it to get_page) gives
me a success code. But the cookie file is still not filled, since the
curl handle is never properly terminated.
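A minimal sketch of the handle-reuse approach, not your actual code: fetchLoggedIn and loginFields are names I made up, the second URL is a placeholder because it is not quoted in this thread, and the form values are not URL-encoded. It assumes the Network.Curl API from the curl package (initialize, setopts, do_curl_, withCurlDo).

```haskell
import Network.Curl

-- Form fields for the test login page (field names as in the debug output).
loginFields :: String -> String -> [String]
loginFields user pass = ["username=" ++ user, "password=" ++ pass]

-- Log in and fetch a page on ONE handle, so libcurl's in-memory cookie
-- engine carries the session cookie from the first request to the second.
fetchLoggedIn :: URLString -> URLString -> IO String
fetchLoggedIn loginUrl pageUrl = do
  curl <- initialize
  -- An empty file name enables the cookie engine without reading a file.
  setopts curl [CurlCookieFile ""]
  _ <- do_curl_ curl loginUrl
         [CurlPost True, CurlPostFields (loginFields "foo" "bar")]
         :: IO CurlResponse
  -- Options persist on a handle, so switch back to GET explicitly.
  page <- do_curl_ curl pageUrl [CurlHttpGet True] :: IO CurlResponse
  return (respBody page)

main :: IO ()
main = withCurlDo $ do
  body <- fetchLoggedIn
    "http://michael.orlitzky.com/tmp/network-curl-test1.php"
    "http://example.invalid/second-page.php" -- placeholder, not the real URL
  putStrLn body
```

No CurlCookieJar is set here on purpose: the session lives only in the handle, so the unreliable cleanup does not matter for this use.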
Since the curl bindings also have problems in multi-threaded programs
when SSL is enabled (as they don't actually set up the curl library
correctly with regard to multi-threaded memory allocation), I would
suggest you try the http-conduit library, since that's a pure
Haskell library that should work as well, if not better.
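A rough sketch of the same login-then-fetch flow with http-conduit, assuming a recent release of the package (the Network.HTTP.Conduit module with parseRequest, httpLbs, urlEncodedBody, and the cookieJar and responseCookieJar fields). fetchLoggedIn and credentials are made-up names, and I have not run this against your test pages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString.Lazy.Char8 as L8
import Network.HTTP.Conduit

-- Form fields for the test login page (field names as in the debug output).
credentials :: ByteString -> ByteString -> [(ByteString, ByteString)]
credentials user pass = [("username", user), ("password", pass)]

-- Log in, then fetch a second page with the cookies the login returned.
fetchLoggedIn :: String -> String -> IO L8.ByteString
fetchLoggedIn loginUrl pageUrl = do
  manager  <- newManager tlsManagerSettings
  loginReq <- parseRequest loginUrl
  -- urlEncodedBody turns the request into a form-encoded POST.
  loginResp <- httpLbs (urlEncodedBody (credentials "foo" "bar") loginReq) manager
  pageReq  <- parseRequest pageUrl
  -- Carry the cookies received at login into the second request.
  let pageReq' = pageReq { cookieJar = Just (responseCookieJar loginResp) }
  responseBody <$> httpLbs pageReq' manager
```

The cookie jar here is an ordinary Haskell value passed from one request to the next, so nothing depends on a handle being cleaned up at the right moment.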
Happy to be proved wrong, if I'm just biased against curl :)