[Haskell-cafe] Contributing to http-conduit
Michael Snoyman
michael at snoyman.com
Mon Jan 23 06:56:39 CET 2012
On Sun, Jan 22, 2012 at 11:07 PM, Myles C. Maxfield
<myles.maxfield at gmail.com> wrote:
> Replies are inline. Thanks for the quick and thoughtful response!
>
> On Sat, Jan 21, 2012 at 8:56 AM, Michael Snoyman <michael at snoyman.com>
> wrote:
>>
>> Hi Myles,
>>
>> These sound like two solid features, and I'd be happy to merge in code to
>> support them. Some comments below.
>>
>> On Sat, Jan 21, 2012 at 8:38 AM, Myles C. Maxfield
>> <myles.maxfield at gmail.com> wrote:
>>>
>>> To: Michael Snoyman, author and maintainer of http-conduit
>>> CC: haskell-cafe
>>>
>>> Hello!
>>>
>>> I am interested in contributing to the http-conduit library. I've been
>>> using it for a little while and reading through its source, but have felt
>>> that it could be improved with two features:
>>>
>>> Allowing the caller to know the final URL that ultimately resulted in the
>>> HTTP Source. Because httpRaw is not exported, the caller can't even
>>> re-implement the redirect-following code themselves. Ideally, the caller
>>> would be able to know not only the final URL, but also the entire chain of
>>> URLs that led to the final request. I was thinking that it would be even
>>> cooler if the caller could be notified of these redirects as they happen in
>>> another thread. There are a couple ways to implement this that I have been
>>> thinking about:
>>>
>>> A straightforward way would be to add a [W.Ascii] to the type of
>>> Response, and getResponse can fill in this extra field. getResponse already
>>> knows about the Request so it can tell if the response should be gunzipped.
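>>>
>>> To sketch the shape of it (a rough illustration only, not a real patch;
>>> the cut-down Response below just stands in for http-conduit's):
>>>
>>>     import qualified Data.ByteString as S
>>>
>>>     type Ascii = S.ByteString  -- i.e. W.Ascii
>>>
>>>     data Response body = Response
>>>         { statusCode      :: Int
>>>         , responseHeaders :: [(Ascii, Ascii)]
>>>         , responseBody    :: body
>>>         , redirectChain   :: [Ascii]  -- the new field, filled in by getResponse
>>>         }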
>>
>> What would be in the [W.Ascii], a list of all paths redirected to? Also,
>> I'm not sure what gunzipping has to do with this; can you clarify?
>>
>
> Yes; my idea was to make the [W.Ascii] represent the list of all URLs
> redirected to, in order.
>
> My comment about gunzipping is only tangentially related. I meant that in
> the latest version of the code on GitHub, the getResponse function already
> takes a Request as an argument. This means that the getResponse function
> already knows what URL its data is coming from, so modifying the getResponse
> function to return that URL is simple. (I mentioned gunzip because, as far
> as I can tell, the reason that getResponse already takes a Request is so
> that the function can tell if the request should be gunzipped.)
>>>
>>> It would be nice for the caller to be able to know in real time what URLs
>>> the request is being redirected to. A possible way to do this would be for
>>> the 'http' function to take an extra argument of type (Maybe
>>> (Control.Concurrent.Chan W.Ascii)) which httpRaw can push URLs into. If the
>>> caller doesn't want to use this variable, they can simply pass Nothing.
>>> Otherwise, the caller can create an IO thread which reads the Chan until
>>> some termination condition is met (Perhaps this will change the type of the
>>> extra argument to (Maybe (Chan (Maybe W.Ascii)))). I like this solution,
>>> though I can see how it could be considered too heavyweight.
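>>>
>>> As a sketch of the caller's side (untested; ByteString stands in for
>>> W.Ascii, and Nothing is the termination condition mentioned above):
>>>
>>>     import Control.Concurrent.Chan
>>>     import qualified Data.ByteString.Char8 as S8
>>>
>>>     -- run in a forkIO'd thread; http/httpRaw would writeChan (Just url)
>>>     -- on every redirect and a final Nothing when it is done
>>>     watchRedirects :: Chan (Maybe S8.ByteString) -> IO ()
>>>     watchRedirects chan = do
>>>         next <- readChan chan
>>>         case next of
>>>             Nothing  -> return ()
>>>             Just url -> do
>>>                 S8.putStrLn (S8.append (S8.pack "redirected to: ") url)
>>>                 watchRedirects chan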
>>
>>
>> I do think it's too heavyweight. I think if people really want lower-level
>> control of the redirects, they should turn off automatic redirect and allow
>> 3xx responses.
>
> Yeah, that totally makes more sense. As it stands, however, httpRaw isn't
> exported, so a caller has no way of knowing about each individual HTTP
> transaction. Exporting httpRaw solves the problem I'm trying to solve. If we
> export httpRaw, should we also make 'http' return the URL chain? Doing both
> is probably the best solution, IMHO.
What's the difference between calling httpRaw and calling http with
redirections turned off?
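To make the latter concrete, here's roughly what I have in mind (an
untested sketch; names are from memory, and depending on the version you
may also need to loosen checkStatus so the 3xx isn't treated as an error):

    import Network.HTTP.Conduit
    import qualified Data.ByteString.Lazy as L

    -- fetch a single hop only: any 3xx is handed straight back to the
    -- caller, who can inspect the status and Location header and decide
    -- whether (and where) to follow
    fetchNoRedirect :: String -> IO (Response L.ByteString)
    fetchNoRedirect url = do
        req <- parseUrl url
        let req' = req { redirectCount = 0 }
        withManager $ httpLbs req'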
>>>
>>> Making the redirection aware of cookies. There are redirects around the
>>> web where the first URL returns a Set-Cookie header and a 3xx code which
>>> redirects to another site that expects the cookie that the first HTTP
>>> transaction set. I propose to add an (IORef to a Data.Set of Cookies) to the
>>> Manager datatype, letting the Manager act as a cookie store as well as a
>>> repository of available TCP connections. httpRaw could deal with the cookie
>>> store. Network.HTTP.Types does not declare a Cookie datatype, so I would
>>> probably be adding one. I would probably take it directly from
>>> Network.HTTP.Cookie.
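>>>
>>> Concretely, something like this (names are only illustrative):
>>>
>>>     import Data.IORef
>>>     import qualified Data.Set as Set
>>>     import qualified Data.ByteString as S
>>>
>>>     -- a minimal Cookie, roughly what Network.HTTP.Cookie carries
>>>     data Cookie = Cookie
>>>         { cookieName   :: S.ByteString
>>>         , cookieValue  :: S.ByteString
>>>         , cookieDomain :: S.ByteString
>>>         , cookiePath   :: S.ByteString
>>>         } deriving (Eq, Ord, Show)
>>>
>>>     -- the store the Manager would carry alongside its connection pool;
>>>     -- httpRaw would consult it before, and update it after, each request
>>>     newCookieStore :: IO (IORef (Set.Set Cookie))
>>>     newCookieStore = newIORef Set.empty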
>>
>> Actually, we already have the cookie package for this. I'm not sure if
>> putting the cookie store in the manager is necessarily the right approach,
>> since I can imagine wanting to have separate sessions while reusing the same
>> connections. A different approach could be adding a list of Cookies to both
>> the Request and Response.
>
> Ah, looks like you're the maintainer of that package as well! I didn't
> realize it existed. I should have, though; Yesod must need to know about
> cookies somehow.
>
> As the http-conduit package stands, the headers of the original Request can
> be set, and the headers of the last Response can be read. Because cookies
> are implemented on top of headers, the caller knows about the cookies before
> and after the redirection chain. I'm more interested in the preservation of
> cookies within the redirection chain. As discussed earlier, exposing the
> httpRaw function allows the entire redirection chain to be handled by the
> caller, which alleviates the problem.
>
> That being said, however, the simpleHttp function (and all functions built
> upon 'http' inside of http-conduit) should probably respect cookies inside
> redirection chains. Under the hood, Network.Browser does this by having the
> State monad keep track of these cookies (as well as the connection pool) and
> making HTTP requests mutate that State, but that's a pretty different
> architecture from Network.HTTP.Conduit's.
>
> One way I can think to do this would be to let the user supply a CookieStore
> (probably implemented as a (Data.Set Web.Cookie.SetCookie)) and receive a
> (different) CookieStore from the 'http' function. That way, the caller can
> manage the CookieStores independently from the connection pool. The downside
> is that it's one more bit of ugliness the caller has to deal with. How do
> you feel about this? You probably have a better idea :-)
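>
> In types, the idea is roughly this (sketch only; SetCookie doesn't appear
> to have an Ord instance, so I've used a Map keyed on name/domain/path
> where the prose says Set):
>
>     import qualified Data.ByteString as S
>     import qualified Data.Map as Map
>     import Web.Cookie (SetCookie, setCookieName, setCookieDomain, setCookiePath)
>
>     -- a later cookie for the same (name, domain, path) replaces the earlier one
>     type CookieStore =
>         Map.Map (S.ByteString, Maybe S.ByteString, Maybe S.ByteString) SetCookie
>
>     insertCookie :: SetCookie -> CookieStore -> CookieStore
>     insertCookie c = Map.insert (setCookieName c, setCookieDomain c, setCookiePath c) c
>
>     -- hypothetical shape of the call: the caller passes a CookieStore in
>     -- and gets the updated one back from 'http', independently of the Manager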
The only idea so far has been to implement an extra layer of cookie-aware
functions in a separate Browser module. That's been the running assumption
for a while now, since the HTTP package does it that way, but I'm not
opposed to taking a different approach.
It could be that the big mistake in all this was putting redirection
at the layer of the API that I did. Yitz Gale pointed out that in
Python, they have a low-level API and a high-level API, the latter
dealing with both redirection and cookies.
Anyway, here's one possible approach to the whole situation: `Request`
could have an extra record field of type `Maybe (IORef (Set
SetCookie))`. When `http` is called, if the field is `Nothing`, a new
value is created. Every time a request is made, the value is updated
accordingly. That way, redirects will respect cookies for the current
session, and if you want to keep a longer-term session, you can keep
reusing the same field value in different `Request`s. We can also add some
convenience functions to automatically reuse the cookie set.
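In code, the updating step would be roughly this (an untested sketch; I've
used a plain list because SetCookie doesn't appear to have an Ord instance,
and the cookieStore field name is made up):

    {-# LANGUAGE OverloadedStrings #-}
    import Data.IORef
    import Network.HTTP.Types (ResponseHeaders)
    import Web.Cookie (SetCookie, parseSetCookie)

    -- hypothetical new field on Request:
    --   cookieStore :: Maybe (IORef [SetCookie])

    -- what http/httpRaw would do after every hop, redirects included:
    -- fold the response's Set-Cookie headers into the shared store
    rememberCookies :: IORef [SetCookie] -> ResponseHeaders -> IO ()
    rememberCookies ref headers =
        atomicModifyIORef ref (\old -> (new ++ old, ()))
      where
        new = map (parseSetCookie . snd)
                  (filter ((== "Set-Cookie") . fst) headers)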
Michael
>> I'd be happy to do both of these things, but I'm hoping for your input on
>> how to go about this endeavor. Are these features even good to be pursuing?
>> Should I be going about this entirely differently?
>>
>> Thanks,
>> Myles C. Maxfield
>>
>> P.S. I'm curious about the lack of Network.URI throughout
>> Network.HTTP.Conduit. Is there a particular design decision that led you to
>> use raw ascii strings?
>
>
> Because there are plenty of valid URIs that we don't handle at all, e.g.,
> ftp.
>
> I'm a little surprised by this, since you can easily test for unhandled URIs
> because they're already parsed. Whatever; it doesn't really matter to me, I
> was just surprised by it.
>
> Michael
>
> Thanks again for the feedback! I'm hoping to make a difference :]
>
> --Myles