[Haskell-cafe] Contributing to http-conduit

Mon Jan 23 07:31:21 CET 2012

1. Oops - I overlooked the fact that the redirectCount attribute of a
Request is exported. (It isn't listed in the documentation
<http://hackage.haskell.org/packages/archive/http-conduit/1.2.0/doc/html/Network-HTTP-Conduit.html>,
probably because the constructor itself isn't exported. This seems like a
flaw in Haddock...) Silly me. No need to export httpRaw.

2. I think that stuffing many arguments into the 'http' function is ugly.
However, I'm not sure that the number of arguments to 'http' could ever
reach an unreasonably large amount. Perhaps I have bad foresight, but I
personally feel that adding cookies to the http request will be the last
thing that we will need to add. Putting a bound on this growth of arguments
makes me more willing to think about this option. On the other hand, using
a BrowserAction to modify internal state is very elegant. Which approach do
you think is best? I think I'm leaning toward the upper-level Browser
module idea.
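
To make the two options concrete, here is a rough sketch of what I have in
mind. It is not http-conduit code: Cookie, CookieStore, Session and all of
the function names are hypothetical stand-ins, and a cookie is reduced to a
(name, value) pair, whereas a real store would also have to match on domain
and path. The first half is the "pass a store in, get a store back" style;
the second half wraps the same logic in an IORef so that it can be shared
between requests.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.List (foldl')
import qualified Data.Set as Set

-- Hypothetical stand-in for Web.Cookie.SetCookie: just (name, value).
type Cookie = (String, String)
type CookieStore = Set.Set Cookie

-- Explicit style: the caller passes a store in and gets a new one back.
-- A later cookie with the same name replaces the earlier one.
recordCookies :: [Cookie] -> CookieStore -> CookieStore
recordCookies new store = foldl' (flip insertCookie) store new
  where
    insertCookie c@(name, _) = Set.insert c . Set.filter ((/= name) . fst)

-- Stateful style: the store lives in an IORef that requests update in place.
newtype Session = Session (IORef CookieStore)

newSession :: IO Session
newSession = Session <$> newIORef Set.empty

recordCookiesIO :: Session -> [Cookie] -> IO ()
recordCookiesIO (Session ref) new = modifyIORef' ref (recordCookies new)

sessionCookies :: Session -> IO [Cookie]
sessionCookies (Session ref) = Set.toAscList <$> readIORef ref
```

The explicit version keeps everything pure but makes the caller thread the
store through every call; the IORef version hides that at the cost of
mutable state.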

If there were to be a higher-level HTTP library, I would argue that the
redirection code should be moved into it, and the only high-level function
that the Network.HTTP.Conduit module would export is 'http' (or httpRaw).
What do you think about this?
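
As a minimal sketch of that layering (again with made-up names: Url, Reply
and followRedirects are not part of http-conduit, and the fetch argument
only plays the role that 'http' or httpRaw would), the higher-level module
could follow redirects on top of a one-shot request function and hand back
the whole URL chain:

```haskell
-- Hypothetical stand-ins, not http-conduit types.
type Url = String

data Reply = Reply
  { replyStatus   :: Int        -- HTTP status code
  , replyLocation :: Maybe Url  -- value of the Location header, if any
  , replyBody     :: String
  } deriving (Eq, Show)

-- Follow 3xx responses using a one-shot fetch function. On success, return
-- every URL visited (in order) together with the final reply.
followRedirects
  :: Monad m
  => (Url -> m Reply)  -- one-shot fetch, in the role of the low-level call
  -> Int               -- maximum number of redirects to follow
  -> Url
  -> m (Either String ([Url], Reply))
followRedirects fetch limit = go limit []
  where
    go n visited url = do
      reply <- fetch url
      let chain = url : visited
      case replyLocation reply of
        Just next | isRedirect (replyStatus reply) ->
          if n <= 0
            then return (Left ("too many redirects at " ++ url))
            else go (n - 1) chain next
        _ -> return (Right (reverse chain, reply))

    isRedirect code = code >= 300 && code < 400
```

Because the fetch function is a parameter, a cookie-aware layer could wrap
it without the low-level module having to know about cookies at all.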

Thanks for helping me out with this,
Myles C. Maxfield

On Sun, Jan 22, 2012 at 9:56 PM, Michael Snoyman <michael at snoyman.com> wrote:

> On Sun, Jan 22, 2012 at 11:07 PM, Myles C. Maxfield
> <myles.maxfield at gmail.com> wrote:
> > Replies are inline. Thanks for the quick and thoughtful response!
> >
> > On Sat, Jan 21, 2012 at 8:56 AM, Michael Snoyman <michael at snoyman.com>
> > wrote:
> >>
> >> Hi Myles,
> >>
> >> These sound like two solid features, and I'd be happy to merge in code
> >> to support it. Some comments below.
> >>
> >> On Sat, Jan 21, 2012 at 8:38 AM, Myles C. Maxfield
> >> <myles.maxfield at gmail.com> wrote:
> >>>
> >>> To: Michael Snoyman, author and maintainer of http-conduit
> >>> CC: haskell-cafe
> >>>
> >>> Hello!
> >>>
> >>> I am interested in contributing to the http-conduit library. I've been
> >>> using it for a little while and reading through its source, but have
> >>> felt that it could be improved with two features:
> >>>
> >>> Allowing the caller to know the final URL that ultimately resulted in
> >>> the HTTP Source. Because httpRaw is not exported, the caller can't even
> >>> re-implement the redirect-following code themselves. Ideally, the
> >>> caller would be able to know not only the final URL, but also the
> >>> entire chain of URLs that led to the final request. I was thinking that
> >>> it would be even cooler if the caller could be notified of these
> >>> redirects as they happen in another thread. There are a couple ways to
> >>> implement this that I have been thinking about:
> >>>
> >>> A straightforward way would be to add a [W.Ascii] to the type of
> >>> Response, and getResponse can fill in this extra field. getResponse
> >>> already knows about the Request so it can tell if the response should
> >>> be gunzipped.
> >>
> >> What would be in the [W.Ascii], a list of all paths redirected to? Also,
> >> I'm not sure what gunzipping has to do with here, can you clarify?
> >>
> >
> > Yes; my idea was to make the [W.Ascii] represent the list of all URLs
> > redirected to, in order.
> >
> > My comment about gunzipping is only tangentially related. I meant that in
> > the latest version of the code on GitHub, the getResponse function
> > already takes a Request as an argument. This means that the getResponse
> > function already knows what URL its data is coming from, so modifying the
> > getResponse function to return that URL is simple. (I mentioned gunzip
> > because, as far as I can tell, the reason that getResponse already takes
> > a Request is so that the function can tell if the request should be
> > gunzipped.)
> >>>
> >>> It would be nice for the caller to be able to know in real time what
> >>> URLs the request is being redirected to. A possible way to do this
> >>> would be for the 'http' function to take an extra argument of type
> >>> (Maybe (Control.Concurrent.Chan W.Ascii)) which httpRaw can push URLs
> >>> into. If the caller doesn't want to use this variable, they can simply
> >>> pass Nothing. Otherwise, the caller can create an IO thread which reads
> >>> the Chan until some termination condition is met (Perhaps this will
> >>> change the type of the extra argument to (Maybe (Chan (Maybe
> >>> W.Ascii)))). I like this solution, though I can see how it could be
> >>> considered too heavyweight.
> >>
> >>
> >> I do think it's too heavyweight. I think if people really want
> >> lower-level control of the redirects, they should turn off automatic
> >> redirect and allow 3xx responses.
> >
> > Yeah, that totally makes more sense. As it stands, however, httpRaw isn't
> > exported, so a caller has no way of knowing about each individual HTTP
> > transaction. Exporting httpRaw solves the problem I'm trying to solve.
> > If we export httpRaw, should we also make 'http' return the URL chain?
> > Doing both is probably the best solution, IMHO.
>
> What's the difference between calling httpRaw and calling http with
> redirections turned off?
>
> >>>
> >>> Making the redirection aware of cookies. There are redirects around the
> >>> web where the first URL returns a Set-Cookie header and a 3xx code
> >>> which redirects to another site that expects the cookie that the first
> >>> HTTP transaction set. I propose to add an (IORef to a Data.Set of
> >>> Cookies) to the Manager datatype, letting the Manager act as a cookie
> >>> store as well as a repository of available TCP connections. httpRaw
> >>> could deal with the cookie store. Network.HTTP.Types does not declare a
> >>> Cookie datatype, so I would probably be adding one. I would probably
> >>> take it directly from Network.HTTP.Cookie.
> >>
> >> Actually, we already have the cookie package for this. I'm not sure if
> >> putting the cookie store in the manager is necessarily the right
> >> approach, since I can imagine wanting to have separate sessions while
> >> reusing the same connections. A different approach could be adding a
> >> list of Cookies to both the Request and Response.
> >
> > Ah, looks like you're the maintainer of that package as well! I didn't
> > realize it existed. I should have, though; Yesod must need to know about
> > cookies somehow.
> >
> > As the http-conduit package stands, the headers of the original Request
> > can be set, and the headers of the last Response can be read. Because
> > cookies are implemented on top of headers, the caller knows about the
> > cookies before and after the redirection chain. I'm more interested in
> > the preservation of cookies within the redirection chain. As discussed
> > earlier, exposing the httpRaw function allows the entire redirection
> > chain to be handled by the caller, which alleviates the problem.
> >
> > That being said, however, the simpleHttp function (and all functions
> > built upon 'http' inside of http-conduit) should probably respect cookies
> > inside redirection chains. Under the hood, Network.Browser does this by
> > having the State monad keep track of these cookies (as well as the
> > connection pool) and making HTTP requests mutate that State, but that's a
> > pretty different architecture than Network.HTTP.Conduit.
> >
> > One way I can think to do this would be to let the user supply a
> > CookieStore (probably implemented as a (Data.Set Web.Cookie.SetCookie))
> > and receive a (different) CookieStore from the 'http' function. That way,
> > the caller can manage the CookieStores independently from the connection
> > pool. The downside is that it's one more bit of ugliness the caller has
> > to deal with. How do you feel about this? You probably have a better
> > idea :-)
>
> The only idea was to implement an extra layer of cookie-aware functions
> in a separate Browser module. That's been the running assumption for a
> while now, since HTTP does it, but I'm not opposed to taking a
> different approach.
>
> It could be that the big mistake in all this was putting redirection
> at the layer of the API that I did. Yitz Gale pointed out that in
> Python, they have the low-level API and the high-level API, the latter
> dealing with both redirection and cookies.
>
> Anyway, here's one possible approach to the whole situation: `Request`
> could have an extra record on it of type `Maybe (IORef (Set
> SetCookie))`. When `http` is called, if the record is `Nothing`, a new
> value is created. Every time a request is made, the value is updated
> accordingly. That way, redirects will respect cookies for the current
> sessions, and if you want to keep a longer-term session, you can keep
> reusing the record in different `Request`s. We can also add some
> convenience functions to automatically reuse the cookie set.
>
> Michael
>
> >> I'd be happy to do both of these things, but I'm hoping for your input
> >> on how to go about this endeavor. Are these features even good to be
> >> pursuing? Should I be going about this entirely differently?
> >>
> >> Thanks,
> >> Myles C. Maxfield
> >>
> >> P.S. I'm curious about the lack of Network.URI throughout
> >> Network.HTTP.Conduit. Is there a particular design decision that led
> >> you to use raw ascii strings?
> >
> >
> > Because there are plenty of URIs that are valid that we don't handle at
> > all, e.g., ftp.
> >
> > I'm a little surprised by this, since you can easily test for unhandled
> > URIs because they're already parsed. Whatever; It doesn't really matter
> > to me, I was just surprised by it.
> >
> > Michael
> >
> > Thanks again for the feedback! I'm hoping to make a difference :]
> >
> > --Myles
>