[Haskell-cafe] PROPOSAL: Web application interface

Sat Jan 23 14:31:47 EST 2010

Just as an update, I've made the following changes to my WAI git repo (
http://github.com/snoyberg/wai):

* I removed the RequestBody(Class) bits, and replaced them with "IO (Maybe
ByteString)". This is a good example of tradeoffs versus the enumerator
approach (see below).
* This might just be bikeshedding, but I renamed RequestMethod to Method to
make names slightly shorter and more consistent.
* I implemented Mark's suggestions of adding support for arbitrary request
methods and information on HTTP version.
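
To make the first change concrete, here is a minimal sketch of how an
application might consume a request body exposed as "IO (Maybe ByteString)".
The names BodySource, sourceFromChunks and readAll are made up for
illustration and are not taken from the repo; the chunk list stands in for
whatever the server would actually read off the socket.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.IORef (newIORef, atomicModifyIORef)

-- Hypothetical alias: each call yields the next chunk, Nothing at end of body.
type BodySource = IO (Maybe B.ByteString)

-- Build a source from a fixed list of chunks (a stand-in for socket reads).
sourceFromChunks :: [B.ByteString] -> IO BodySource
sourceFromChunks chunks = do
  ref <- newIORef chunks
  return $ atomicModifyIORef ref $ \cs -> case cs of
    []         -> ([], Nothing)
    (c : rest) -> (rest, Just c)

-- Pull chunks until the source reports Nothing, then concatenate them.
readAll :: BodySource -> IO B.ByteString
readAll next = go []
  where
    go acc = do
      mchunk <- next
      case mchunk of
        Nothing -> return (B.concat (reverse acc))
        Just c  -> go (c : acc)

main :: IO ()
main = do
  src  <- sourceFromChunks (map B.pack ["name=", "wai", "&v=1"])
  body <- readAll src
  B.putStrLn body
```

The application decides when to pull the next chunk, which is the
"imperative" style discussed further down.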

I've been having some off-list discussions about WAI, and have a few issues
to bring up. The first is relatively simple: what do we do about consuming
the entire request body? Do we leave that as a task to the application, or
should the server ensure that the entire request body is consumed?
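
If the server takes on that job, one possible shape is sketched below: run
the application, then discard whatever it left unread. This assumes the
"IO (Maybe ByteString)" request body mentioned above; BodySource, drain,
runThenDrain and the toy application are hypothetical names, not part of WAI.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.IORef (newIORef, atomicModifyIORef)

-- Hypothetical alias: each call yields the next chunk, Nothing at end of body.
type BodySource = IO (Maybe B.ByteString)

-- Stand-in for a socket: serve a fixed list of chunks.
sourceFromChunks :: [B.ByteString] -> IO BodySource
sourceFromChunks chunks = do
  ref <- newIORef chunks
  return $ atomicModifyIORef ref $ \cs -> case cs of
    []         -> ([], Nothing)
    (c : rest) -> (rest, Just c)

-- Discard everything still unread; report how many bytes were skipped.
drain :: BodySource -> IO Int
drain next = go 0
  where
    go n = do
      mchunk <- next
      case mchunk of
        Nothing -> return n
        Just c  -> go $! n + B.length c

-- Server-side wrapper: run the application, then consume the leftovers.
runThenDrain :: (BodySource -> IO a) -> BodySource -> IO (a, Int)
runThenDrain app src = do
  result  <- app src
  skipped <- drain src
  return (result, skipped)

-- A toy application that only looks at the first chunk.
firstChunkOnly :: BodySource -> IO (Maybe B.ByteString)
firstChunkOnly next = next

main :: IO ()
main = do
  src <- sourceFromChunks (map B.pack ["abc", "def", "ghi"])
  (firstChunk, skipped) <- runThenDrain firstChunkOnly src
  print firstChunk
  putStrLn ("bytes drained by the server: " ++ show skipped)
```

Leaving the task to the application would simply mean dropping the
runThenDrain wrapper and trusting every application to call something like
drain itself.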

Next, I have made the ResponseBodyClass typeclass specifically with the goal
of allowing optimizations for lazy bytestrings and sending files. The former
seems far-fetched; the latter provides the ability to use a sendfile system
call instead of copying the file data into memory. However, in the presence
of gzip encoding, how useful is this optimization?
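
The gzip concern can be sketched as a small decision function. This is a
simplification under stated assumptions: Body is a plain sum type rather than
the ResponseBodyClass typeclass, and Plan and plan are hypothetical names. It
also assumes the server compresses on the fly, so a gzipped response has to
pass through user space and cannot be handed to sendfile.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Simplified response body: either a file on disk or lazy bytes in memory.
data Body = BodyFile FilePath | BodyLazy L.ByteString

-- What a server could do with the body (hypothetical, for illustration).
data Plan = Sendfile FilePath | CopyAndGzip | CopyPlain
  deriving (Eq, Show)

-- The sendfile shortcut only applies to an uncompressed file response.
plan :: Bool -> Body -> Plan
plan gzip body = case (gzip, body) of
  (False, BodyFile fp) -> Sendfile fp
  (True,  _)           -> CopyAndGzip
  (False, BodyLazy _)  -> CopyPlain

main :: IO ()
main = do
  print (plan False (BodyFile "static/logo.png"))
  print (plan True  (BodyFile "static/logo.png"))
  print (plan False (BodyLazy (L.pack "hello")))
```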

Finally, there is a lot of discussion going on right now about enumerators.
The question is whether the WAI protocol should use them. There are two
places where they could replace the current offering: request body and
response body.

In my opinion, there is no major difference between the Hyena definition of
an enumerator and the current response body sendByteString method. The
former provides two extra features: there's an accumulating parameter passed
around, and a method for indicating early termination. However, the
accumulating parameter seems unnecessary to me in general, and when needed we
can accomplish the same result with MVars. Early termination seems like
something that would be unusual in the response context, and could be
handled with exceptions.
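
To illustrate the comparison, here is a rough sketch of both styles side by
side. The Enumerator type below is only written in the spirit of Hyena's and
is not copied from it, and SendBody merely approximates the sendByteString
approach; all names here are assumptions for the sake of the example. Both
versions count bytes, first to completion and then with early termination.

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Concurrent.MVar (newMVar, modifyMVar_, readMVar)
import Control.Exception (IOException, throwIO, try)
import Control.Monad (when)

-- Enumerator style: the step function gets the accumulator and a chunk;
-- Left stops early, Right asks for more.
type Enumerator a = (a -> B.ByteString -> IO (Either a a)) -> a -> IO a

enumChunks :: [B.ByteString] -> Enumerator a
enumChunks []       _ acc = return acc
enumChunks (c : cs) f acc = do
  r <- f acc c
  case r of
    Left done  -> return done
    Right next -> enumChunks cs f next

-- Callback style: the body is handed a "send" action and calls it per chunk.
type SendBody = (B.ByteString -> IO ()) -> IO ()

sendChunks :: [B.ByteString] -> SendBody
sendChunks cs send = mapM_ send cs

-- Accumulation: threaded parameter versus an MVar.
countEnum :: [B.ByteString] -> IO Int
countEnum cs = enumChunks cs (\n c -> return (Right (n + B.length c))) 0

countSend :: [B.ByteString] -> IO Int
countSend cs = do
  total <- newMVar 0
  sendChunks cs (\c -> modifyMVar_ total (return . (+ B.length c)))
  readMVar total

-- Early termination: Left versus an exception caught outside the producer.
limitEnum :: Int -> [B.ByteString] -> IO Int
limitEnum limit cs = enumChunks cs step 0
  where
    step n c =
      let n' = n + B.length c
      in return (if n' >= limit then Left n' else Right n')

limitSend :: Int -> [B.ByteString] -> IO Int
limitSend limit cs = do
  total <- newMVar 0
  let send c = do
        modifyMVar_ total (return . (+ B.length c))
        n <- readMVar total
        when (n >= limit) (throwIO (userError "enough"))
  _ <- try (sendChunks cs send) :: IO (Either IOException ())
  readMVar total

main :: IO ()
main = do
  let chunks = map B.pack ["abc", "defg", "hi"]
  countEnum chunks >>= print
  countSend chunks >>= print
  limitEnum 5 chunks >>= print
  limitSend 5 chunks >>= print
```

In this toy case both styles give the same totals; the difference is where
the state and the stop signal live.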

For the request body, there is a significant difference. However, I think
that the current approach (called imperative elsewhere) is more in line with
how most people would expect to program. At the same time, I believe there
is no performance issue going either way, and am open to community input.

Michael

On Mon, Jan 18, 2010 at 1:48 PM, Michael Snoyman <michael at snoyman.com> wrote:

> Mark, thanks for the response, it's very well thought out. Let me state two
> things first to explain some of my design decisions.
>
> Firstly, I'm shooting for lowest-common-denominator here. Right now, I see
> that as the intersection between the CGI backend and a standalone server
> backend; I think anything contained in both of those will be contained in
> all other backends. If anyone has a contrary example, I'd be happy to see
> it.
>
> Secondly, the WAI is *not* designed to be "user friendly." It's designed to
> be efficient and portable. People looking for a user-friendly way to write
> applications should be using some kind of frontend, either a framework, or
> something like hack-frontend-monadcgi.
>
> That said, let's address your specific comments.
>
>
> On Mon, Jan 18, 2010 at 8:54 AM, Mark Lentczner <markl at glyphic.com> wrote:
>
>> I like this project! Thanks for resurrecting it!
>>
>> Some thoughts:
>>
>> Methods in HTTP are extensible. The type RequestMethod should probably
>> have a "catchall" constructor
>>        | Method B.ByteString
>>
> Seems logical to me.
>
>
>> Other systems (the WAI proposal on the Wiki, Hack, etc...) have broken the
>> path into two parts: scriptName and pathInfo. While I'm not particularly
>> fond of those names, they do break the path into "traversed" and
>> "non-traversed" portions of the URL. This is very useful for achieving
>> "location independence" of one's code. While this API is trying to stay
>> agnostic to the web framework, some degree of traversal is pretty universal,
>> and I think it would benefit being in here.
>>
> Going to the standalone vs CGI example: in a CGI script, scriptName is a
> well-defined variable. However, it has absolutely no meaning to a standalone
> handler. I think we're just feeding rubbish into the system. I'm also not
> certain how one could *use* scriptName in any meaningful manner, outside of
> trying to reconstruct a URL (more on this topic below).
>
>
>> The fields serverPort, serverName, and urlScheme are typically only used
>> by an application to "reconstruct" URLs for inclusion in the response. This
>> is a constant source of bugs in many web sites. It is also a problem in
>> creating modular web frameworks, since the application can't be unaware of
>> its context (unless the server interprets and re-writes HTML and other
>> content on the fly - which isn't realistic.) Perhaps a better solution would
>> be to pass a "URL generating" function in the Request and hide all this. Of
>> course, web frameworks *could* use these data to dispatch on "virtual host"
>> like configurations. Though, perhaps that is the province of the server
>> side of this API? I don't have a concrete proposal here, just a gut feeling that
>> the inclusion of these breaks some amount of encapsulation we'd like to
>> achieve for the Applications.
>>
> I think it's impossible to ever reconstruct a URL for a CGI application.
> I've tried it; once you start dealing with mod_rewrite, anything could
> happen. Given that I think we should encourage users to make pretty URLs via
> mod_rewrite, I oppose inserting such a function. When I need this kind of
> information (many of my web apps do), I've put it in a configuration file.
>
> However, I don't think it's a good idea to hide information that is
> universal to all webapps. urlScheme in particular seems very important to
> me; for example, maybe when serving an app over HTTPS you want to use a
> secure static-file server as well. Frankly, I don't have a use case for
> serverName and serverPort that doesn't involve reconstructing URLs, but my
> gut feeling is that it is better to leave them in the protocol in case they
> do have a use case.
>
>
>> The HTTP version information seems to have been dropped from Request.
>> Alas, this is often needed when deciding what response headers to generate.
>> I'm in favor of a simple data type for this:
>>        data HttpVersion = Http09 | Http10 | Http11
>>
> I had not thought of that at all, and I like it. However, do we want to
> hard-code in all possible HTTP versions? In theory, there could be more
> standards in the future. Plus, isn't Google currently working on a more
> efficient approach to HTTP that would affect this?
>
>
>> Using ByteString for all the non-body values I find awkward. Take headers,
>> for example. The header names are going to come from a list of about 50 well
>> known ones. It seems a shame that applications will be littered with
>> expressions like:
>>
>>        [(B.pack "Content-Type", B.pack "text/html;charset=UTF-8")]
>>
>> Seems to me that it would be highly beneficial to include a module, say
>> Network.WAI.Header, that defined these things:
>>
>>        [(Hdr.contentType, Hdr.mimeTextHtmlUtf8)]
>>
> This approach would make WAI much more top-heavy and prone to becoming
> out-of-date. I don't oppose having this module in a separate package, but I
> want to keep WAI itself as light as possible.
>
>
>> Further, since non-fixed headers will be built up out of many little
>> String bits, I'd just as soon have the packing and unpacking be done by the
>> server side of this API, and let the applications deal with Strings for
>> these little snippets both in the Request and the Response.
>>
> As I stated at the beginning of this response, there should be a framework
> or frontend sitting between WAI and the application. And given that the
> actual data on the wire will be represented as a stream of bytes, I'd rather
> stick with that.
>
>> For header names, in particular, it might be beneficial (and faster) to
>> treat them like RequestMethod and make them a data type with nullary
>> constructors for all 47 defined headers, and one ExtensionHeader String
>> constructor.
>>
> Same comment about top-heaviness.
>
>
>> Finally, note that HTTP/1.1 actually does well define the character
>> encoding of these parts of the protocol. It is a bit hard to find in the
>> spec, but the request line, status line and headers are all transmitted in
>> ISO-8859-1, (with some restrictions), with characters outside the set
>> encoded as per RFC 2047 (MIME Message Header extensions). Mind you, I
>> believe that most web servers *don't* do the 2047 decoding, and either
>> a) pass the strings as ISO-8859-1 strings, or b) decode them to native
>> Unicode strings.
>>
> Thanks for that information, I was unaware. However, I think it still
> makes sense to keep WAI as low-level as possible, which would mean a
> sequence of bytes.
>
> Michael
>