[Haskell-cafe] PROPOSAL: Web application interface

Mon Jan 18 06:48:37 EST 2010

Mark, thanks for the response, it's very well thought out. Let me state two
things first to explain some of my design decisions.

Firstly, I'm shooting for lowest-common-denominator here. Right now, I see
that as the intersection between the CGI backend and a standalone server
backend; I think anything contained in both of those will be contained in
all other backends. If anyone has a contrary example, I'd be happy to see
it.

Secondly, the WAI is *not* designed to be "user friendly." It's designed to
be efficient and portable. People looking for a user-friendly way to write
applications should be using some kind of frontend, either a framework, or
something like hack-frontend-monadcgi.

That said, let's address your specific comments.

On Mon, Jan 18, 2010 at 8:54 AM, Mark Lentczner <markl at glyphic.com> wrote:

> I like this project! Thanks for resurrecting it!
>
> Some thoughts:
>
> Methods in HTTP are extensible. The type RequestMethod should probably have
> a "catchall" constructor
>        | Method B.ByteString
>
Seems logical to me.
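
A rough sketch of what the catchall could look like. The constructor list and the parseMethod/renderMethod helpers are made up for illustration and are not part of the proposal:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Maybe (fromMaybe)

-- Hypothetical constructor set, for illustration only.
data RequestMethod
    = GET | POST | HEAD | PUT | DELETE | OPTIONS | TRACE | CONNECT
    | Method B.ByteString  -- catchall for extension methods
    deriving (Show, Eq)

-- Lookup table from the wire name to the nullary constructor.
-- HTTP method names are case-sensitive, so no case folding is done.
knownMethods :: [(B.ByteString, RequestMethod)]
knownMethods =
    [ (B.pack (show m), m)
    | m <- [GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE, CONNECT]
    ]

-- Anything not in the table falls through to the catchall.
parseMethod :: B.ByteString -> RequestMethod
parseMethod bs = fromMaybe (Method bs) (lookup bs knownMethods)

-- Inverse direction, for a backend that needs the bytes again.
renderMethod :: RequestMethod -> B.ByteString
renderMethod (Method bs) = bs
renderMethod m           = B.pack (show m)

main :: IO ()
main = mapM_ (print . parseMethod . B.pack) ["GET", "PATCH"]
```

With this shape an application can still pattern match on the common methods, while an extension method such as PATCH or PROPFIND survives the round trip as bytes.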

> Other systems (the WAI proposal on the Wiki, Hack, etc...) have broken the
> path into two parts: scriptName and pathInfo. While I'm not particularly
> fond of those names, they do break the path into "traversed" and
> "non-traversed" portions of the URL. This is very useful for achieving
> "location independence" of one's code. While this API is trying to stay
> agnostic to the web framework, some degree of traversal is pretty universal,
> and I think it would benefit being in here.
>
Going to the standalone vs CGI example: in a CGI script, scriptName is a
well-defined variable. However, it has absolutely no meaning to a standalone
handler, so there I think we'd just be feeding rubbish into the system. I'm
also not certain how one could *use* scriptName in any meaningful manner,
outside of trying to reconstruct a URL (more on this topic below).

> The fields serverPort, serverName, and urlScheme are typically only used by
> an application to "reconstruct" URLs for inclusion in the response. This is
> a constant source of bugs in many web sites. It is also a problem in
> creating modular web frameworks, since the application can't be unaware of
> its context (unless the server interprets and re-writes HTML and other
> content on the fly - which isn't realistic.) Perhaps a better solution would
> be to pass a "URL generating" function in the Request and hide all this. Of
> course, web frameworks *could* use these data to dispatch on "virtual host"
> like configurations. Though, perhaps that is the province of the server
> side of this API? I don't have a concrete proposal here, just a gut feeling
> that the inclusion of these breaks some amount of encapsulation we'd like to
> achieve for the Applications.
>
I think it's impossible to ever reconstruct a URL for a CGI application.
I've tried it; once you start dealing with mod_rewrite, anything could
happen. Given that I think we should encourage users to make pretty URLs via
mod_rewrite, I oppose inserting such a function. When I need this kind of
information (many of my web apps do), I've put it in a configuration file.

However, I don't think it's a good idea to hide information that is
universal to all webapps. urlScheme in particular seems very important to
me; for example, maybe when serving an app over HTTPS you want to use a
secure static-file server as well. Frankly, I don't have a use case for
serverName and serverPort that doesn't involve reconstructing URLs, but my gut
feeling is that it's better to leave them in the protocol in case one turns
up.

> The HTTP version information seems to have been dropped from Request. Alas,
> this is often needed when deciding what response headers to generate. I'm in
> favor of a simple data type for this:
>        data HttpVersion = Http09 | Http10 | Http11
>
I had not thought of that at all, and I like it. However, do we want to
hard-code in all possible HTTP versions? In theory, there could be more
standards in the future. Plus, isn't Google currently working on a more
efficient approach to HTTP that would affect this?
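
One way to avoid hard-coding the versions would be to store the major and minor numbers instead of enumerating constructors. This is only a sketch; the record fields, the named constants, and the parseHttpVersion and atLeastHttp11 helpers are all hypothetical names:

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Monad (guard)
import Data.Char (isDigit)

-- Hypothetical open-ended representation: any major.minor pair fits.
-- The derived Ord compares major first, then minor.
data HttpVersion = HttpVersion
    { httpMajor :: Int
    , httpMinor :: Int
    } deriving (Show, Eq, Ord)

-- Named constants for the versions mentioned in the thread.
http09, http10, http11 :: HttpVersion
http09 = HttpVersion 0 9
http10 = HttpVersion 1 0
http11 = HttpVersion 1 1

-- Parse the version token of a request line, e.g. "HTTP/1.1".
parseHttpVersion :: B.ByteString -> Maybe HttpVersion
parseHttpVersion bs = do
    rest <- B.stripPrefix (B.pack "HTTP/") bs
    let (majorStr, dotMinor) = B.span isDigit rest
    minorStr <- B.stripPrefix (B.pack ".") dotMinor
    -- Both parts must be non-empty and the minor part all digits.
    guard (not (B.null majorStr) && not (B.null minorStr))
    guard (B.all isDigit minorStr)
    (major, _) <- B.readInt majorStr
    (minor, _) <- B.readInt minorStr
    return (HttpVersion major minor)

-- Example of the kind of check an application might make.
atLeastHttp11 :: HttpVersion -> Bool
atLeastHttp11 v = v >= http11

main :: IO ()
main = mapM_ (print . parseHttpVersion . B.pack)
             ["HTTP/1.0", "HTTP/1.1", "HTTP/1.x"]
```

The trade-off is that pattern matches become comparisons on numbers, but nothing has to change in WAI when a new version shows up.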

> Using ByteString for all the non-body values I find awkward. Take headers,
> for example. The header names are going to come from a list of about 50 well
> known ones. It seems a shame that applications will be littered with
> expressions like:
>
>        [(B.pack "Content-Type", B.pack "text/html;charset=UTF-8")]
>
> Seems to me that it would be highly beneficial to include a module, say
> Network.WAI.Header, that defined these things:
>
>        [(Hdr.contentType, Hdr.mimeTextHtmlUtf8)]
>
This approach would make WAI much more top-heavy and prone to becoming
out-of-date. I don't oppose having this module in a separate package, but I
want to keep WAI itself as lightweight as possible.
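
For what it's worth, such a separate package would not need anything from WAI beyond the agreement that headers are pairs of ByteStrings. A minimal sketch, reusing the two names from your example (htmlHeaders is a hypothetical addition, and where the definitions would live is left open):

```haskell
import qualified Data.ByteString.Char8 as B

-- Header name constant, packed once instead of at every use site.
contentType :: B.ByteString
contentType = B.pack "Content-Type"

-- Header value constant for a common MIME type.
mimeTextHtmlUtf8 :: B.ByteString
mimeTextHtmlUtf8 = B.pack "text/html;charset=UTF-8"

-- What an application's header list would look like with the constants.
htmlHeaders :: [(B.ByteString, B.ByteString)]
htmlHeaders = [(contentType, mimeTextHtmlUtf8)]

main :: IO ()
main = print htmlHeaders
```

Since these are plain ByteString values, the package can grow or go stale without touching the interface itself.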

> Further, since non-fixed headers will be built up out of many little String
> bits, I'd just as soon have the packing and unpacking be done by the server
> side of this API, and let the applications deal with Strings for these
> little snippets both in the Request and the Response.
>
As I stated at the beginning of this response, there should be a framework
or frontend sitting between WAI and the application. And given that the
actual data on the wire will be represented as a stream of bytes, I'd rather
stick with that.

> For header names, in particular, it might be beneficial (and faster) to
> treat them like RequestMethod and make them a data type with nullary
> constructors for all 47 defined headers, and one ExtensionHeader String
> constructor.
>
Same comment about top-heaviness.

> Finally, note that HTTP/1.1 actually does well define the character
> encoding of these parts of the protocol. It is a bit hard to find in the
> spec, but the request line, status line and headers are all transmitted in
> ISO-8859-1, (with some restrictions), with characters outside the set
> encoded as per RFC 2047 (MIME Message Header extensions). Mind you, I
> believe that most web servers *don't* do the 2047 decoding, and only either
> a) pass the strings as ISO-8859-1 strings, or b) decode that to native Unicode
> strings.
>
Thanks for that information, I was unaware. However, I think it still makes
sense to keep WAI as low-level as possible, which would mean a sequence of
bytes.
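
A frontend that wants Strings can do the ISO-8859-1 step itself on top of the raw bytes. A small sketch, with decodeLatin1 and encodeLatin1 as hypothetical helper names; it relies on Data.ByteString.Char8 mapping each byte to the Char with the same code point, which is exactly ISO-8859-1, and it does not attempt any RFC 2047 decoding:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B8

-- Each byte becomes the Char with the same code point (ISO-8859-1).
decodeLatin1 :: BS.ByteString -> String
decodeLatin1 = B8.unpack

-- B8.pack silently truncates characters above '\255', so reject them
-- up front instead of corrupting the header.
encodeLatin1 :: String -> Maybe BS.ByteString
encodeLatin1 s
    | all (<= '\255') s = Just (B8.pack s)
    | otherwise         = Nothing

main :: IO ()
main = do
    let raw = BS.pack [0x63, 0x61, 0x66, 0xE9]  -- "caf" plus e-acute
    print (decodeLatin1 raw)
    print (encodeLatin1 (decodeLatin1 raw) == Just raw)
```

That keeps the decision about encodings in the frontend, and WAI only ever sees bytes.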

Michael