[web-devel] Type-safe URL handling

Michael Snoyman michael at snoyman.com
Fri Mar 19 18:55:19 EDT 2010


On Fri, Mar 19, 2010 at 2:41 PM, Jeremy Shaw <jeremy at n-heptane.com> wrote:

> On Fri, Mar 19, 2010 at 5:22 PM, Michael Snoyman <michael at snoyman.com> wrote:
>
>>> I am not going to have time to look at this again until Saturday or
>>> Sunday. There are a few minor details that have been swept under the rug
>>> that need to be addressed. For example, when exactly should URL
>>> encoding / decoding take place? It's not good if that happens twice or not
>>> at all.
>>>
>>
>> Just to confuse the topic even more: if we do real URL encoding/decoding,
>> I believe we would have to assume a certain character set. I had to deal
>> with a site that was encoded in non-UTF8 just a bit ago, and dealing with
>> query parameters is not fun.
>>
>> That said, perhaps we should consider making the type of PathInfo
>> "PathInfo ByteString" so we make it clear that we're doing no character
>> encoding.
>>
>
> Yeah. I dunno. I just know it needs to be solved :)
>
>
>> Another issue in the same vein is dealing with leading and trailing
>> slashes, though I think this is fairly simple in practice: the web app knows
>> what to do about the trailing slashes, and each plugin should always pass a
>> leading slash.
>>
>
> I am not quite sure what you mean 'each plugin should always pass a leading
> slash'. Pass to whom?
>
> If we have:
>
> data MySite = MyHome    | MyBlog MyBlog
> data MyBlog = BlogHome | BlogPost String
>
> Then I would expect something like this:
>
> formatMySite MyHome = "MyHome"
> formatMySite (MyBlog blog) = "MyBlog/" ++ formatMyBlog blog
>
> formatMyBlog BlogHome = "BlogHome"
> formatMyBlog (BlogPost title) = "BlogPost/" ++ title
>
> mkAbs = ("http://localhost:3000/" ++)
>
> (ignoring any escaping that needs to happen in title, and ignoring any
> AbsPath / PathInfo stuff).
>
> But we could, of course, do it the other way:
>
>
> formatMySite MyHome = "/MyHome"
> formatMySite (MyBlog blog) = "/MyBlog" ++ formatMyBlog blog
>
> formatMyBlog BlogHome = "/BlogHome"
> formatMyBlog (BlogPost title) = "/BlogPost/" ++ title
>
> mkAbs = ("http://localhost:3000" ++)
>
> There definitely needs to be some policy.
>
> - jeremy
>

Then here's a proposal for both issues at once:

* PathInfo is a ByteString
* handleWai strips the leading slash from the path-info
* every component parses and generates URLs without a leading slash.
Trailing slash is application's choice.
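
To make the proposal concrete, here is a rough sketch of what a component might look like under these rules. It reuses Jeremy's example types; stripLeadingSlash, parseMySite and parseMyBlog are names I am making up for illustration (this is not the actual handleWai code), and escaping of the title is still ignored.

```haskell
import qualified Data.ByteString.Char8 as B

-- Jeremy's example types, with the post title as a ByteString.
data MySite = MyHome | MyBlog MyBlog deriving (Eq, Show)
data MyBlog = BlogHome | BlogPost B.ByteString deriving (Eq, Show)

-- Hypothetical helper: what the top-level handler would do once,
-- before handing the path-info to any component.
stripLeadingSlash :: B.ByteString -> B.ByteString
stripLeadingSlash bs = case B.uncons bs of
  Just ('/', rest) -> rest
  _                -> bs

-- Components generate URLs without a leading slash.
formatMyBlog :: MyBlog -> B.ByteString
formatMyBlog BlogHome         = B.pack "BlogHome"
formatMyBlog (BlogPost title) = B.append (B.pack "BlogPost/") title

formatMySite :: MySite -> B.ByteString
formatMySite MyHome        = B.pack "MyHome"
formatMySite (MyBlog blog) = B.append (B.pack "MyBlog/") (formatMyBlog blog)

-- Components parse URLs without a leading slash as well.
parseMyBlog :: B.ByteString -> Maybe MyBlog
parseMyBlog bs
  | bs == B.pack "BlogHome" = Just BlogHome
  | otherwise               = BlogPost <$> B.stripPrefix (B.pack "BlogPost/") bs

parseMySite :: B.ByteString -> Maybe MySite
parseMySite bs
  | bs == B.pack "MyHome" = Just MyHome
  | otherwise             =
      B.stripPrefix (B.pack "MyBlog/") bs >>= fmap MyBlog . parseMyBlog
```

With this layout the leading slash is handled in exactly one place, so parseMySite (stripLeadingSlash path) and formatMySite are inverses of each other for well-formed paths.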

Regarding URL encoding, let me point out that the following are two
different URLs (just try clicking on them):

http://www.snoyman.com/blog/entry/persistent-plugs/
http://www.snoyman.com/blog/entry%2Fpersistent-plugs/

In other words, if we ever URL-decode the string before it reaches the
application, we will have conflated unique URLs. I see two options here:

* We specify that PathInfo contains URL-encoded values. Any fromUrl/toUrl
functions must be aware of this fact.
* We change the type of PathInfo to [ByteString], where we split the
PathInfo by slashes, and specify that the pieces are *not* URL-encoded. In
order to preserve perfectly the original value, we should not combine
adjacent delimiters. In other words:

/foo/bar/baz/ -> ["foo", "bar", "baz", ""] -- note the trailing empty string
/foo/bar/baz -> ["foo", "bar", "baz"] -- we don't need a leading empty string; *every* pathinfo begins with a slash
/foo%2Fbar/baz/ -> ["foo/bar", "baz", ""]
/foo//bar/baz -> ["foo", "", "bar", "baz"]
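
A minimal sketch of the second option, assuming we split on slashes first and only then percent-decode each piece, so an encoded %2F can never be mistaken for a delimiter. The names splitPathInfo and decodePiece are hypothetical, and leaving malformed escapes untouched is my own assumption, not something we have agreed on.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (chr, digitToInt, isHexDigit)

-- Hypothetical helper: percent-decode one path piece.
-- Malformed escapes (e.g. "%zz" or a trailing "%") are left as-is.
decodePiece :: B.ByteString -> B.ByteString
decodePiece = B.pack . go . B.unpack
  where
    go ('%' : h : l : rest)
      | isHexDigit h && isHexDigit l =
          chr (16 * digitToInt h + digitToInt l) : go rest
    go (c : rest) = c : go rest
    go []         = []

-- Hypothetical helper: drop the single leading slash, split on every
-- remaining slash (adjacent delimiters are NOT merged), then decode
-- each piece separately.
splitPathInfo :: B.ByteString -> [B.ByteString]
splitPathInfo raw = map decodePiece (B.split '/' (dropLeading raw))
  where
    dropLeading bs = case B.uncons bs of
      Just ('/', rest) -> rest
      _                -> bs
```

For instance, splitPathInfo (B.pack "/foo%2Fbar/baz/") keeps "foo/bar" as a single piece, whereas decoding before splitting would have produced four pieces and conflated the two URLs above.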

I'm not strongly attached to any of this. Also, my original motivation for
breaking up the pieces (easier pattern matching) will be mitigated by the
usage of ByteStrings.

Michael