[web-devel] Type-safe URL handling

Fri Mar 26 11:30:37 EDT 2010

On Thu, Mar 25, 2010 at 4:25 PM, Jeremy Shaw <jeremy at n-heptane.com> wrote:

> On Thu, Mar 25, 2010 at 12:29 PM, Michael Snoyman <michael at snoyman.com>wrote:
>
>> OK, here are my initial code comments:
>>
>> * Do we want to move everything into Web.URLT? More to the point, I'm not
>> sure I see the point of calling this URLT, since it doesn't really require
>> any monad transformers; maybe we should call it web-routes and then the
>> module would be Web.Routes?
>>
>
> I think Web.Routes is a fine name. I'll make it happen. In the rest of this
> post I refer to things by the old names, but I do intend to change the
> module names and rename the package to web-routes.
>
>
>> * I like the PathInfo class and to/fromPathSegments. Perhaps we should
>> bundle that with the decode/encodePathInfo into a single module?
>>
>
> I put PathInfo in a separate module because I am a little dubious of
> classes these days. I find it a bit annoying that you can only have one
> PathInfo instance per type. And I think it helps show that using PathInfo is
> not actually required. But, in practice, I think having less modules is
> probably a good thing in this case, since it does not affect the dependency
> chain at all. Just because I *can* put every function in it's own module
> doesn't mean I should. ;) Also, we probably do want people to provide
> PathInfo instances, even if they don't have to..
>
> I also am beginning to share a mistrust of classes; I think I went a little
too overboard on them on a few previous packages (namely, convertible-text)
and am now having a reaction in the opposite direction. I'm sure one day
I'll find the Golden Path...

> * I'd like to minimize dependencies as much as possible for the basic
>> package. The two dependencies I've noticed are Consumer and
>> applicative-extras. I think the type signatures would be clearer *without*
>> those packages included, eg:
>>
>>    fromPathSegments :: [String] -> Either ErrMsg a
>>
>
> Except that is not a usable type. fromPathSegments may consume, some, but
> not all of the path segments. Consider the type:
>
> data SiteURL = Foo Int Int
>
> fromPathSegments is going to receive the path segments:
>
> ["Foo","1","2"]
>
> If you wrote a parser by hand, you would want it to look a little something
> like:
>
>  do string "Foo"
>       slash
>       i <- fromPathSegments
>       slash
>       j <- fromPathSegments
>      eol
>      return (Foo i j)
>
> The key concept here is that when you call fromPathSegments to get the
> first argument of Foo you need to know how many of the path segments were
> consumed / are remaining, so you can pass only those segments to the second
> fromPathSegments.
>
> So you really need a type like:
>
>    fromPathSegments :: [String] -> (Either ErrMsg a, [String])
>
> which outputs the unconsumed path segments.
>
> Well, given that as a criterion, I agree with the rest of your analysis
entirely. However, I think we're looking at the purpose of fromPathSegments
very differently. I'm not quite certain I understand why we would want to
output the unconsumed segments; if something is unconsumed, then it seems
like it's an invalid URL and should fail.

In your example, if I request "/Foo/5/6/7", fromPathSegments would return
(Right (Foo 5 6), ["7"]); but what is going to consume that 7 now? The use
case I envisioned for something like this is:

data BlogRoutes = ...
data MySite = MyHome | MyBlog BlogRoutes
fromPathSegments ("blog":rest) = MyBlog `fmap` fromPathSegments

> But this is obviously a ripe target for a monad of some sort -- trying keep
> track of the unconsumed portions by hand seems like it would asking for
> trouble...
>
> The Consumer monad takes care of that and provides the functions you would
> expect such as, next, peek, and poke. And it seems nice to be able to use
> Monad, MonadPlus, Applicative, Alternative, etc, for composing
> fromPathSegments into larger parsers ?
>
> But, perhaps there is a better choice of monad, or a better way of dealing
> with the problem? Or maybe it's not really a problem?
>
> I think Failing is a pretty nifty data-type for dealing with errors. But
> perhaps it is not a big win here.. The #1 thing that makes Failing better
> than (Either [String] a) is it's Applicative instance. Specifically, Failing
> will accumulate and return all the errors which have occurred, not the just
> first failure (which is the behavior for Applicative (Either e)).
>
> So for example, let's say you are doing are trying to lookup a bunch of
> keys from the query string. The key / value pairs in the query string are
> typically independent of each other. So let's say you do:
>
>  (,) <$> lookup "foo" <*> lookup "bar"
>
> but neither of those keys exist. With Either you will only get the error
> 'could not find "foo"'. But with Failing you will get the error 'could not
> find "foo". could not find "bar"'.  It is nice to get a report of all the
> things that are broken, instead of getting only one error at a time, fixing
> it, and then getting another error, etc.
>
> However, I am not sure if this property is all that useful which urlt. If
> you are trying to parse a url like:
>
>   (string "Foo" *> Foo) <$> fromPathSegments <*> fromPathSegments
>
> And the parsing of "Foo" fails.. then there is no use in finding out if the
> other segments parse ok -- because they are likely to be garbage. Maybe it
> failed because it got the string "FOo" instead of "Foo", but more likely it
> got something completely unrelated like, /bar/c/2.4.
>
> So, perhaps Either is a better choice even with out considering
> dependencies... I think that Applicative / Alternative instances for Either
> are only defined in transformers in the Control.Monad.Error module -- which
> is a bit annoying. But we don't actually need those to implement urlt
> itself.
>
> This brings up another detail though.
>
> the fromPathSegments / Consumer stuff is basically implementing a parser.
> Except, unlike something like parsec, we do not keep track of the current
> position for reporting errors. I wonder if we should perhaps use a slightly
> richer parser environment. Within a web app, once you got your to/from
> instances debugged, you will never get a parse error, so having great error
> messages is not essential. But, for other people linking to your site it
> could be potentially helpful. Though, it seems like the current error
> messages out to be sufficient given how short the urls are..
>
> I don't think fancy error reporting will help here. More to the point: we
could always layer a fancy parser on top of a simpler typeclass. For that
matter, the same argument can be made for Failing and Consumer.

> I'm not certain what exactly the type of ErrMsg should be here; I don't
>> really have a problem using [String], which would be close to the definition
>> of Failing.
>>
>> * I think it's very important to allow users to supply customized 404
>> pages. Essentially, we need to augment handleWai (possibly others) with a
>> (ErrMsg -> Application) parameter.
>>
>
> Yeah, there are (at least) two possibilities, add an extra param for the
> handler. Or bubble the error up to the top:
>
> handleWai_1 :: (url -> String) -> (String -> Failing url) -> String ->
> ([ErrorMsg] -> Application) -> ((url -> String) -> url -> Application) ->
> Application
> handleWai_1 fromUrl toUrl approot handleError handler =
>   \request ->
>      do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo
> request
>         case fUrl of
>           (Failure errs) -> handleError errs request
>           (Success url)  -> handler (showString approot . fromUrl) url
> request
>
> handleWai_2 :: (url -> String) -> (String -> Failing url) -> String ->
> ((url -> String) -> url -> Application) -> (Request -> IO (Failing
> Response))
> handleWai_2 fromUrl toUrl approot handler =
>   \request ->
>      do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo
> request
>         case fUrl of
>           (Failure errs) -> return (Failure errs)
>           (Success url)  -> fmap Success $ handler (showString approot .
> fromUrl) url request
>
> The second choice is perhaps more flexible. Which do you prefer? In the
> first option, the handleError function could be a Maybe value -- and if you
> supply Nothing you get some default 404 page?
>
> I personally prefer the first option exactly as you describe it, but you're
also correct that the second is more flexible. If anyone else reading this
thread would prefer the second, speak now or forever hold your peace ;).

> In happstack we have a third possiblity. The ServerMonad is an instance of
> MonadPlus so we can throw out the error message and just call mzero:
>
> implSite :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String ->
> FilePath -> Site url String (m a) -> m a
> implSite domain approot siteSpec =
>   do r <- implSite_ domain approot siteSpec
>      case r of
>        (Failure _) -> mzero
>        (Success a) -> return a
>
> implSite_ :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String ->
> FilePath -> Site url String (m a) -> m (Failing a)
> implSite_ domain approot siteSpec =
>     dirs approot $ do rq <- askRq
>                       let pathInfo = intercalate "/" (rqPaths rq)
>                           f        = runSite (domain ++ approot) siteSpec
> pathInfo
>                       case f of
>                         (Failure errs) -> return (Failure errs)
>                         (Success sp)   -> Success <$> (localRq (const $ rq
> { rqPaths = [] }) sp)
>
> then we can do:
>
>  msum [ implSite "domain" "approot" siteSpec
>             , default404
>             ]
>
> if implSite calls mzero, then the next handler (in this case default404) is
> tried.
>
>
>
>> * It might be nice to have "type WaiSite url = Site url String
>> Application". By the way, are you certain you want to allow parameterization
>> over the pathInfo type?
>>
>
> I'm not certain I don't want to allow it... I have a vague notion that I
> might want to use Text sometimes instead of String. Though if I was really
> committed to that then I should make toPathInfo and fromPathInfo
> parameterized over pathInfo as well... So perhaps I will axe it from Site
> for now. I need to change the name of that type and it's record names too I
> think.
>
>
Referring to the fear of typeclasses mentioned above: I'd like to avoid
MPTCs even more so. In fact, as I look at it, each extra parameter we add
creates more potential for incompatible components. For instance, I can see
an argument being made to use extensible exceptions for the fromPathSegments
return type, but I'd rather keep things standard with [String] than create
more division.

The only packages that I feel qualified to speak about then are urlt and
>> urlt-wai, and my recommendation would be:
>>
>> urlt contains decode/encodePathInfo, PathInfo class and related functions,
>> Site and related functions. If you agree on allowing the parameterization of
>> 404 errors, then also provide a default 404 error.
>>
>
>
>> urlt-wai contains WaiSite, handleWai and related functions.
>>
>
> Yeah, that is what I was thinking. urlt would contain what is currently in;
>
> URLT.Base
> URLT.PathInfo
> URLT.HandleT
> URLT.Monad
> URLT.QuickCheck
>
> QuickCheck module does not actually depend on QuickCheck, which is nice
> because QC1 vs QC2 is a big problem right now.
>
>
It might also be nice to include:
>
> URLT.TH
>
> with depends on template-haskell. But I am not sure that depending on
> template-haskell is an issue because template-haskell comes with ghc6, and
> the code in URLT.TH already handles the breakage that happened with TH
> 2.4.
>
> I have a different motive for keeping the TH code out: it seems like all of
the other pieces of code should be relatively stable from early on, while
the TH code (and quasi-quoting, and regular) will probably have some major
changes happening for a while. It would be nice to have a consistent major
release number for long periods of time on the core.

> If I switch to Either instead of Failing I believe the dependencies would
> be:
>
>  base, Consumer, template-haskell, network, utf8-string
>
> urlt-wai would just include:
>
> URLT.Wai
>
> Sounds great. Let me know when this is available for review. If you want me
to do any of the merging/renaming, I have some time now (I arrived in
southern California at 3:30 in the morning...).

Michael
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/web-devel/attachments/20100326/254ea8a9/attachment-0001.html