[web-devel] Type-safe URL handling

Thu Mar 18 17:07:01 EDT 2010

On Wed, Mar 17, 2010 at 5:47 PM, Michael Snoyman <michael at snoyman.com>wrote:

>
> Now, as far as your concerns about boilerplate and hiding of types: you're
> correct on the small scale. When dealing with simple examples, it makes
> perfect sense to just pass in the 2 or 3 arguments directly instead of
> having a datatype declared. I see the advantage of having a unified
> typeclass/dispatch function for dealing with large, nested applications.
>

I can see how declaring a datatype (typically a record) can be useful when
you are passing a larger number of arguments to a subhandler. In fact, I
already have real code based on URLT where I do that. In the existing
example, I can call the version with the wrapped up arguments just fine with
out dispatch:

     run 3000 $ handleWaiU (mySiteD (SiteArgs (BlogArgs now))) "
http://localhost:3000"

If I call it using dispatch, then it is one token shorter:

     run 3000 $ handleWaiD (SiteArgs (BlogArgs now)) "http://localhost:3000"

except I also am forced to add all these tokens:

instance Dispatch SiteArgs where
  type Routes SiteArgs = SiteURL
  type App SiteArgs    = Application
  dispatch             = mySiteD

even though I am only going to call dispatch on SiteArgs one place in my
code.

So, without dispatch you get the option of using data-types to bundle up
arguments if you want to. I don't see how dispatch improves on that portion.

With dispatch you are forced to whether you want to or not. The reason you
are forced to is because dispatch requires a uniquely named type so it can
determine which function to call.

One advantage of Dispatch, is that you can write polymorphic functions that
call dispatch:

myFunc :: (Dispatch a) => a -> ...

Is that something we are likely to exploit?

> That said, your example and my example are not exactly the same. I find the
> final line of mine to be *much* more concise than your Dispatch version.
> Let's compare them directly:
>
> Mine:
>     run 3000 $ plugToWai (MySite $ Blog now) "http://localhost:3000/"
> Your dispatch version:
>      run 3000 $ handleWai mkAbs fromAbs (dispatch (SiteArgs (BlogArgs
> now)))
> Your handleWai version:
>      run 3000 $ handleWai mkAbs fromAbs (mySite now)
>

True. If I had a version of handleWai that uses AsURL (similar to how
plugToWai works). Then we have:

Yours:
    run 3000 $ plugToWai    (MySite $ Blog now) "http://localhost:3000/"
Mine (no dispatch):
    run 3000 $ handleWaiU (mySite now) "http://localhost:3000"
Mine (dispatch)
    run 3000 $ handleWaiD (MySite $ Blog now) "http://localhost:3000"

which are essentially the same. Without dispatch, mine could potentially be
one token longer. Though in this case it is one token shorter. The version
with dispatch is, of course, the same length.

I think a lot of the boilerplate you experienced comes from your
> implementation of my idea, not the idea itself.
>

I guess at this point I just feel like it is easier and more straightforward
to call the handlers by unique names than to create an instance of Dispatch
so I can call the handler using a general name. So, I am looking for some
compelling examples where I am going to benefit from having a function like,
dispatch :: (Dispatch a) => a -> (Routes a -> String) -> Routes a -> App a,
hanging around.

Though, as I also mentioned. I don't mind having the Dispatch class in the
library as long as I am not required to use it.

> However, let's try to deal with some of the other important issues.
> Firstly, Failing versus Maybe: I can't really see a case when you'd need to
> specify why the path is not a valid URL. It would seem that either it's a
> theoretically valid path, or it's not. Issues like "that object doesn't
> exist" wouldn't be handled at the dispatch level usually.
>

I have founding the Failing class to be very useful when using URLT for
implementing a REST API. The links within my Haskell app won't fail, but
links generated by non-Haskell clients can fail. For example, if some php
programmer accidentally tries to get, /mysite/myblog/foobar/bolg/1 -- they
are going to be a lot happier to see:

   expecting, 'blog', 'images', 'foo', but got 'bolg', than they would be if
they just got 'invalid url'. (Even better would be if it gave the character
offset to the bogus path component).

Also, if you are writing the toURL / fromURL functions by hand instead of
deriving them automatically somehow, then you are going to get it wrong
sometimes (in my experience, often). I provide a QuickCheck function that
can be used to ensure that your toURL / fromURL functions are inverses. But
when the test fails, it is nice to get a more specific error message.

I still think we need to reconsider relying on one or the other monad
> transformer library. I notice now that you're using mtl; Yesod uses
> transformers. I don't really have a strong preference on this, but it's
> immediately divisive.
>

I refactored so that it does not really depend on either now. I did this by
basically reimplementing URLT as a native Reader-like monad instead of
wrapping around ReaderT. I added URLT.MTL and URLT.Transformers which
contain the MonadTrans and MonadIO instances. But they are not used by any
of the code.

Happstack is currently mtl based. I think I like transformers better, though
I am saddened to see they do not have the classes like MonadReader,
MonadWriter, etc.

> There's one other major difference between URLT and my gist: my gist splits
> a path into pieces and hands that off for parsing. Your code allows each
> function to handle that itself. In your example, you use the default Read
> instance (I assume for simplicity). Splitting into pieces the way I did
> allowed for easy pattern matching; what would URLT code look like that
> handled "real" URLs?
>

I like the String over the [String] because it is the most general form of
representing a URL. If you wanted to use URLT to handle both the pathInfo
and the query string parameters, then [String] isn't really the correct
type. Though there could be something better than String as well...

As for handling, "real" URLs, there are a variety of solutions. If you don't
care too much about the prettiness of the URLs you can use template haskell
to generate AsURL instances:

$(deriveAsURL ''BlogURL)
$(deriveAsURL ''SiteURL)

main1b :: IO ()
main1b =
  do now <- getCurrentTime
     run 3000 $ handleWaiU (mySite now) "http://localhost:3000"

Or if you prefer Regular over TH you can do something like this (we can
probably be cleaned up a little):

$(deriveAll ''BlogURL "PFBlogURL")
type instance PF BlogURL = PFBlogURL

instance AsURL BlogURL where
  toURLS   = gtoURLS . from
  fromURLC = fmap (fmap to) gfromURLC

$(deriveAll ''SiteURL "PFSiteURL")
type instance PF SiteURL = PFSiteURL

instance AsURL SiteURL where
  toURLS   = gtoURLS . from
  fromURLC = fmap (fmap to) gfromURLC

that should also work with main1b.

Or you could do it without AsURL at all using syb:

gtoURL  :: (Data url) => url -> String
gfromURL :: (Data url) => String -> Failing url

     run 3000 $ handleWai gtoURL gfromURL (mySite now) "
http://localhost:3000"

Or you could add an AsURL instance that just called gtoURL / gfromURL, and
then you could use handleWaiU.

If you want to write parsers by hand, you could do it using parsec:

main1c :: IO ()
main1c =
  do now <- getCurrentTime
     run 3000 $ handleWai toSiteURL (fromURLP pSiteURL) (mySite now) "
http://localhost:3000"
       where
         pBlogURL :: Parser BlogURL
         pBlogURL =
           do char '/'
              (BlogPost <$> many1 (noneOf "/")) <|> pure BlogHome
         pSiteURL :: Parser SiteURL
         pSiteURL =
           do char '/'
              MyBlog <$> (string "blog" *> pBlogURL) <|> pure MyHome

         toBlogURL :: BlogURL -> String
         toBlogURL BlogHome         = ""
         toBlogURL (BlogPost title) = title

         toSiteURL :: SiteURL -> String
         toSiteURL MyHome           = ""
         toSiteURL (MyBlog blogURL) = "blog/" </> (toBlogURL blogURL)

In this example,  I call handleWai. But I could also create AsURL instances
and call handleWaiU.

Parsec is perhaps not the best choice of parser combinators. A more
specialized URL parser combinator library might be nice.

We could also add a helper function so that it is easier to do things via
straight pattern matching. But I think straight pattern patching may prove
tedious rather quickly?

In general though, I am not a big fan of writing the converters by hand,
because there is no assurance that they are inverses of each other, and it's
annoying to have to basically express the same structure twice -- once to
parse it, and once to print it.

But there does need to be someway where you can very explicitly map how the
datatype and string representation of the URL are related.

It would be much better if there was a DSL that simultaneously expressed how
to parse and how to print. I have not worked out how to do that yet though
-- it is somewhat tricky.

However, the quasiquote stuff looks potentially promising as a way of
expressing the parsing and printing in a single step...

- jeremy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/web-devel/attachments/20100318/7ae9e23e/attachment-0001.html