[web-devel] Type-safe URL handling

Thu Mar 18 18:17:23 EDT 2010

On Thu, Mar 18, 2010 at 2:07 PM, Jeremy Shaw <jeremy at n-heptane.com> wrote:

> On Wed, Mar 17, 2010 at 5:47 PM, Michael Snoyman <michael at snoyman.com> wrote:
>
>>
>> Now, as far as your concerns about boilerplate and hiding of types: you're
>> correct on the small scale. When dealing with simple examples, it makes
>> perfect sense to just pass in the 2 or 3 arguments directly instead of
>> having a datatype declared. I see the advantage of having a unified
>> typeclass/dispatch function for dealing with large, nested applications.
>>
>
> I can see how declaring a datatype (typically a record) can be useful when
> you are passing a larger number of arguments to a subhandler. In fact, I
> already have real code based on URLT where I do that. In the existing
> example, I can call the version with the wrapped up arguments just fine
> without dispatch:
>
>      run 3000 $ handleWaiU (mySiteD (SiteArgs (BlogArgs now))) "http://localhost:3000"
>
> If I call it using dispatch, then it is one token shorter:
>
>      run 3000 $ handleWaiD (SiteArgs (BlogArgs now)) "http://localhost:3000"
>
> except I also am forced to add all these tokens:
>
> instance Dispatch SiteArgs where
>   type Routes SiteArgs = SiteURL
>   type App SiteArgs    = Application
>   dispatch             = mySiteD
>
> even though I am only going to call dispatch on SiteArgs in one place in
> my code.
>
> So, without dispatch you get the option of using data-types to bundle up
> arguments if you want to. I don't see how dispatch improves on that portion.
>
> With dispatch you are forced to whether you want to or not. The reason you
> are forced to is because dispatch requires a uniquely named type so it can
> determine which function to call.
>
> One advantage of Dispatch is that you can write polymorphic functions that
> call dispatch:
>
> myFunc :: (Dispatch a) => a -> ...
>
> Is that something we are likely to exploit?
>
>
>> That said, your example and my example are not exactly the same. I find
>> the final line of mine to be *much* more concise than your Dispatch version.
>> Let's compare them directly:
>>
>> Mine:
>>     run 3000 $ plugToWai (MySite $ Blog now) "http://localhost:3000/"
>> Your dispatch version:
>>      run 3000 $ handleWai mkAbs fromAbs (dispatch (SiteArgs (BlogArgs now)))
>> Your handleWai version:
>>      run 3000 $ handleWai mkAbs fromAbs (mySite now)
>>
>
>
> True. If I had a version of handleWai that uses AsURL (similar to how
> plugToWai works), then we would have:
>
> Yours:
>     run 3000 $ plugToWai    (MySite $ Blog now) "http://localhost:3000/"
> Mine (no dispatch):
>      run 3000 $ handleWaiU (mySite now) "http://localhost:3000"
> Mine (dispatch):
>     run 3000 $ handleWaiD (MySite $ Blog now) "http://localhost:3000"
>
> which are essentially the same. Without dispatch, mine could potentially be
> one token longer. Though in this case it is one token shorter. The version
> with dispatch is, of course, the same length.
>
>> I think a lot of the boilerplate you experienced comes from your
>> implementation of my idea, not the idea itself.
>>
>
> I guess at this point I just feel like it is easier and more
> straightforward to call the handlers by unique names than to create an
> instance of Dispatch so I can call the handler using a general name. So, I
> am looking for some compelling examples where I am going to benefit from
> having a function like, dispatch :: (Dispatch a) => a -> (Routes a ->
> String) -> Routes a -> App a, hanging around.
>
> Though, as I also mentioned, I don't mind having the Dispatch class in the
> library as long as I am not required to use it.
>
>
Based on everything you've said, and some thought I've had on my own, I
agree that the base function should involve no typeclasses and not break up
the path into pieces. Here's a proposal for the entire core:

newtype AbsPath = AbsPath { unAbsPath :: String }
newtype PathInfo = PathInfo { unPathInfo :: String }
handleWai :: (PathInfo -> Failing url)
          -> (url -> PathInfo)
          -> (PathInfo -> AbsPath)
          -> (url -> (url -> AbsPath) -> Application)
          -> Application
handleWai parsePI buildPI buildAbsPath dispatch req = do
    let pi = PathInfo $ S.unpack $ pathInfo req
    case parsePI pi of
        Success url -> dispatch url (buildAbsPath . buildPI) req
        Failure errors -> return $ Response Status404 [] $ Right $ fromLBS
                                 $ L.pack $ unlines errors

I've gone ahead and gotten my previous plugToWai function to work on top of
this (available in the gist), which should be enough of a proof-of-concept
that this core is solid enough. I think it makes a lot of sense to define
the two newtypes to keep a clear distinction between the two categories of
"URLs".
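
To make the shape of the proposal easier to try out without installing WAI, here is a minimal, dependency-free sketch. Request, Response and Application below are simplified stand-ins rather than the real WAI types, Failing is redeclared locally, and SiteURL, parseSite, buildSite, site and app are hypothetical names invented for the illustration.

```haskell
newtype AbsPath  = AbsPath  { unAbsPath  :: String } deriving (Eq, Show)
newtype PathInfo = PathInfo { unPathInfo :: String } deriving (Eq, Show)

-- Local stand-in for the Failing type used in the proposal.
data Failing a = Failure [String] | Success a

-- Simplified stand-ins for the WAI types (assumptions, not the real API).
newtype Request  = Request { pathInfo :: String }
data Response    = Response { status :: Int, body :: String }
    deriving (Eq, Show)
type Application = Request -> IO Response

-- Same argument order as the proposal: parser, builder, absolute-path
-- builder, then the handler that receives the parsed URL and a renderer.
handleWai :: (PathInfo -> Failing url)
          -> (url -> PathInfo)
          -> (PathInfo -> AbsPath)
          -> (url -> (url -> AbsPath) -> Application)
          -> Application
handleWai parsePI buildPI buildAbsPath dispatch req =
    case parsePI (PathInfo (pathInfo req)) of
        Success url    -> dispatch url (buildAbsPath . buildPI) req
        Failure errors -> return (Response 404 (unlines errors))

-- Hypothetical two-page site.
data SiteURL = Home | About deriving (Eq, Show)

parseSite :: PathInfo -> Failing SiteURL
parseSite (PathInfo "/")      = Success Home
parseSite (PathInfo "/about") = Success About
parseSite (PathInfo other)    = Failure ["no route for " ++ show other]

buildSite :: SiteURL -> PathInfo
buildSite Home  = PathInfo "/"
buildSite About = PathInfo "/about"

-- The handler only ever sees typed URLs and a function to render them.
site :: SiteURL -> (SiteURL -> AbsPath) -> Application
site Home  mkAbs _ = return (Response 200 ("see " ++ unAbsPath (mkAbs About)))
site About _     _ = return (Response 200 "about")

app :: Application
app = handleWai parseSite buildSite toAbs site
  where
    toAbs = AbsPath . ("http://localhost:3000" ++) . unPathInfo
```

Running app on Request "/" yields a 200 whose body links to the absolute form of About, while an unknown path falls through to the 404 branch with the parser's messages as the body.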

We could augment this further with a "[String] -> IO Response" failure
handling function. If we *really* want to go overboard, we could even
redefine it as this:

handleWai :: (PathInfo -> Either err url)
          -> (err -> Application)
          -> (url -> PathInfo)
          -> (PathInfo -> AbsPath)
          -> (url -> (url -> AbsPath) -> Application)
          -> Application
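
A rough sketch of how that generalized signature could be implemented, under the same caveat that Request, Response and Application are simplified stand-ins and not the real WAI types. RouteError, notFound, SiteURL, parseSite and app are hypothetical names used only for this example.

```haskell
newtype AbsPath  = AbsPath  { unAbsPath  :: String }
newtype PathInfo = PathInfo { unPathInfo :: String }

-- Simplified stand-ins for the WAI types (assumptions, not the real API).
newtype Request  = Request { pathInfo :: String }
data Response    = Response Int String deriving (Eq, Show)
type Application = Request -> IO Response

-- The parser may fail with any error type; the caller supplies the
-- Application that turns such an error into a response.
handleWai :: (PathInfo -> Either err url)
          -> (err -> Application)
          -> (url -> PathInfo)
          -> (PathInfo -> AbsPath)
          -> (url -> (url -> AbsPath) -> Application)
          -> Application
handleWai parsePI onError buildPI buildAbsPath dispatch req =
    case parsePI (PathInfo (pathInfo req)) of
        Right url -> dispatch url (buildAbsPath . buildPI) req
        Left err  -> onError err req

-- Hypothetical structured error and its handler.
data RouteError = UnknownPath String deriving (Eq, Show)

notFound :: RouteError -> Application
notFound (UnknownPath p) _ = return (Response 404 ("unknown path: " ++ p))

-- Hypothetical one-page site.
data SiteURL = Home deriving (Eq, Show)

parseSite :: PathInfo -> Either RouteError SiteURL
parseSite (PathInfo "/") = Right Home
parseSite (PathInfo p)   = Left (UnknownPath p)

app :: Application
app = handleWai parseSite notFound (const (PathInfo "/"))
                (AbsPath . unPathInfo)
                (\Home _ _ -> return (Response 200 "home"))
```

The extra flexibility costs one more argument at every call site, which is the trade-off behind calling this version overboard.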

>> However, let's try to deal with some of the other important issues.
>> Firstly, Failing versus Maybe: I can't really see a case when you'd need to
>> specify why the path is not a valid URL. It would seem that either it's a
>> theoretically valid path, or it's not. Issues like "that object doesn't
>> exist" wouldn't be handled at the dispatch level usually.
>>
>
> I have found the Failing class to be very useful when using URLT for
> implementing a REST API. The links within my Haskell app won't fail, but
> links generated by non-Haskell clients can fail. For example, if some PHP
> programmer accidentally tries to get /mysite/myblog/foobar/bolg/1 -- they
> are going to be a lot happier to see:
>
>    expecting, 'blog', 'images', 'foo', but got 'bolg', than they would be
> if they just got 'invalid url'. (Even better would be if it gave the
> character offset to the bogus path component).
>
> Also, if you are writing the toURL / fromURL functions by hand instead of
> deriving them automatically somehow, then you are going to get it wrong
> sometimes (in my experience, often). I provide a QuickCheck function that
> can be used to ensure that your toURL / fromURL functions are inverses. But
> when the test fails, it is nice to get a more specific error message.
>
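
To illustrate both points above (descriptive parse failures and checking that the two conversions are inverses), here is a small hand-written pair in plain Haskell. It is only a sketch: Failing is redeclared locally, and BlogURL, toURL, fromURL, segments and roundTrips are illustrative names, not URLT's actual API.

```haskell
-- Local stand-in for the Failing type discussed above.
data Failing a = Failure [String] | Success a deriving (Eq, Show)

data BlogURL = BlogHome | BlogPost String deriving (Eq, Show)

toURL :: BlogURL -> String
toURL BlogHome         = "blog"
toURL (BlogPost title) = "blog/" ++ title

-- Split a path on '/' (never returns an empty list).
segments :: String -> [String]
segments s = case break (== '/') s of
    (seg, [])       -> [seg]
    (seg, _ : rest) -> seg : segments rest

-- Parse with an error message that names the offending segment.
fromURL :: String -> Failing BlogURL
fromURL s = case segments s of
    ["blog"]                      -> Success BlogHome
    ["blog", t] | not (null t)    -> Success (BlogPost t)
    (seg : _)   | seg /= "blog"   ->
        Failure ["expecting 'blog', but got '" ++ seg ++ "'"]
    _                             ->
        Failure ["unexpected trailing path in '" ++ s ++ "'"]

-- The inverse property: parsing a printed URL gives back the original.
roundTrips :: BlogURL -> Bool
roundTrips u = fromURL (toURL u) == Success u
```

With these definitions, fromURL "bolg/1" reports the misspelled segment, and roundTrips holds for BlogPost "hello" but not for BlogPost "a/b", because the printer does not escape the slash. That is exactly the kind of hand-written mismatch a QuickCheck run over roundTrips (given an Arbitrary instance for BlogURL) would be expected to surface.
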
>> I still think we need to reconsider relying on one or the other monad
>> transformer library. I notice now that you're using mtl; Yesod uses
>> transformers. I don't really have a strong preference on this, but it's
>> immediately divisive.
>>
>
> I refactored so that it does not really depend on either now. I did this by
> basically reimplementing URLT as a native Reader-like monad instead of
> wrapping around ReaderT. I added URLT.MTL and URLT.Transformers which
> contain the MonadTrans and MonadIO instances. But they are not used by any
> of the code.
>
> Happstack is currently mtl based. I think I like transformers better,
> though I am saddened to see they do not have the classes like MonadReader,
> MonadWriter, etc.
>
>
I see that Gregory already responded on monads-fd and monads-tf, which
unfortunately only splits the community further.
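
For what it's worth, a Reader-like monad of the kind described above needs nothing beyond base, which is what makes it possible to sidestep the mtl/transformers choice. The following is a guess at the general shape, not the actual URLT source: the field name runURLT, the helpers showURL and liftURLT, and the SiteURL example are all assumptions.

```haskell
import Data.Functor.Identity (runIdentity)

-- A hand-rolled reader whose environment is the URL rendering function.
newtype URLT url m a = URLT { runURLT :: (url -> String) -> m a }

instance Functor m => Functor (URLT url m) where
    fmap f (URLT g) = URLT (fmap f . g)

instance Applicative m => Applicative (URLT url m) where
    pure x            = URLT (const (pure x))
    URLT f <*> URLT g = URLT (\mk -> f mk <*> g mk)

instance Monad m => Monad (URLT url m) where
    URLT g >>= k = URLT (\mk -> g mk >>= \a -> runURLT (k a) mk)

-- Render a typed URL using the function carried in the environment.
showURL :: Applicative m => url -> URLT url m String
showURL u = URLT (\mk -> pure (mk u))

-- Lift an action of the underlying monad without any MonadTrans class.
liftURLT :: m a -> URLT url m a
liftURLT = URLT . const

-- Hypothetical usage.
data SiteURL = Home | Post Int

render :: SiteURL -> String
render Home     = "/"
render (Post n) = "/post/" ++ show n

links :: Monad m => URLT SiteURL m [String]
links = do
    h <- showURL Home
    p <- showURL (Post 7)
    return [h, p]

rendered :: [String]
rendered = runIdentity (runURLT links (("http://localhost:3000" ++) . render))
```

MonadTrans and MonadIO instances for either library could then live in separate modules, so the core never imports one of them.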

>> There's one other major difference between URLT and my gist: my gist splits
>> a path into pieces and hands that off for parsing. Your code allows each
>> function to handle that itself. In your example, you use the default Read
>> instance (I assume for simplicity). Splitting into pieces the way I did
>> allowed for easy pattern matching; what would URLT code look like that
>> handled "real" URLs?
>>
>
> I like the String over the [String] because it is the most general form of
> representing a URL. If you wanted to use URLT to handle both the pathInfo
> and the query string parameters, then [String] isn't really the correct
> type. Though there could be something better than String as well...
>
In some ways, ByteString is more appropriate, since that *is* the actual
data available. But I doubt this really makes much of a difference,
especially if we just internally use Char8 unpacking.

> As for handling, "real" URLs, there are a variety of solutions. If you
> don't care too much about the prettiness of the URLs you can use template
> haskell to generate AsURL instances:
>
> $(deriveAsURL ''BlogURL)
> $(deriveAsURL ''SiteURL)
>
> main1b :: IO ()
> main1b =
>   do now <- getCurrentTime
>      run 3000 $ handleWaiU (mySite now) "http://localhost:3000"
>
> Or if you prefer Regular over TH you can do something like this (it can
> probably be cleaned up a little):
>
> $(deriveAll ''BlogURL "PFBlogURL")
> type instance PF BlogURL = PFBlogURL
>
> instance AsURL BlogURL where
>   toURLS   = gtoURLS . from
>   fromURLC = fmap (fmap to) gfromURLC
>
> $(deriveAll ''SiteURL "PFSiteURL")
> type instance PF SiteURL = PFSiteURL
>
> instance AsURL SiteURL where
>   toURLS   = gtoURLS . from
>   fromURLC = fmap (fmap to) gfromURLC
>
> that should also work with main1b.
>
> Or you could do it without AsURL at all using syb:
>
> gtoURL  :: (Data url) => url -> String
> gfromURL :: (Data url) => String -> Failing url
>
>      run 3000 $ handleWai gtoURL gfromURL (mySite now) "http://localhost:3000"
>
> Or you could add an AsURL instance that just called gtoURL / gfromURL, and
> then you could use handleWaiU.
>
> If you want to write parsers by hand, you could do it using parsec:
>
> main1c :: IO ()
> main1c =
>   do now <- getCurrentTime
>      run 3000 $ handleWai toSiteURL (fromURLP pSiteURL) (mySite now) "http://localhost:3000"
>        where
>          pBlogURL :: Parser BlogURL
>          pBlogURL =
>            do char '/'
>               (BlogPost <$> many1 (noneOf "/")) <|> pure BlogHome
>          pSiteURL :: Parser SiteURL
>          pSiteURL =
>            do char '/'
>               MyBlog <$> (string "blog" *> pBlogURL) <|> pure MyHome
>
>          toBlogURL :: BlogURL -> String
>          toBlogURL BlogHome         = ""
>          toBlogURL (BlogPost title) = title
>
>          toSiteURL :: SiteURL -> String
>          toSiteURL MyHome           = ""
>          toSiteURL (MyBlog blogURL) = "blog/" </> (toBlogURL blogURL)
>
> In this example,  I call handleWai. But I could also create AsURL instances
> and call handleWaiU.
>
> Parsec is perhaps not the best choice of parser combinators. A more
> specialized URL parser combinator library might be nice.
>
> We could also add a helper function so that it is easier to do things via
> straight pattern matching. But I think straight pattern matching may prove
> tedious rather quickly?
>
> In general though, I am not a big fan of writing the converters by hand,
> because there is no assurance that they are inverses of each other, and it's
> annoying to have to basically express the same structure twice -- once to
> parse it, and once to print it.
>
> But there does need to be some way to map very explicitly how the
> datatype and string representation of the URL are related.
>
> It would be much better if there was a DSL that simultaneously expressed
> how to parse and how to print. I have not worked out how to do that yet
> though -- it is somewhat tricky.
>
> However, the quasiquote stuff looks potentially promising as a way of
> expressing the parsing and printing in a single step...
>
> - jeremy
>

I'm glad to hear someone else finds writing the same data twice to be
error-prone and redundant. If we get this core out there, I'll happily split
my mkResources quasi-quoter from Yesod and make it available as a standalone
package.
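
As a very small illustration of stating the mapping only once, here is a plain-Haskell sketch in which a single table drives both printing and parsing. It is not the mkResources quasi-quoter and it only covers constructors without arguments; SiteURL, routes, toURL, fromURL and allRoundTrip are hypothetical names.

```haskell
data SiteURL = Home | About | Contact
    deriving (Eq, Show, Enum, Bounded)

-- The single source of truth: each path is written exactly once.
routes :: [(String, SiteURL)]
routes =
    [ (""       , Home)
    , ("about"  , About)
    , ("contact", Contact)
    ]

-- Printing looks the constructor up in the flipped table.
toURL :: SiteURL -> Maybe String
toURL u = lookup u [(v, s) | (s, v) <- routes]

-- Parsing looks the path up in the table as written.
fromURL :: String -> Maybe SiteURL
fromURL s = lookup s routes

-- Checks that every constructor is in the table and round-trips.
allRoundTrip :: Bool
allRoundTrip =
    all (\u -> (toURL u >>= fromURL) == Just u) [minBound .. maxBound]
```

Because both directions are derived from routes, they cannot drift apart. Supporting constructors with arguments is where something like a quasi-quoter or a dedicated DSL would have to take over.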

By the way, should we think of something more descriptive than urlt?

Michael