[web-devel] XSS vs charset

Michael Snoyman michael at snoyman.com
Wed Apr 2 06:19:33 UTC 2014


On Wed, Apr 2, 2014 at 9:08 AM, Kazu Yamamoto <kazu at iij.ad.jp> wrote:

> Hi Michael,
>
> Thank you for your reply.
>
> > I suppose theoretically you could be talking about a situation where
> > Mighty is hosting a CGI application that receives user data and produces
> > a static HTML file as a result.
>
> Yes. Also I'm thinking about Yesod.
>
>
Yesod focuses more on dynamic content, and in those cases we *do* already
set charset=utf-8[1]. Where this would affect Yesod is yesod-static, and
there the same logic I applied to Mighty holds: users should not be able to
affect the content of static files under normal circumstances, so the
security concern is pretty remote.

[1]
https://github.com/yesodweb/yesod/blob/master/yesod-core/Yesod/Core/Content.hs#L161


> > But it
> > could be worked around by the CGI application using <meta charset=...>
> > instead.
>
> Yes. Is this rarely used in Yesod?
>
>
Yes. Dynamic responses don't normally go via static file serving at all. In
WAI terms, we always end up with a ResponseBuilder, not a ResponseFile, for
dynamic content.


> > So that comes to the question: is it safe for Mighty, mime-types, etc, to
> > require that all HTML files are stored as UTF-8? I'd say, as long as
> > there's a way for a user to override that if necessary, it sounds good to
> > me. mime-types does provide such a capability, so I'd be in favor of
> > tweaking its textual types to include explicit charset information.
>
> Probably I was too sensitive. Based on your discussion, it is
> safer/better for Mighty not to hard-code charset.
>
>
To be clear, besides the security concerns, there is *definitely* a
usability advantage in specifying charsets explicitly: the browser doesn't
need to fall back on defaults or guessing[2]. This just comes down to a
numbers game: is it more likely that a browser will mis-guess the character
encoding of UTF-8 data, or that someone running Mighty will provide
non-UTF-8 data?
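
As a rough illustration of the "explicit charset on textual types, with an
override" idea, here is a minimal sketch. The helper name withUtf8Charset is
hypothetical; it is not part of mime-types, Mighty, or Yesod, and the
case-sensitive matching is a simplification.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B8

-- | Hypothetical helper (not from any library): append an explicit UTF-8
-- charset to textual MIME types that do not already declare one.
-- Assumption: the MIME type is lower-case; real matching should be
-- case-insensitive.
withUtf8Charset :: B8.ByteString -> B8.ByteString
withUtf8Charset mime
  | "text/" `B8.isPrefixOf` mime && not ("charset=" `B8.isInfixOf` mime)
      = mime <> "; charset=utf-8"   -- textual type without a charset
  | otherwise = mime                -- non-textual, or charset already given
```

A user whose files are stored in another encoding can supply a type that
already carries a charset (say "text/html; charset=shift_jis"), and the
helper leaves it alone. That is the override path.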

One other point in favor of specifying the encoding is that serving a file
stored in the wrong encoding will fail *reliably*. Without a charset, some
browsers may guess the wrong character encoding while others guess
correctly, which makes problems difficult to debug. If you *always* serve
with charset=utf-8 and that turns out to be wrong, you'll find out quickly
and reliably.
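
The same "fail early" idea can be checked on the server side before a file
is ever served. This is only a sketch, assuming the text and bytestring
packages; looksLikeUtf8 and the two sample values are hypothetical names.
Note that a browser would show replacement characters rather than raise an
error, so this is an analogous check, not what the browser does.

```haskell
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8')

-- | True when the bytes decode as well-formed UTF-8.
looksLikeUtf8 :: B.ByteString -> Bool
looksLikeUtf8 = either (const False) (const True) . decodeUtf8'

-- "café" encoded as UTF-8 (é is the two bytes 0xC3 0xA9).
cafeUtf8 :: B.ByteString
cafeUtf8 = B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]

-- "café" encoded as Latin-1 (é is the single byte 0xE9), which is not
-- valid UTF-8 because 0xE9 starts a multi-byte sequence that never ends.
cafeLatin1 :: B.ByteString
cafeLatin1 = B.pack [0x63, 0x61, 0x66, 0xE9]
```

Here looksLikeUtf8 accepts cafeUtf8 and rejects cafeLatin1, so a Latin-1
file that would be mislabeled as charset=utf-8 can be caught consistently.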

[2] http://en.wikipedia.org/wiki/Charset_detection