[Haskell-cafe] Re: String vs ByteString

Sun Aug 15 03:34:54 EDT 2010

>>>>> "Bryan" == Bryan O'Sullivan <bos at serpentine.com> writes:

    Bryan> On Sat, Aug 14, 2010 at 10:46 PM, Michael Snoyman <michael at snoyman.com> wrote:
    Bryan>     When I'm writing a web app, my code is sitting on a Linux
    Bryan> system where the default encoding is UTF-8, communicating
    Bryan> with a database speaking UTF-8, receiving request bodies in
    Bryan> UTF-8 and sending response bodies in UTF-8. So converting all
    Bryan> of that data to UTF-16, just to be converted right back to
    Bryan> UTF-8, does seem strange for that purpose.


    Bryan> Bear in mind that much of the data you're working with can't
    Bryan> be readily trusted. UTF-8 coming from the filesystem, the
    Bryan> network, and often the database may not be valid. The cost of
    Bryan> validating it isn't all that different from the cost of
    Bryan> converting it to UTF-16.

But UTF-16 (apart from being an abomination for creating a hole in the
codepoint space and making it impossible to ever extend it) is slow to
process compared with UTF-32 - you can't get the nth character in
constant time, so it seems an odd choice to me.
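
To make the indexing point concrete, here is a minimal sketch in plain
Haskell (base only). It is an illustration, not how any particular
library stores text: the names isHigh, isLow, combine, nthCodePoint and
sample are made up for this example. Because a code point outside the
BMP occupies two 16-bit code units, finding the nth code point means
walking from the start of the sequence.

```haskell
import Data.Char (chr)
import Data.Word (Word16)

-- High (leading) surrogates occupy 0xD800..0xDBFF.
isHigh :: Word16 -> Bool
isHigh w = w >= 0xD800 && w <= 0xDBFF

-- Low (trailing) surrogates occupy 0xDC00..0xDFFF.
isLow :: Word16 -> Bool
isLow w = w >= 0xDC00 && w <= 0xDFFF

-- Combine a surrogate pair into the code point it encodes.
combine :: Word16 -> Word16 -> Char
combine hi lo =
  chr (0x10000 + (fromIntegral hi - 0xD800) * 0x400
               + (fromIntegral lo - 0xDC00))

-- Return the nth code point (0-based) of a UTF-16 code unit sequence.
-- The scan is linear in n: each step must check whether the next code
-- point takes one unit or two. Unpaired surrogates yield Nothing.
nthCodePoint :: Int -> [Word16] -> Maybe Char
nthCodePoint _ [] = Nothing
nthCodePoint n (hi : lo : rest)
  | isHigh hi && isLow lo =
      if n == 0 then Just (combine hi lo) else nthCodePoint (n - 1) rest
nthCodePoint n (w : rest)
  | isHigh w || isLow w = Nothing
  | n == 0              = Just (chr (fromIntegral w))
  | otherwise           = nthCodePoint (n - 1) rest

-- Hypothetical sample: 'a', U+1D11E (a non-BMP code point), 'b'.
sample :: [Word16]
sample = [0x0061, 0xD834, 0xDD1E, 0x0062]
```

With this sample, code point index 2 is 'b', but code unit index 2 is
the trailing half of a surrogate pair, so the two kinds of index
disagree as soon as a non-BMP character appears. In UTF-32 every code
point is one 32-bit unit, so the nth code point is simply the nth unit.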
-- 
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments