[Haskell-cafe] Re: String vs ByteString

Fri Aug 13 12:11:11 EDT 2010

2010/8/13 Bryan O'Sullivan <bos at serpentine.com>

> 2010/8/13 Gábor Lehel <illissius at gmail.com>
>
>> How about the case for text which is guaranteed to be in ascii/latin1?
>> ByteString again?
>>
>
> If you know it's text and not binary data you are working with, you should
> still use Data.Text. There are a few good reasons.
>
>    1. The API is more correct. For instance, if you use Text.toUpper on a
>    string containing latin1 "ß" (eszett, sharp S), you'll get the
>    two-character sequence "SS", which is correct. Using Char8.map Char.toUpper
>    here gives the wrong answer.
>    2. In many cases, the API is easier to use, because it's oriented
>    towards using text data, instead of being a port of the list API.
>    3. Some commonly used functions, such as substring searching, are *way*
>    faster than their ByteString counterparts.
>
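Point 1 in the quoted list is easy to check. The following is a minimal
sketch, assuming the text and bytestring packages are installed; the helper
names upperText and upperBytes are made up for illustration. The byte 223
(0xDF) is "ß" in latin1.

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified Data.Char as Char
import qualified Data.Text as T

-- Text-aware upper-casing: may change the number of characters.
upperText :: T.Text -> T.Text
upperText = T.toUpper

-- Per-byte upper-casing: every byte maps to exactly one byte,
-- so "ß" cannot expand to "SS".
upperBytes :: B8.ByteString -> B8.ByteString
upperBytes = B8.map Char.toUpper

main :: IO ()
main = do
  let word = "stra\223e"  -- "straße"; \223 is U+00DF
  -- Text result is "STRASSE": 7 characters from a 6-character input.
  print (T.length (upperText (T.pack word)))
  -- ByteString result keeps the 0xDF byte: still 6 bytes.
  print (B8.length (upperBytes (B8.pack word)))
```

The ByteString version cannot give the right answer here even in principle,
because mapping over bytes can never change the length of the string.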
These are all good reasons. An even more important reason is type safety:

A function that receives a Text argument has the guarantee that the input
is valid Unicode. A function that receives a ByteString doesn't have that
guarantee, and if validity is important, the function must perform a validity
check before operating on the data. If the function does not validate the
input, it might crash or, even worse, write invalid data to disk or
some other data store, corrupting the application data.

This is a bit of a subtle point that you really only see once systems get
large. Even though you pay for the conversion from ByteString to Text, you
might make up for that by avoiding several validity checks down the road.
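
To make this concrete, here is a rough sketch of validating once at the
boundary, assuming the input is meant to be UTF-8. decodeUtf8' comes from
Data.Text.Encoding; parseName and shout are hypothetical names used only for
this example.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')
import Data.Text.Encoding.Error (UnicodeException)

-- Boundary: the only place where raw bytes are validated.
parseName :: B.ByteString -> Either UnicodeException T.Text
parseName = decodeUtf8'

-- Interior: the Text type already guarantees valid Unicode,
-- so no further validity check is needed here.
shout :: T.Text -> T.Text
shout name = T.toUpper name `T.append` T.pack "!"

main :: IO ()
main = do
  let good = B.pack [104, 105]    -- the bytes of "hi"
      bad  = B.pack [0xC3, 0x28]  -- 0xC3 must be followed by a continuation byte
  mapM_ report [good, bad]
  where
    report bytes = case parseName bytes of
      Left err   -> putStrLn ("rejected: " ++ show err)
      Right name -> print (shout name)
```

Every function downstream of parseName can take a Text and skip the check,
which is where the conversion cost gets paid back.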

Cheers,
Johan