[Haskell-cafe] Re: Strings and utf-8

Jules Bean jules at jellybean.co.uk
Thu Nov 29 08:05:19 EST 2007


Duncan Coutts wrote:
> On Wed, 2007-11-28 at 17:38 -0200, Maurício wrote:
>> >> (...) When it's phrased as "truncates to 8
>> >> bits" it sounds so simple, surely all we need
>> >> to do is not truncate to 8 bits, right?
>> >>
>> >> The problem is, what encoding should it pick?
>> >> UTF8, 16, 32, EBCDIC? (...)
>> >>
>> >> One sensible suggestion many people have made
>> >> is that H98 file IO should use the locale
>> >> encoding and do Unicode/String <-> locale
>> >> conversion. (...)
>>
>> I'm really afraid of solutions where the behavior
>> of your program changes with an environment
>> variable that not everybody has configured
>> properly, or even knows exists.
> 
> Be afraid of all your standard Unix utils in that case. They are all
> locale-dependent, not just for encoding but also for sorting order and
> the language of messages.

The language of messages is quite different from the language of a file you read.

Suppose I am English, and I have a Russian friend, Vlad.

My default locale is, say, Latin-1, and his is something Cyrillic.

I might well open my own files as well as his. The locale of the
current user is simply no guide to the correct encoding for reading a
file, and not a particularly reliable guide for writing one out.
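A minimal sketch of the point, assuming a GHC whose System.IO provides `hSetEncoding`, `utf8` and `latin1` (these arrived in GHC's base library after this thread was written). The file name `vlad.txt` and the helpers `writeFileWith` / `readFileWith` are hypothetical. The same bytes decode to different Strings depending on which encoding the reader assumes, so the reader's locale cannot tell you which decoding is right.

```haskell
import System.IO

-- Write a String to a file, encoding it with an explicitly chosen
-- TextEncoding instead of the current locale's default.
writeFileWith :: TextEncoding -> FilePath -> String -> IO ()
writeFileWith enc path s = withFile path WriteMode $ \h -> do
  hSetEncoding h enc
  hPutStr h s

-- Read a whole file, decoding it with an explicitly chosen TextEncoding.
-- The contents are forced before withFile closes the handle.
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  s <- hGetContents h
  length s `seq` return s

main :: IO ()
main = do
  -- A hypothetical file of Vlad's, saved as UTF-8.
  writeFileWith utf8 "vlad.txt" "caf\233"
  -- Decoded with the encoding it was actually written in.
  right <- readFileWith utf8 "vlad.txt"
  -- Decoded as Latin-1, e.g. because that is the reader's locale.
  wrong <- readFileWith latin1 "vlad.txt"
  print right -- prints "caf\233"
  print wrong -- prints "caf\195\169"
```

The UTF-8 bytes for é (0xC3 0xA9) are perfectly valid Latin-1, so the wrong decoding fails silently rather than raising an error.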

Locale makes perfect sense for messages (you are communicating with the 
user, his locale tells you what language he speaks). It makes much less 
sense for file IO.
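One way to act on that split, sketched under the same assumptions (GHC's System.IO with `hSetEncoding`, `localeEncoding`, `utf8`, `latin1`; `recodeFile` and the file names are hypothetical): user-facing messages on stdout use the locale, while every file operation takes its encoding as an explicit argument.

```haskell
import System.IO

-- Hypothetical helper: copy a file while changing its encoding. Both
-- encodings are explicit arguments; neither comes from the locale.
recodeFile :: TextEncoding -> FilePath -> TextEncoding -> FilePath -> IO ()
recodeFile inEnc inPath outEnc outPath =
  withFile inPath ReadMode $ \hIn -> do
    hSetEncoding hIn inEnc
    withFile outPath WriteMode $ \hOut -> do
      hSetEncoding hOut outEnc
      -- hPutStr consumes the lazily read input before either handle closes.
      hGetContents hIn >>= hPutStr hOut

main :: IO ()
main = do
  -- Messages are for the user, so they use the user's locale encoding.
  -- (GHC already defaults stdout to the locale; this just makes it explicit.)
  hSetEncoding stdout localeEncoding
  -- Set up a hypothetical UTF-8 input file.
  withFile "vlad-utf8.txt" WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h "caf\233"
  recodeFile utf8 "vlad-utf8.txt" latin1 "vlad-latin1.txt"
  putStrLn "Converted vlad-utf8.txt (UTF-8) to vlad-latin1.txt (Latin-1)."
```

Because the file encodings are parameters, the result is the same whether Jules or Vlad runs the program; only the language and encoding of the status message follow the user.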

Jules

