[Haskell-cafe] Re: Strings and utf-8
Jules Bean
jules at jellybean.co.uk
Thu Nov 29 08:05:19 EST 2007
Duncan Coutts wrote:
> On Wed, 2007-11-28 at 17:38 -0200, Maurício wrote:
>> >> (...) When it's phrased as "truncates to 8
>> >> bits" it sounds so simple, surely all we need
>> >> to do is not truncate to 8 bits right?
>> >>
>> >> The problem is, what encoding should it pick?
>> >> UTF8, 16, 32, EBCDIC? (...)
>> >>
>> >> One sensible suggestion many people have made
>> >> is that H98 file IO should use the locale
>> >> encoding and do Unicode/String <-> locale
>> >> conversion. (...)
>>
>> I'm really afraid of solutions where the behavior
>> of your program changes with an environment
>> variable that not everybody has configured
>> properly, or even know to exist.
>
> Be afraid of all your standard Unix utils in that case. They are all
> locale dependent, not just for encoding but also for sorting order and
> the language of messages.
Language of messages is quite different from language of a file you read.
Suppose I am English, and I have a Russian friend, Vlad.
My default locale uses, say, Latin-1, and his uses some Cyrillic encoding.
I might well open both my own files and his. The
locale of the current user is simply no guide to the correct encoding to
read a file in, and not a particularly reliable guide to writing a file out.
Locale makes perfect sense for messages (you are communicating with the
user, his locale tells you what language he speaks). It makes much less
sense for file IO.
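A minimal sketch of choosing the encoding per file rather than per user, assuming a GHC whose System.IO provides hSetEncoding, utf8 and latin1. The helper names readFileWith, writeFileWith and demo are hypothetical, made up for this illustration.

```haskell
import System.IO

-- Hypothetical helper: read a whole file, decoding it with the encoding
-- the caller names instead of the one implied by the user's locale.
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path = do
  h <- openFile path ReadMode
  hSetEncoding h enc
  s <- hGetContents h
  -- Force the lazy read to the end before closing the handle.
  length s `seq` hClose h
  return s

-- Hypothetical helper: write a file in an explicitly chosen encoding.
writeFileWith :: TextEncoding -> FilePath -> String -> IO ()
writeFileWith enc path s =
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc
    hPutStr h s

-- Write Vlad's name (Cyrillic, given as code points) as UTF-8, then read
-- the same bytes back twice: once as UTF-8 and once as Latin-1.
demo :: FilePath -> IO (String, String)
demo path = do
  writeFileWith utf8 path "\1042\1083\1072\1076"
  asUtf8   <- readFileWith utf8 path
  asLatin1 <- readFileWith latin1 path
  return (asUtf8, asLatin1)
```

Read as UTF-8, the file gives back the original four characters. Read as Latin-1, the same bytes decode without error into eight unrelated characters, so nothing signals that the wrong encoding was picked. The encoding has to come from knowledge about the file, not from whoever happens to be running the program.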
Jules