[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.
Deborah Goldsmith
dgoldsmith at mac.com
Tue Oct 2 11:02:30 EDT 2007
On Oct 2, 2007, at 5:11 AM, ChrisK wrote:
> Deborah Goldsmith wrote:
>
>> UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon,
>> and is what appears in the APIs for all of them. UTF-16 is also
>> what's stored in the volume catalog on Mac disks. UTF-8 is only used
>> in BSD APIs for backward compatibility. It's also used in plain text
>> files (or XML or HTML), again for compatibility.
>>
>> Deborah
>
>
> On OS X, Cocoa and Carbon use Core Foundation, whose API does not have
> a one-true-encoding internally. Follow the rather long URL for details:
>
> http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/index.html?http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Articles/StringStorage.html#//apple_ref/doc/uid/20001179
>
> I would vote for an API that not only hides the internal store, but
> also allows different internal stores to be used in a mostly
> compatible way.
>
> However, there is a UniChar typedef on OS X, which is the same
> unsigned 16-bit integer that Java's JNI would use.

UTF-16 is the type used in all the APIs. Everything else is
considered an encoding conversion.
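
For anyone less familiar with UTF-16: each code unit is an unsigned
16-bit integer (the same width as UniChar), and characters above U+FFFF
take two units, a surrogate pair. Here is a minimal Haskell sketch of the
encoding, using Data.Word.Word16 for the code unit; the names
encodeUtf16 and encodeString are illustrative only, not from any
existing library:

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- One UTF-16 code unit: an unsigned 16-bit integer, the same width as
-- UniChar on OS X or jchar in Java's JNI.
type CodeUnit = Word16

-- Encode one Char as UTF-16 code units. Characters in the Basic
-- Multilingual Plane take one unit; anything above U+FFFF becomes a
-- surrogate pair. Note: a Char in U+D800..U+DFFF (a lone surrogate)
-- passes through here as a single unit; a real encoder should reject it.
encodeUtf16 :: Char -> [CodeUnit]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]  -- low surrogate
  where
    n = ord c
    m = n - 0x10000  -- 20-bit offset split across the two surrogates

-- Encode a whole String by concatenating the per-character encodings.
encodeString :: String -> [CodeUnit]
encodeString = concatMap encodeUtf16
```

In GHCi, encodeUtf16 'A' gives a single unit, while
encodeUtf16 '\x1F600' gives the pair 0xD83D, 0xDE00 (printed in decimal
as [55357,56832]).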

CoreFoundation uses UTF-16 internally except when the string fits
entirely in a single-byte legacy encoding like MacRoman or
MacCyrillic. If any kind of Unicode processing needs to be done to
the string, it is first coerced to UTF-16. If it weren't for
backward compatibility issues, I think we'd use UTF-16 all the time,
as the machinery for switching encodings adds complexity. I wouldn't
advise switching internal encodings for a new library.
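
A rough Haskell model of the storage scheme described above, assuming
ISO-8859-1 as the single-byte encoding (MacRoman and MacCyrillic need
lookup tables, so they are swapped out here for simplicity).
HybridString, fromText, toUtf16 and withUtf16 are hypothetical names;
this is not how Core Foundation is actually implemented, only a sketch
of the "coerce to UTF-16 before Unicode processing" rule:

```haskell
import Data.Char (ord)
import Data.Word (Word8, Word16)

-- Hypothetical two-representation string. ISO-8859-1 stands in for
-- the legacy single-byte encodings because its byte values equal the
-- code points U+0000..U+00FF.
data HybridString
  = SingleByte [Word8]   -- every character is in U+0000..U+00FF
  | Utf16 [Word16]       -- general case
  deriving Show

-- Pick the compact store only if every character fits in one byte.
fromText :: String -> HybridString
fromText s
  | all ((< 0x100) . ord) s = SingleByte (map (fromIntegral . ord) s)
  | otherwise               = Utf16 (concatMap encodeChar s)

-- UTF-16 encoding of one Char (surrogate pair above U+FFFF).
encodeChar :: Char -> [Word16]
encodeChar c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + m `div` 0x400)
                  , fromIntegral (0xDC00 + m `mod` 0x400) ]
  where
    n = ord c
    m = n - 0x10000

-- Coerce to UTF-16. Every ISO-8859-1 byte is a BMP code point, so a
-- plain widening to 16 bits is enough.
toUtf16 :: HybridString -> [Word16]
toUtf16 (SingleByte bs) = map fromIntegral bs
toUtf16 (Utf16 us)      = us

-- Any Unicode-aware operation coerces first, then works on one form.
withUtf16 :: ([Word16] -> a) -> HybridString -> a
withUtf16 f = f . toUtf16

-- Equality must ignore which store happened to be chosen.
instance Eq HybridString where
  a == b = toUtf16 a == toUtf16 b
```

Even equality has to coerce here: fromText "caf\xE9" lands in the
single-byte store, yet must compare equal to the same text held as
UTF-16. That extra case analysis on every operation is the kind of
complexity a single internal encoding avoids.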
Deborah