[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.
Jonathan Cast
jonathanccast at fastmail.fm
Tue Oct 2 11:44:52 EDT 2007
On Tue, 2007-10-02 at 08:02 -0700, Deborah Goldsmith wrote:
> On Oct 2, 2007, at 5:11 AM, ChrisK wrote:
> > Deborah Goldsmith wrote:
> >
> >> UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon,
> >> and is what appears in the APIs for all of them. UTF-16 is also
> >> what's stored in the volume catalog on Mac disks. UTF-8 is only used
> >> in BSD APIs for backward compatibility. It's also used in plain text
> >> files (or XML or HTML), again for compatibility.
> >>
> >> Deborah
> >
> >
> > On OS X, Cocoa and Carbon use Core Foundation, whose API does not
> > have a one-true-encoding internally. Follow the rather long URL for
> > details:
> >
> > http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/index.html?http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Articles/StringStorage.html#//apple_ref/doc/uid/20001179
> >
> > I would vote for an API that not just hides the internal store, but
> > allows different internal stores to be used in a mostly compatible way.
> >
> > However, there is a UniChar typedef on OS X which is the same
> > unsigned 16-bit integer as Java's JNI would use.
>
> UTF-16 is the type used in all the APIs. Everything else is
> considered an encoding conversion.
>
> CoreFoundation uses UTF-16 internally except when the string fits
> entirely in a single-byte legacy encoding like MacRoman or
> MacCyrillic. If any kind of Unicode processing needs to be done to
> the string, it is first coerced to UTF-16. If it weren't for
> backwards compatibility issues, I think we'd use UTF-16 all the time
> as the machinery for switching encodings adds complexity. I wouldn't
> advise it for a new library.
I would like to, again, strongly argue against sacrificing compatibility
with Linux/BSD/etc. for the sake of compatibility with OS X or Windows.
FFI bindings have to convert data formats in any case; Haskell shouldn't
gratuitously break Linux support (or make life harder on Linux) just to
support proprietary operating systems better.
Now, if UTF-16 is objectively better, /independent of the details of
MacOS X/, then fine: the FFI can convert it to whatever a given platform
needs. But I am strongly opposed to doing it the way Java or MacOS X or
Win32 or anyone else does it at the expense of Linux.
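
To make the "FFI bindings have to convert data formats in any case" point concrete, here is a minimal sketch of what such a conversion might look like when the Haskell side holds plain Unicode code points (an ordinary String). The names encodeUtf16 and encodeUtf8 are hypothetical and do not come from any proposed library; surrogate code points are passed through unvalidated, which a real binding would probably want to reject.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word16, Word8)

-- | Hypothetical helper: encode code points as UTF-16 code units,
-- e.g. before handing a string to a UTF-16 based API.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap unit
  where
    unit c
      | n < 0x10000 = [fromIntegral n]              -- BMP: one code unit
      | otherwise   =                               -- else a surrogate pair
          let m = n - 0x10000
          in [ fromIntegral (0xD800 + (m `shiftR` 10))
             , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where n = ord c

-- | Hypothetical helper: encode code points as UTF-8 bytes,
-- e.g. before handing a string to a byte-oriented BSD/Linux API.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap unit
  where
    unit c
      | n < 0x80    = [fromIntegral n]
      | n < 0x800   = [lead 0xC0 6, cont 0]
      | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
      | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
      where
        n = ord c
        -- Leading byte: marker bits plus the top bits of the code point.
        lead marker s = fromIntegral (marker .|. (n `shiftR` s))
        -- Continuation byte: 10xxxxxx carrying six bits of the code point.
        cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))
```

For example, encodeUtf16 "\x1D11E" gives [0xD834, 0xDD1E] and encodeUtf8 "\x1D11E" gives [0xF0, 0x9D, 0x84, 0x9E]. Either conversion is a single pass over the string, so the choice of internal representation need not be dictated by one platform's API.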
jcc