[Haskell-cafe] Haskell future and UTF8 vs UTF-16

Joachim Durchholz jo at durchholz.org
Sun Feb 11 21:55:14 UTC 2018


On 11.02.2018 at 12:29, Merijn Verstraaten wrote:
> On 11 Feb 2018, at 10:39, Alan & Kim Zimmerman <alan.zimm at gmail.com> wrote:
>> What is the current and future status of UTF8 vs UTF-16 in the haskell world?
>>
>> I understand that currently Text uses UTF-16, and it is used generally because of compatibility requirements in the Microsoft ecosystem, but that there are movements afoot to move to a UTF8 only environment at some unspecified future point.
> 
> As far as I know there was a UTF-8 fork of Text made as part of the Summer of Code a year or so ago, but it got ditched because it turned out to be slower than the UTF16 version in practice.
Mmm... correctness is another relevant point here.
Does Text handle characters beyond the Basic Multilingual Plane (U+0000
to U+FFFF) properly, or does one have to deal with "surrogate pairs" there?

I'm curious because I am seeing this kind of trouble in the Java world. 
The standard libraries there have pretty weak support for characters 
beyond U+FFFF, so most Java programmers pretend that these don't exist.
I'm pretty sure Chinese users hate Java for that reason...
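
To make the Java problem concrete, here is a minimal sketch. The class and
method names (SurrogateDemo, codeUnits, codePoints) are made up for
illustration; only the java.lang.String and java.lang.Character calls are
real API. It shows that String.length() counts UTF-16 code units, so a
single character outside the BMP is reported as two "chars":

```java
public class SurrogateDemo {
    // Hypothetical helper: number of UTF-16 code units,
    // which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Hypothetical helper: number of Unicode code points,
    // which is closer to what a user would call "characters".
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+20BB7 is a CJK ideograph outside the BMP (CJK Extension B).
        String s = new String(Character.toChars(0x20BB7));

        System.out.println(codeUnits(s));   // 2: a surrogate pair
        System.out.println(codePoints(s));  // 1: one actual character

        // charAt(0) yields only the high half of the pair,
        // not a complete character.
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}
```

Code that indexes with charAt() or cuts strings at arbitrary length()
offsets can therefore split such a character in half unless it uses the
code point methods explicitly.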

Regards,
Jo

