[Haskell-cafe] Haskell future and UTF8 vs UTF-16

Zemyla zemyla at gmail.com
Tue Feb 13 22:54:36 UTC 2018


I'd actually been thinking about whether it would be worth including a
finger tree of character lengths in order to make length O(1) and
indexing, take, and drop O(log n). However, a Text is currently three
unpacked values (the underlying array, an offset, and a length), and
putting something that can't be unboxed in there may not be such a good
idea.
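
A minimal sketch of the idea, under stated assumptions: instead of a real finger tree (the fingertree package on Hackage provides one), it uses a static balanced binary tree over chunks of UTF-16 code units, and every node caches the number of code points beneath it. The names Tree, fromString, lengthT and indexT are hypothetical and are not part of the text package. Length is read from the root in O(1); indexing descends in O(log n) and then decodes a single chunk.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)

-- Encode one Char as UTF-16 code units: one unit inside the BMP,
-- a surrogate pair (high, low) above U+FFFF.
encodeChar :: Char -> [Word16]
encodeChar c
  | n < 0x10000 = [fromIntegral n]
  | otherwise =
      [ fromIntegral (0xD800 + (m `shiftR` 10))
      , fromIntegral (0xDC00 + (m .&. 0x3FF))
      ]
  where
    n = ord c
    m = n - 0x10000

-- Decode well-formed UTF-16 code units back into Chars.
decodeUnits :: [Word16] -> String
decodeUnits (hi : lo : rest)
  | hi >= 0xD800 && hi < 0xDC00 =
      chr (0x10000 + (fromIntegral hi - 0xD800) * 0x400 + (fromIntegral lo - 0xDC00))
        : decodeUnits rest
decodeUnits (u : rest) = chr (fromIntegral u) : decodeUnits rest
decodeUnits [] = []

-- Hypothetical chunked text: leaves hold UTF-16 code units, and every
-- node caches the number of code points (Chars) stored beneath it.
data Tree
  = Leaf !Int [Word16]  -- code-point count of this chunk, its code units
  | Node !Int Tree Tree -- cached code-point count of both subtrees

measure :: Tree -> Int
measure (Leaf n _) = n
measure (Node n _ _) = n

leaf :: String -> Tree
leaf s = Leaf (length s) (concatMap encodeChar s)

node :: Tree -> Tree -> Tree
node l r = Node (measure l + measure r) l r

chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (a, b) = splitAt k xs in a : chunksOf k b

-- Build a balanced tree by repeatedly pairing up neighbouring chunks of k Chars.
fromString :: Int -> String -> Tree
fromString k s = build (map leaf (chunksOf (max 1 k) s))
  where
    build [] = leaf ""
    build [t] = t
    build ts = build (pairUp ts)
    pairUp (a : b : rest) = node a b : pairUp rest
    pairUp rest = rest

-- O(1): the total code-point count is cached at the root.
lengthT :: Tree -> Int
lengthT = measure

-- O(log n) descent using the cached counts, plus decoding one chunk.
indexT :: Tree -> Int -> Maybe Char
indexT (Leaf n units) i
  | i < 0 || i >= n = Nothing
  | otherwise = Just (decodeUnits units !! i)
indexT (Node _ l r) i
  | i < measure l = indexT l i
  | otherwise = indexT r (i - measure l)

main :: IO ()
main = do
  let t = fromString 4 ("G clef: " ++ ['\x1D11E'] ++ ", done")
  print (lengthT t)  -- 15 code points, although the clef takes 2 code units
  print (indexT t 8) -- the clef itself
```

take and drop are not shown; they would use the same descent to find a split point. Note that Tree is a recursive structure of heap-allocated nodes, so it could not be stored unboxed the way the existing array, offset and length fields are, which is exactly the cost raised above.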

On Sun, Feb 11, 2018 at 5:51 PM, Chris Wong <lambda.fairy at gmail.com> wrote:
> On Feb 12, 2018 10:57 AM, "Joachim Durchholz" <jo at durchholz.org> wrote:
>
> On 11.02.2018 at 12:29, Merijn Verstraaten wrote:
>>
>> On 11 Feb 2018, at 10:39, Alan & Kim Zimmerman <alan.zimm at gmail.com>
>> wrote:
>>>
>>> What is the current and future status of UTF-8 vs UTF-16 in the Haskell
>>> world?
>>>
>>> I understand that currently Text uses UTF-16, and it is used generally
>>> because of compatibility requirements in the Microsoft ecosystem, but that
>>> there are movements afoot to move to a UTF-8-only environment at some
>>> unspecified future point.
>>
>>
>> As far as I know there was a UTF-8 fork of Text made as part of the Summer
>> of Code a year or so ago, but it got ditched because it turned out to be
>> slower than the UTF-16 version in practice.
>
> Mmm... correctness is another relevant point here.
> Does Text handle characters beyond the Basic Multilingual Plane (U+0000 to
> U+FFFF) properly, or does one have to deal with "surrogate pairs" there?
>
> I'm curious because I am seeing this kind of trouble in the Java world. The
> standard libraries there have pretty weak support for characters beyond
> 0x0FFFF, so most Java programmers pretend that these don't exist. I'm pretty
> sure Chinese users hate Java for that reason...
>
>
> IIRC, the public Text interface works with code points, not 16-bit units.
> Length and indexing are O(n) for this reason.
>
> So there should be no issues from a correctness point of view.
>
> Chris
>
> Regards,
> Jo
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
>
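
Chris's point about the public interface can be checked directly, assuming the text and bytestring packages are available (both ship with GHC). In this sketch, T.length and T.index count code points, so U+1D11E (MUSICAL SYMBOL G CLEF), which lies outside the BMP, counts as a single character, while encoding the same value to UTF-16 with TE.encodeUtf16LE shows that it needs a surrogate pair. Both T.length and T.index are documented as O(n).

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  -- U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
  let t = T.pack ['a', '\x1D11E', 'b']
  print (T.length t)                     -- 3: counted in code points
  print (T.index t 2)                    -- 'b': indexing is by code point too
  print (BS.length (TE.encodeUtf16LE t)) -- 8 bytes: the clef is a surrogate pair
```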

