Unicode, strings, and Show

Carter Schonwald carter.schonwald at gmail.com
Thu Mar 31 03:03:37 UTC 2016


One point in the design space that the Swift language occupies, and which
seems interesting at least to me, is to have the notion of a character be
backed by a Unicode grapheme cluster, which is a character-like sequence of
Unicode code points.  Would library support for this help this
discussion or problem at all?
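
To make the distinction concrete, here is a minimal sketch of how code points
and grapheme clusters differ for a Haskell String. The helper roughClusters is
hypothetical and only glues combining marks onto the preceding code point; it
is not the full Unicode segmentation algorithm (UAX #29), and nothing like it
is in base today as far as I know. The last two lines just restate the
show/isPrint behaviour discussed in the quoted thread.

```haskell
import Data.Char (isMark, isPrint)

-- "é" written as a base letter plus a combining acute accent (U+0301):
-- one user-perceived character, but two Unicode code points.
decomposed :: String
decomposed = "e\x0301"

-- Hypothetical helper (an assumption for illustration only): a crude
-- approximation of grapheme clusters that attaches any run of combining
-- marks to the code point before it. Real segmentation has many more rules.
roughClusters :: String -> [String]
roughClusters []     = []
roughClusters (c:cs) = (c : marks) : roughClusters rest
  where
    (marks, rest) = span isMark cs

main :: IO ()
main = do
  print (length decomposed)                  -- counts code points: 2
  print (length (roughClusters decomposed))  -- counts approximate clusters: 1
  putStrLn (show "\233")                     -- show escapes U+00E9 as \233
  print (isPrint '\233')                     -- isPrint accepts it: True
```

A Swift-style character type would make the second count the default one,
which is a separate question from what show should emit.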

On Wednesday, March 30, 2016, Brandon Allbery <allbery.b at gmail.com> wrote:

> On Wed, Mar 30, 2016 at 9:50 PM, Manuel M T Chakravarty <chak at justtesting.org> wrote:
>
>> Firstly, we have
>>
>>   isPrint :: Char -> Bool
>>
>> Are you saying that this type is wrong?
>>
>> Secondly, how often do you feed the output of ’show’ to ’read’ in another
>> locale versus how often is everybody whose whole life is outside of ASCII
>> (i.e., not anglo-centric people) bothered by this shortcoming? (*)
>>
>> Moreover, the argument on the ticket was that changing the current
>> implementation would go against the standard. Now that I am saying the
>> current implementation does not conform to the standard, the standard
>> suddenly doesn’t seem to matter. Personally, I would say, when we wrote
>> that standard, we knew what we were doing.
>>
>
> The standard I am aware of is the Report, which deliberately limited the
> output to the subset which is guaranteed to be usable in all locales. show
> conforms to this; apparently people want it to *not* conform, and in a way
> which requires some locale to become the One True Locale.
>
> isPrint is, as per the language Report, based on what Char is --- which is
> Unicode codepoints. Using it for output --- or for input, for that matter
> --- gets you into locale issues because nobody anywhere guarantees that
> Unicode codepoints that pass isPrint are representable in every locale.
> isPrint is not the place to verify that a character can actually be
> displayed in the current locale.
>
> Or have you decided that ghc should require Unicode locales and nothing
> but Unicode locales from now on? If so, what do you do when the next issue
> comes up, where Unix is UTF8 and Windows is UTF16?
>
> --
> brandon s allbery kf8nh                               sine nomine associates
> allbery.b at gmail.com                                  ballbery at sinenomine.net
> unix, openafs, kerberos, infrastructure, xmonad
> http://sinenomine.net
>