Unicode, strings, and Show
Manuel M T Chakravarty
chak at justtesting.org
Thu Mar 31 06:38:25 UTC 2016
> Brandon Allbery <allbery.b at gmail.com>:
> On Wed, Mar 30, 2016 at 9:50 PM, Manuel M T Chakravarty <chak at justtesting.org> wrote:
>> Firstly, we have
>> isPrint :: Char -> Bool
>> Are you saying that this type is wrong?
>> Secondly, how often do you feed the output of ’show’ to ’read’ in another locale versus how often is everybody whose whole life is outside of ASCII (i.e., not anglo-centric people) bothered by this shortcoming? (*)
>> Moreover, the argument on the ticket was that changing the current implementation would go against the standard. Now that I am saying that the current implementation does not conform to the standard, the standard suddenly doesn’t seem to matter. Personally, I would say, when we wrote that standard, we knew what we were doing.
> The standard I am aware of is the Report, which deliberately limited the output to the subset which is guaranteed to be usable in all locales. show conforms to this; apparently people want it to *not* conform, and in a way which requires some locale to become the One True Locale.
Where does it say that in the Report?
> isPrint is, as per the language Report, based on what Char is --- which is Unicode codepoints. Using it for output --- or for input, for that matter --- gets you into locale issues because nobody anywhere guarantees that Unicode codepoints that pass isPrint are representable in every locale. isPrint is not the place to verify that a character can actually be displayed in the current locale.
Yet, this is apparently what the report requires.
IMHO, it also makes sense. We have seen that either setup (the current one or the one using ’isPrint’) has imperfections. However, getting \<number> is rarely helpful, whereas using ’isPrint’ is going to be helpful most of the time.
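To make the difference concrete, here is a minimal sketch of what an isPrint-based rendering could look like next to the current behaviour. The name showPrintable is hypothetical (it is not in base or in the Report); the sketch only assumes isPrint and showLitChar from Data.Char, and it is not a proposal for the actual implementation of show.

```haskell
import Data.Char (isPrint, showLitChar)

-- Hypothetical helper, not part of base: render a String as a Haskell
-- string literal, passing through every character accepted by 'isPrint'
-- and escaping everything else exactly as 'showLitChar' would.
showPrintable :: String -> String
showPrintable s = '"' : go s
  where
    go []        = "\""
    go (c : cs)
      | c == '"'  = "\\\"" ++ go cs          -- a double quote must be escaped inside a string literal
      | c == '\\' = "\\\\" ++ go cs          -- a backslash must always be escaped
      | isPrint c = c : go cs                -- printable, including non-ASCII: keep as is
      | otherwise = showLitChar c (go cs)    -- non-printable: standard escape

main :: IO ()
main = do
  let s = "日本語"
  putStrLn (show s)                        -- prints "\26085\26412\35486"
  print (showPrintable s == "\"日本語\"")   -- prints True
  print (read (showPrintable s) == s)      -- prints True
```

The result still round-trips through read, because read accepts unescaped non-ASCII characters inside a string literal. Note that main deliberately does not write the output of showPrintable to stdout: whether those characters can be encoded on output depends on the handle's encoding, which is the locale concern raised above.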
More information about the ghc-devs mailing list