simonmar at microsoft.com
Wed Aug 27 04:58:08 EDT 2003
> > > In our new implementation of Data.Char.isUpper and friends, I made the
> > > simplifying assumption that Char==wchar_t==Unicode. With glibc, this
> > > appears to be valid as long as (a) you set LANG to something other than
> > > "C" or "POSIX", and (b) you call setlocale() first.
> > The glibc Info file says:
> > The wide character character set always is UCS4, at least on
> > GNU systems.
> yes. with glibc, wchar_t is always unicode no matter what the locale.
> better yet, all ISO C implementations define a handy C symbol to test
> for this. if __STDC_ISO_10646__ is defined then wchar_t is always
> unicode no matter what.
Sure, but as I've been saying, glibc's implementation doesn't behave
this way. In the C or POSIX locale, the wide-character ctype functions
(iswupper(), iswlower() and friends) only recognise ASCII:
printf("%d\n", iswupper(0x391)); // Greek capital alpha
printf("%d\n", iswupper(0x3B1)); // Greek small alpha
printf("%d\n", iswlower(0x391)); // Greek capital alpha
printf("%d\n", iswlower(0x3B1)); // Greek small alpha
$ LANG=en_GB ./a.out
$ LANG=C ./a.out
Should this be considered a bug in glibc?