CWString

Simon Marlow simonmar at microsoft.com
Wed Aug 27 04:58:08 EDT 2003


 
> > > In our new implementation of Data.Char.isUpper and friends, I made the
> > > simplifying assumption that Char==wchar_t==Unicode.  With glibc, this
> > > appears to be valid as long as (a) you set LANG to something other than
> > > "C" or "POSIX", and (b) you call setlocale() first.
> > 
> > The glibc Info file says:
> > 
> > 	The wide character character set always is UCS4, at least on
> > 	GNU systems.
> Yes. With glibc, wchar_t is always Unicode no matter what the locale.
> Better yet, ISO C99 defines a handy symbol to test for this: if
> __STDC_ISO_10646__ is defined, then wchar_t values are always Unicode
> (ISO 10646) code points, no matter what the locale.

Sure, but as I've been saying, glibc's implementation doesn't behave that
way in practice.  In the C or POSIX locale, the wide-character
classification functions (iswupper() and friends) only recognise ASCII.
Try it:

#include <wctype.h>
#include <stdio.h>
#include <locale.h>

int main(void) {
    // Adopt the locale named by LANG / LC_* in the environment;
    // without this call the program stays in the "C" locale.
    setlocale(LC_ALL, "");
    printf("%d\n", iswupper(L'A'));
    printf("%d\n", iswupper(0x391)); // Greek capital alpha
    printf("%d\n", iswupper(0x3B1)); // Greek small alpha
    printf("%d\n", iswlower(0x391)); // Greek capital alpha
    printf("%d\n", iswlower(0x3B1)); // Greek small alpha
    return 0;
}

$ LANG=en_GB ./a.out
1
1
0
0
1
$ LANG=C ./a.out
1
0
0
0
0

Should this be considered a bug in glibc?

Cheers,
	Simon



