CWString

Glynn Clements glynn.clements at virgin.net
Thu Aug 28 00:34:09 EDT 2003


John Meacham wrote:

> > > > > In our new implementation of Data.Char.isUpper and friends, I made
> > > > > the simplifying assumption that Char==wchar_t==Unicode.  With glibc,
> > > > > this appears to be valid as long as (a) you set LANG to something
> > > > > other than "C" or "POSIX", and (b) you call setlocale() first.
> > > > 
> > > > The glibc Info file says:
> > > > 	The wide character character set always is UCS4, at least on
> > > > 	GNU systems.
> > > yes. with glibc, wchar_t is always unicode no matter what the locale.
> > > better yet, all ISO C implementations  define a handy C symbol to test
> > > for this. if __STDC_ISO_10646__ is defined then wchar_t is always
> > > unicode no matter what.
> > 
> > Sure, but as I've been saying, the implementation of glibc doesn't do
> > this.  In the C or POSIX locale, the ctype macros only recognise ASCII.
> > 
> > Should this be considered a bug in glibc?
> 
> hmm.. how odd. I would consider it a bug, I think. I don't have a copy
> of the ISO spec handy but will be sure to look up whether that is
> conforming... It is certainly a malfeature if it is not a bug...

It certainly isn't a violation of ANSI/ISO C; that simply states that
"The behavior of these functions is affected by the LC_CTYPE category
of the current locale". It's perfectly legal for the implementation to
use different wide encodings depending upon the locale.

OTOH, it could be considered contrary to the statement in the glibc
documentation that "The wide character character set always is UCS4". 
glibc's wide-character encoding for the C locale may use the same
codepoints as UCS4, but glibc appears to treat everything above 255 as
"undefined" whereas UCS4 assigns an interpretation.

On my RH6.2 (glibc 2.1.3) system, Simon's test program suggests that
glibc exhibits the "ASCII" behaviour for any locale for which the file
/usr/share/locale/<locale>/LC_CTYPE doesn't exist, as well as for the
"C" and "POSIX" locales, for which it doesn't even attempt to open an
LC_CTYPE file.
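
For anyone who wants to repeat the experiment, here is a minimal sketch
of this kind of check. It is not Simon's program, which isn't reproduced
in this thread; the helper names (wchar_is_iso10646, is_upper_in_locale,
report) are my own and purely illustrative. It only uses setlocale() and
iswupper() from ISO C, and tests U+00C9 (LATIN CAPITAL LETTER E WITH
ACUTE) as a sample non-ASCII upper-case letter.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Returns 1 if the implementation defines __STDC_ISO_10646__, i.e. it
   promises that wchar_t values are ISO 10646 code points; 0 otherwise. */
static int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Selects locale_name for LC_CTYPE, then asks iswupper() about wc.
   Returns 1 or 0 for the classification, or -1 if the locale could
   not be selected. */
static int is_upper_in_locale(const char *locale_name, wint_t wc)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return iswupper(wc) != 0;
}

/* Prints the classification of an ASCII and a non-ASCII capital letter
   under the given locale ("" means: take it from the environment). */
static void report(const char *locale_name)
{
    printf("locale \"%s\": iswupper('A') = %d, iswupper(U+00C9) = %d\n",
           locale_name,
           is_upper_in_locale(locale_name, L'A'),
           is_upper_in_locale(locale_name, (wint_t)0x00C9));
}
```

Calling report("C") and then report("") from main(), with LANG set to
various values, shows whether a given locale gets the "ASCII" behaviour;
wchar_is_iso10646() only tells you what the implementation claims about
the encoding, not how the classification functions behave.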

-- 
Glynn Clements <glynn.clements at virgin.net>


