CWString

John Meacham john at repetae.net
Wed Aug 27 03:19:25 EDT 2003


On Wed, Aug 27, 2003 at 04:28:27AM +0100, Glynn Clements wrote:
> > Attached is a properly internationalized implementation of
> > Foreign.C.String, along with some other routines which I feel would be
> > very at home in the FFI standard.
> > 
> > Note that I am trying to solve a simpler problem than full generic i18n.
> > I just want the ability to work within the current locale, whatever it
> > might be.
> 
> But bear in mind that other programmers may want to work in the "C"
> locale, regardless of the user's environment settings. This is why the C
> library doesn't attempt to use those settings unless the program
> explicitly requests their use via setlocale().
> 
> Also, some libraries may fail to cope with locales other than "C";
> particularly locales with multi-byte encodings.

then this is a library interface issue, not a locale one.
If a library documents that an argument shall be nothing other than
ascii text then call withAsciiCString; if the library accepts localized
text then call withCString (which is defined to return the localized
version by the ffi spec).
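
To make the distinction concrete, here is a minimal sketch of what an ASCII-only marshaller might look like. The name withAsciiCString comes from the text above, but this body is only an assumption: it rejects any character outside the 7-bit range rather than silently truncating it. roundTrip is a hypothetical helper added purely for illustration.

```haskell
import Data.Char (isAscii)
import Foreign.C.String (CString, castCharToCChar, peekCAString)
import Foreign.Marshal.Array (withArray0)

-- Assumed behaviour: marshal a String as a NUL-terminated C string,
-- failing with an IOError if any character is not 7-bit ASCII.
withAsciiCString :: String -> (CString -> IO a) -> IO a
withAsciiCString s act
  | all isAscii s = withArray0 0 (map castCharToCChar s) act
  | otherwise     = ioError (userError "withAsciiCString: non-ASCII character")

-- Hypothetical helper: marshal a string out and read it straight back.
roundTrip :: String -> IO String
roundTrip s = withAsciiCString s peekCAString
```

A binding to a library that documents ASCII-only arguments would call this instead of withCString, so the choice is made once, in the binding, and never depends on the user's environment.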

frankly, the ability to not call setlocale was a hack to work around
migration issues in C programs; there is no need for haskell programs to
inherit this complexity. if libraries need ascii strings, explicitly pass
ascii strings; if they need localized strings, pass localized ones, and so
forth.

at worst, you can always do
LANG=C foo 
to force a program to run in a specific locale. (but if people are smart
about writing their C bindings, this sort of hack shouldn't ever be
necessary)

> > also, to a lesser extent I propose we add explicit utf8 routines:
> > 
> >     withUTF8String, withUTF8StringLen, newUTF8String,
> >     newUTF8StringLen, peekUTF8String, peekUTF8StringLen
> > 
> > there are several libraries (X11 being a major one) which export an
> > explicit utf8 based interface,
> 
> Note that the Xutf8* functions are specific to XFree86's version of
> Xlib (and are only in 4.0.2 and later); they aren't in the vanilla
> OpenGroup version. They don't exist in vanilla X11R6, or in XFree86 3.x.
> 
> Also those functions are redundant; you can always use the Xmb*
> functions with a UTF8-based locale instead.

yeah, I was giving them as an example of a place where they would be
handy. locale hacks are bad! what if the system doesn't have a utf8
locale? in a multithreaded program, temporarily changing the locale can
be disastrous. Xutf8* lets us avoid this very nicely when they exist,
and we only need to fall back to less reliable locale tricks when
necessary.

This is similar to my previous comment: interfaces either use data in
the current locale or in a specified one only. haskell is of the latter
type, specifying unicode as its character set, which means that we can
talk to interfaces which are locale independent (like the Xutf8*) very
easily with statically determined charset conversions.  we should take
advantage of this ability whenever possible.
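
As a rough illustration of a statically determined conversion, here is a sketch of how withUTF8String (one of the names proposed above) might be written with no reference to the locale at all. The body and the helper encodeUtf8Char are assumptions for illustration only; the sketch does not reject surrogate code points or embedded NULs.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (withArray0)

-- Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    hi, cont :: Int -> Word8
    -- leading bits of the code point, placed in the first byte
    hi k   = fromIntegral (n `shiftR` k)
    -- continuation byte: 10xxxxxx carrying six bits of the code point
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- Assumed behaviour: marshal a String as NUL-terminated UTF-8,
-- independent of whatever the current locale happens to be.
withUTF8String :: String -> (CString -> IO a) -> IO a
withUTF8String s =
  withArray0 0 (map fromIntegral (concatMap encodeUtf8Char s) :: [CChar])
```

Because the encoding is fixed at compile time, nothing here touches setlocale, so it is safe in a multithreaded program.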


> Simon Marlow wrote:
> 
> > In our new implementation of Data.Char.isUpper and friends, I made the
> > simplifying assumption that Char==wchar_t==Unicode.  With glibc, this
> > appears to be valid as long as (a) you set LANG to something other than
> > "C" or "POSIX", and (b) you call setlocale() first.
> 
> The glibc Info file says:
> 
> 	The wide character character set always is UCS4, at least on
> 	GNU systems.

yes. with glibc, wchar_t is always unicode no matter what the locale.
better yet, ISO C (C99) specifies a handy macro to test for this:
if __STDC_ISO_10646__ is defined then wchar_t is always
unicode no matter what.

> > We now call setlocale() in the RTS startup code.
> So anyone who doesn't want to use the current locale now has to
> explicitly set it back to "C"?
> 
> Also, this is just for LC_CTYPE, right?

if they want to use a different locale, they should change the
environment prior to running the program. if they want to use code which
only supports the C locale (meaning it only works with ascii) then call
withAsciiCString and friends...

        John

PS I made the implicit assumption that once *CString* was replaced by
localized versions we would export the old versions under
*AsciiCString* which just makes sense.

-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john at foo.net
---------------------------------------------------------------------------
