CWString API

John Meacham john at repetae.net
Tue Nov 30 06:01:50 EST 2004


On Tue, Nov 30, 2004 at 02:40:20AM -0800, Krasimir Angelov wrote:
> --- John Meacham <john at repetae.net> wrote:
> > The problem is that these operations are very
> > unsafe, there is no
> > guaranteed isomorphism or even injection between
> > wchar_ts and Chars. If
> > people really know what they are doing, they can do
> > the conversion
> > themselves via fromIntegral/ord/chr, but I don't
> > think we should
> > encourage such unsafe usage with functions when it
> > is simple for the
> > user to work around it themselves. 
> 
> As I understand castCWcharToChar is unsafe if the
> language doesn't support unicode /* Char type is too
> small */ and castCharToCWchar is unsafe if in the
> target OS wchar_t has 16 bits while the language
> supports unicode. In both cases String<->CWString
> translation is safe. When I have wchar_t in C then I
> have two opportunities:
> 
>   - map the type in Haskell to CWchar without any
> conversion
>   - use chr.fromIntegral or fromIntegral.ord
> 
> The first variant is more portable. Please correct me
> if I am wrong.
> 

The problem is that even if the language supports the full Unicode
range, there is no guarantee that a single wchar_t maps (simply and in a
pure functional fashion) to a Haskell Char. Just because wchar_t is 16
bits does not mean it represents a 16-bit subset of Unicode; regional
systems may have specialized wchar_t's for their language which are not
Unicode. The encoding of wchar_t is pretty much completely unspecified
unless __STDC_ISO10646__ is defined, in which case it is straight
Unicode and the casting routines could be defined simply (my CWString
library detects and optimizes this case). The only common systems where
this is the case are Linux glibc-based systems.
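
To make the __STDC_ISO10646__ case concrete, here is a minimal sketch of
what "defined simply" could look like. The function names are
hypothetical (they are not part of any library mentioned here), and the
code is only meaningful on a platform where a wchar_t value is known to
be a Unicode code point. It does not detect that condition itself.

```haskell
import Data.Char (chr, ord)
import Foreign.C.Types (CWchar)

-- Hypothetical helper. ASSUMPTION: the platform defines
-- __STDC_ISO10646__, so a wchar_t value is a Unicode code point.
-- Values outside the range of Char are rejected instead of wrapped.
cwcharToCharISO10646 :: CWchar -> Maybe Char
cwcharToCharISO10646 w
  | n < 0 || n > 0x10FFFF = Nothing
  | otherwise             = Just (chr (fromInteger n))
  where
    n = toInteger w

-- Hypothetical helper, same assumption. A Char that does not fit in
-- this platform's wchar_t (for example, 16-bit wchar_t) is rejected.
charToCWcharISO10646 :: Char -> Maybe CWchar
charToCWcharISO10646 c
  | n > toInteger (maxBound :: CWchar) = Nothing
  | otherwise                          = Just (fromInteger n)
  where
    n = toInteger (ord c)
```

Returning Maybe keeps the failure cases visible; on any other platform
even this checked version says nothing about what the wchar_t means.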

> Are castCCharToChar and castCharToCChar deprecated? I
> think castCharToCChar is unsafe when the language
> supports Unicode.

These have never really been safe to use. char may have a completely
different encoding than Char, which these won't honor. Deprecated may
not be the proper word, but whenever possible one should use the
higher-level conversion routines, which behave properly in the current
locale. These should only be used when you have system- or
application-specific knowledge that CChar is always ASCII and not
dependent on the current locale.
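
As an illustration of the difference, the sketch below compares the
per-character casts with the string-level routines from
Foreign.C.String. The names roundTripCast and roundTripLocale are made
up for this example.

```haskell
import Foreign.C.String
  (castCCharToChar, castCharToCChar, peekCString, withCString)

-- Per-character cast: keeps only the low 8 bits of the code point and
-- ignores the locale, so anything beyond Latin-1 is silently mangled.
roundTripCast :: Char -> Char
roundTripCast = castCCharToChar . castCharToCChar

-- String-level conversion: marshals the whole string to a C string
-- and back, leaving the choice of encoding to the library.
roundTripLocale :: String -> IO String
roundTripLocale s = withCString s peekCString
```

For an ASCII character roundTripCast is the identity, but for '\955'
(Greek lambda) it returns a different character, because the cast
truncates the code point. Whether roundTripLocale preserves a given
string depends on the locale in effect.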

Note that in general, there will never be a guaranteed one-to-one
mapping between chars, wchar_ts and Haskell Chars, so higher-level
routines must work on strings rather than individual chars.
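
A small sketch of the string-level approach for wide strings, using
withCWString, peekCWString and withCWStringLen from Foreign.C.String.
The wrapper names are hypothetical. The point is that the number of
wchar_ts produced is a property of the whole string on a given platform
and should not be assumed to equal the number of Chars (for instance
where wchar_t is 16 bits and a multi-unit encoding is used).

```haskell
import Foreign.C.String (peekCWString, withCWString, withCWStringLen)

-- Marshal a Haskell String to a wchar_t string and read it back.
roundTripWide :: String -> IO String
roundTripWide s = withCWString s peekCWString

-- Number of wchar_t units the library used for the string. This may
-- differ from `length s`, depending on the platform.
wideLength :: String -> IO Int
wideLength s = withCWStringLen s (return . snd)
```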

        John


-- 
John Meacham - ⑆repetae.net⑆john⑈ 

