ANN: H98 FFI Addendum 1.0, Release Candidate 15

Wed Nov 12 02:03:02 EST 2003

On Wed, Nov 12, 2003 at 04:57:09PM +1100, Manuel M T Chakravarty wrote:
> > The spec is silent on how exactly a Haskell Char is translated to a
> > CWchar, and there aren't any conversion functions ala castCharToCChar /
> > castCCharToChar.
> 
> Hmm, should we maybe have a `castCharToCwchar' and `castCwcharToChar'?

I didn't include those in my sample implementation in case the
conversion was non-trivial. Whether we want to support non-trivial
conversions is another matter. I would prefer not to add them unless we
specifically want to rule out non-Unicode wchar_t implementations,
which would be an overly restrictive constraint IMHO.

> 
> > So presumably the expected behaviour is that the implementation does its
> > best to translate between Unicode Char and whatever encoding the
> > prevailing C library is using for wchar.  Any sensible implementation
> > will be using Unicode for wchar too, so the translation will be a simple
> > no-op, but the C standard doesn't specify this.  Older systems will
> > probably have a locale-dependent encoding for wchar.  The GNU C library
> > has a slight bug in this regard, too (see previous discussion).
> > 
> > I expect that when we implement the CWString operations for GHC we won't
> > bother with any locale-dependent translations, so the implementation
> > will only work on "sensible" systems.
> > 
> > There is a fair bit that is non-obvious here, so I feel the spec ought
> > to say something.
> 
> Yes, I agree.  The question is, what do we actually want for
> the standard?  Do we want to restrict the standard to only
> work for "sensible" systems?  If so, what is the proper
> phrase to identify "sensible" systems?

Well, the defining condition for where my routines work is that wchar_t
represents Unicode code points. This may or may not be locale dependent.
On glibc-based systems, wchar_t is ALWAYS Unicode, and glibc #defines
something to that effect, so this can be determined at compile time. On
other systems (like Solaris, and I think the BSDs) it appears that
wchar_t is almost always Unicode. A run-time test can determine whether
this is the case, so my routines will still work. (Note: I have not
implemented this run-time test, but it shouldn't be difficult.)
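
To make the two checks concrete, here is a rough sketch in C. It assumes
the macro in question is __STDC_ISO_10646__, which C99 specifies as being
defined when wchar_t values are ISO 10646 code points. The function names
are made up for illustration, and the run-time part is only a spot check
on the ASCII range in the current locale: passing it is necessary but not
sufficient for wchar_t being Unicode.

```c
#include <wchar.h>

/* Compile-time check: returns 1 if the implementation advertises that
 * wchar_t holds ISO 10646 (Unicode) code points, 0 otherwise.
 * Hypothetical helper name, not part of any library. */
int wchar_is_unicode_at_compile_time(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Run-time spot check (hypothetical): in the current locale, does every
 * printable ASCII byte map to the wide character with the same value?
 * This cannot prove wchar_t is Unicode; it can only rule it out. */
int wchar_agrees_with_ascii(void)
{
    for (int c = 0x20; c < 0x7F; c++) {
        if (btowc(c) != (wint_t)c)
            return 0;
    }
    return 1;
}
```

A stronger run-time test would have to compare some non-ASCII characters
as well, which needs a locale whose multibyte encoding is known.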

Note that there is no theoretical reason not to support odd encodings:
the iconv framework will let us convert directly from Unicode to
whatever wchar_t and char encodings are appropriate. It may not be worth
implementing just yet due to the hassle and perhaps limited usefulness.
However, I don't think the FFI spec should artificially constrain itself
based on the spurious limitations of current implementations.
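
As a sketch of what the iconv route could look like, the following C
function converts a UTF-8 string to the platform's wchar_t encoding. It
assumes the iconv implementation accepts "WCHAR_T" as an encoding name,
which glibc and GNU libiconv do but POSIX does not guarantee. The name
utf8_to_wchar is made up for illustration and is not from my sample
implementation.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Convert a NUL-terminated UTF-8 string into wchar_t using iconv.
 * out_cap is the capacity of out in wide characters, including room
 * for the terminator. Returns the number of wide characters written
 * (excluding the terminator), or (size_t)-1 on any failure.
 * Assumption: "WCHAR_T" is a valid iconv target encoding name. */
size_t utf8_to_wchar(const char *utf8, wchar_t *out, size_t out_cap)
{
    if (out_cap == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in_ptr = (char *)utf8;          /* iconv takes non-const pointers */
    size_t in_left = strlen(utf8);
    char *out_ptr = (char *)out;
    size_t out_left = (out_cap - 1) * sizeof(wchar_t);

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;                /* invalid input or buffer too small */

    size_t written = (size_t)(out_ptr - (char *)out) / sizeof(wchar_t);
    out[written] = L'\0';
    return written;
}
```

Because iconv does the mapping, the same call would in principle work
whether or not wchar_t happens to be Unicode on the system at hand.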

        John

-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john at foo.net
---------------------------------------------------------------------------