ANN: H98 FFI Addendum 1.0, Release Candidate 13

John Meacham john at repetae.net
Fri Oct 31 20:16:33 EST 2003


On Fri, Oct 31, 2003 at 12:32:55PM +0000, Ross Paterson wrote:
> On Fri, Oct 31, 2003 at 06:45:41PM +1100, Manuel M T Chakravarty wrote:
> > (2) The inclusion of John Meacham's `CWString' and
> >     `CLString' routines.  However, I propose to make the
> >     localised versions (aka `CLString') the default and
> >     rename the previous (8bit-based) routines to use a
> >     `CAString' suffix, where the `A' stands for ASCII.
> 
> Making the Right Thing the default, though it may cost more, seems
> appropriate.
> 
> In the sentence
> 
> 	The marshalling takes the current Unicode encoding on the
> 	Haskell side into account.
> 
> (which seems to have been there before), "current" seems wrong, since
> the Haskell side is constant.  How about something like
> 
> 	The marshalling converts each Haskell character, representing
> 	a Unicode code point, to one or more bytes in a manner
> 	determined by the current locale.
> 
> and dropping the later sentence about the locale.

This sounds good to me. I might also reword the paragraph introducing
the 8bit versions, as the efficiency reason for using them is less
important than the API one. That is, some C APIs specify that a
localized string should be passed, while others explicitly don't use
localization and expect only ASCII (or another specific encoding such as
UTF-8) strings, and this is most likely what will determine the choice of
string marshalers.
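
As a rough illustration of the API point, here is a minimal sketch. It
assumes the 8bit routines end up with the proposed `CAString' naming
(withCAString, as found in GHC's Foreign.C.String); byteLength8 is a
hypothetical helper made up for this example. strlen is defined purely
in terms of bytes, so the 8bit marshaler is the natural partner for it,
whereas a function documented to take locale-encoded text would be
paired with the localised marshaler instead.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCAString)
import Foreign.C.Types (CSize(..))

-- strlen from the C standard library: counts bytes up to the NUL
-- terminator and knows nothing about locales.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Hypothetical helper: marshal with the 8bit routine and ask C for
-- the byte length. Only meaningful for Chars below 256.
byteLength8 :: String -> IO Int
byteLength8 s = withCAString s (fmap fromIntegral . c_strlen)
```

For plain ASCII input the result equals the Haskell length of the
string; nothing here depends on the current locale.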

> What happens if one attempts to convert a Char that has no encoding
> in the current locale?

My implementation converts unrepresentable characters to '?'. But
a case could be made for throwing a CharsetConversion exception of some
sort, or for simply eliding invalid characters. I am not sure what is
best. I chose the '?' route because it matches what happens when you
don't have a font installed and get a replacement character, and it is
less troublesome for the user. However, it is a lossy transformation
(the original character code is lost), so perhaps an exception is better.
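
To make the three options concrete, here is a small pure sketch. It is
not my actual implementation: the predicate `representable' is a
stand-in that pretends the current locale is Latin-1, and all the
function names are made up for illustration. A real version would ask
the locale's converter instead.

```haskell
import Data.Char (ord)

-- Assumption: stand-in for "has an encoding in the current locale".
-- Here we pretend the locale is Latin-1.
representable :: Char -> Bool
representable c = ord c < 256

-- Option 1: substitute '?' (lossy, but never fails).
replaceUnrepresentable :: String -> String
replaceUnrepresentable = map (\c -> if representable c then c else '?')

-- Option 2: silently drop the offending characters (also lossy).
elideUnrepresentable :: String -> String
elideUnrepresentable = filter representable

-- Option 3: refuse to convert, reporting the first offending Char.
-- A real marshaler would throw an exception here instead.
strictConvert :: String -> Either Char String
strictConvert s = case filter (not . representable) s of
  (c:_) -> Left c
  []    -> Right s
```

Only the third option lets the caller notice that information was
lost; the first two differ in whether the output keeps its length.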

> It might be worth emphasizing that the Len is the number of bytes
> rather than Chars.
> 
> In the part about the single-byte versions, it might be worth tightening
> the warning to say that these preserve only the first 256 values of Char.
> (That is the Latin-1 subset, so calling them ASCII seems a misnomer.)

Yeah, this sounds more precise to me too.
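
A quick way to see the 256-value limit, using castCharToCChar and
castCCharToChar from Foreign.C.String as they exist in GHC's libraries
(roundTrip8 is a hypothetical name for this sketch): a Char survives
the trip through a single C char only if its code point is below 256.

```haskell
import Foreign.C.String (castCharToCChar, castCCharToChar)

-- Push a Char through a single 8-bit C char and back.
-- Code points of 256 and above are truncated, so they do not survive.
roundTrip8 :: Char -> Char
roundTrip8 = castCCharToChar . castCharToCChar

-- True exactly when the 8bit routines preserve this character.
preserved :: Char -> Bool
preserved c = roundTrip8 c == c
```

So 'A' and '\233' (e with acute, in Latin-1) are preserved, while
'\955' (Greek lambda) is not.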


-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john at foo.net
---------------------------------------------------------------------------

