Let's get this finished

Mon Jan 8 09:02:00 EST 2001

qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) wrote,

> Mon, 08 Jan 2001 14:55:31 +1100, Manuel M. T. Chakravarty <chak at cse.unsw.edu.au> wrote:
> 
> > How about having an interface where the String marshalling
> > functions take an additional argument
> > 
> >   data CConv = NoCConv                  -- handle as 8-bit chars
> >              | StdCConv                 -- standard conversion
> >              | CustomCConv String       -- special conversions
> 
> There are already string marshalling variants which take the conversion
> as the argument. The first is equivalent to toLatin1 / fromLatin1,
> the second is localOut / localIn or using functions with no explicit
> conversion (names are of course subject to discussion), the third would
> either require building a database of name -> conversion mappings or
> just pass names to iconv, assuming iconv knows that conversion.

Where are these variants?  In QForeign?  The interface for
CString that was under discussion here didn't say anything
about conversions?

> They make no sense when there is never any conversion, so I haven't
> talked about them yet when discussing MarshalCString or equivalents.

Fair enough.  So, basically what I am saying, then, is that
as long as it is not clear how the conversion interface
looks in detail, let's talk about two different conversions
that we definitely know we are going to need: the
toLatin1/fromLatin1 conversion and the localOut/localIn
conversion.  I want the standard FFI CString interface to
support these two.  The rest we can add later, but I don't
want to be restricted to the std conversion only.
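To make this concrete, here is a minimal Haskell sketch of the CConv type (with the StdCConc typo fixed) and a function that picks an encoder for each variant. The Encoder type, noConv and encoderFor are hypothetical names for illustration only; the locale conversion and the name lookup are passed in as parameters because how they are really implemented is exactly what is still under discussion.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- The three variants proposed above.
data CConv = NoCConv             -- handle as 8-bit chars
           | StdCConv            -- standard (locale) conversion
           | CustomCConv String  -- conversion identified by a name
  deriving (Eq, Show)

-- Hypothetical: an encoder turns a Haskell String into bytes.
type Encoder = String -> [Word8]

-- NoCConv keeps only the low 8 bits of each Char, like castCharToCChar;
-- it is exact for Latin-1 text and lossy for anything above U+00FF.
noConv :: Encoder
noConv = map (fromIntegral . ord)

-- Choose an encoder.  The locale encoder and the name lookup are
-- assumptions supplied by the caller; an unknown name yields Nothing.
encoderFor :: Encoder                    -- locale conversion (assumed)
           -> (String -> Maybe Encoder)  -- lookup by name (assumed)
           -> CConv
           -> Maybe Encoder
encoderFor _     _     NoCConv         = Just noConv
encoderFor local _     StdCConv        = Just local
encoderFor _     named (CustomCConv n) = named n
```

The point of the design is that the NoCConv path never touches the conversion machinery at all.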

BTW, how efficient is the code for toLatin1/fromLatin1?

> If you know that data is always ASCII, you can use toLatin1 as the
> conversion.

With the interface that we discussed so far, I can't.

> > Then, it is up to the programmer to decide whether to use
> > conversion.
> 
> There is always a conversion. Char and CChar are not the same type.
> But it can be as trivial as toLatin1.

OK, by "whether to use conversion" I meant "whether there
is more than castCharToCChar lifted to Strings".
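For reference, "castCharToCChar lifted to Strings" is literally a map. A sketch using castCharToCChar and castCCharToChar from Foreign.C.String; the names toLatin1/fromLatin1 follow this thread, but these particular definitions are only an illustration, not the QForeign code.

```haskell
import Foreign.C.String (castCharToCChar, castCCharToChar)
import Foreign.C.Types (CChar)

-- No real conversion: each Char becomes one CChar (low 8 bits kept).
toLatin1 :: String -> [CChar]
toLatin1 = map castCharToCChar

-- The inverse: each byte is read back as the Char with the same code.
fromLatin1 :: [CChar] -> String
fromLatin1 = map castCCharToChar
```

Round-tripping is exact for characters up to U+00FF; a character such as U+3042 silently comes back as 'B' (0x42), which is why this is only the cheap-and-dirty path.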

> > The idea of the last variant would be that in your conversion
> > library, I can give conversions a name and identify them by
> > that name.  This way the CString wouldn't depend on the exact
> > conversion interface,
> 
> Textual names are not enough. Conversions can be constructed on the
> fly, e.g. by improving an existing conversion by substituting some
> strings for characters which can't be handled by it.
> 
> Currently the type of the additional withCStringConv's argument is
> IO (Conv Char Byte). It's IO because this is how "not-yet-started
> conversions" are expressed. Conv Char Byte itself is an abstract type
> of a stateful conversion which is taking place.

I didn't want to imply that this admittedly very simple
data type CConv can actually represent all possible
conversions.  However, we might have a very simple
conversion interface in CString and the real, full-fledged
conversion interface in the libraries that you are
designing.

However, as I said, I want at least to be able to distinguish
to/fromLatin1 and localIn/localOut [Why not call them
toLocal/fromLocal?].  If we have the distinction anyway,
making it a little more flexible and adding CustomCConv
seems sensible.  We could have conversions like

  CustomCConv "Latin2"

and

  CustomCConv "ja_JP.euc"

Maybe your conversion library could, then, have a function
like 

  registerConv :: String -> IO (Conv Char Byte) -> IO ()

which allows me to give symbolic names to conversions.  We
could then predefine names for a set of commonly used
conversions.  This way many users might be able to use
these conversions more easily.
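A minimal sketch of such a registry, assuming a stand-in Conv type: the real Conv Char Byte in your library is abstract and stateful, so the newtype Conv below, together with ConvRegistry, newConvRegistry, lookupConv and latin1Conv, is hypothetical. Unlike the registerConv signature above, the registry is passed explicitly rather than being global, which keeps the sketch free of unsafePerformIO.

```haskell
import Data.Char (ord)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Word (Word8)

-- Hypothetical stand-in for the abstract Conv Char Byte.
newtype Conv = Conv { runConv :: String -> [Word8] }

-- Names map to "not-yet-started" conversions, i.e. IO Conv actions,
-- following the IO (Conv Char Byte) convention.
newtype ConvRegistry = ConvRegistry (IORef [(String, IO Conv)])

newConvRegistry :: IO ConvRegistry
newConvRegistry = fmap ConvRegistry (newIORef [])

-- Register a name; a later registration of the same name shadows
-- the earlier one, because lookup returns the first match.
registerConv :: ConvRegistry -> String -> IO Conv -> IO ()
registerConv (ConvRegistry ref) name start = modifyIORef ref ((name, start) :)

-- Look up a name and start a fresh conversion for it.
lookupConv :: ConvRegistry -> String -> IO (Maybe Conv)
lookupConv (ConvRegistry ref) name = do
  table <- readIORef ref
  case lookup name table of
    Nothing    -> return Nothing
    Just start -> fmap Just start

-- Example factory: one byte per Char, exact for Latin-1 text only.
latin1Conv :: IO Conv
latin1Conv = return (Conv (map (fromIntegral . ord)))
```

Predefined names such as "Latin2" or "ja_JP.euc" would then just be registrations performed when the library starts up.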

> > Routines like mallocCString and pokeCString would only make sense
> > for `NoCConv', then.
> 
> We already have mallocArray0 and pokeArray0. You only have to cast
> characters to [CChar].

Sure, but why not have these as predefined functions in
CString?  That's all I am proposing.
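Such wrappers are only a few lines each. A sketch, assuming the names mallocCString and pokeCString from the message above plus hypothetical peekCStringNoConv, newCStringNoConv and freeCString; all of them use the plain one-byte-per-Char cast with no conversion, built on mallocArray0, pokeArray0 and peekArray0 from Foreign.Marshal.Array.

```haskell
import Foreign.C.String (CString, castCCharToChar, castCharToCChar)
import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (mallocArray0, peekArray0, pokeArray0)

-- Room for n chars plus the terminating NUL (mallocArray0 adds the +1).
mallocCString :: Int -> IO CString
mallocCString = mallocArray0

-- Write a String one byte per Char, followed by a NUL terminator.
-- The buffer must hold at least length s + 1 bytes.
pokeCString :: CString -> String -> IO ()
pokeCString p s = pokeArray0 0 p (map castCharToCChar s)

-- Read back up to the NUL, one Char per byte.
peekCStringNoConv :: CString -> IO String
peekCStringNoConv p = fmap (map castCCharToChar) (peekArray0 0 p)

-- Allocate and fill in one step; release the result with freeCString.
newCStringNoConv :: String -> IO CString
newCStringNoConv s = do
  p <- mallocCString (length s)
  pokeCString p s
  return p

freeCString :: CString -> IO ()
freeCString = free
```

As with any NUL-terminated C string, an embedded '\0' would end the string early on the way back.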

> > Another example is configuration management in libraries
> > like the Gnome library.  A program can dump its session data
> > into an ASCII file using these libraries, so that it doesn't
> > have to maintain its own preferences and resource files.  Do
> > we really want all this stuff to go through the converter?
> 
> In what encoding are natural language texts in this dump?
> If it's ASCII, use fromLatin1.
> 
> I have yet to benchmark conversions.

That's exactly where my concern is.  We are designing an
interface which requires all string marshalling to go
through a procedure whose speed we don't know, but which,
judging from what you said so far and from a brief look at
the QForeign code, won't be really cheap.  I don't like
that.  I want a backdoor to a really cheap and dirty
to/fromLatin1 conversion in the standard FFI interface.

> > [2] Mojibake is the Japanese term for Japanese text
> >     displayed through software that cannot handle it.
> >     Mojibake is written as "文字化け" in Japanese and if
> >     your mail reader can't handle Japanese, you'll see just
> >     that ;-)
> 
> Unfortunately I don't see that. But if I used a (not yet existing)
> newsreader and editor written in Haskell which made nontrivial use
> of the conversion machinery, I would probably see that :-)

Just use XEmacs with Mule support as your mail reader and
you can enjoy that already today :-)  It can even render
different scripts in one buffer (like the UTF-8-patched
xterm); e.g., I can write Japanese/German vocabulary cheat
sheets having Japanese characters and German umlauts.

BTW, do you know Pango <http://www.pango.org/>? 

Cheers,
Manuel