Let's get this finished
Manuel M. T. Chakravarty
chak at cse.unsw.edu.au
Mon Jan 8 09:02:00 EST 2001
qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) wrote,
> Mon, 08 Jan 2001 14:55:31 +1100, Manuel M. T. Chakravarty <chak at cse.unsw.edu.au> writes:
>
> > How about having an interface where the String marshalling
> > functions take an additional argument
> >
> >   data CConv = NoCConv             -- handle as 8bit chars
> >              | StdCConv            -- standard conversion
> >              | CustomCConv String  -- special conversions
>
> There are already string marshalling variants which take the conversion
> as the argument. The first is equivalent to toLatin1 / fromLatin1,
> the second is localOut / localIn or using functions with no explicit
> conversion (names are of course subject to discussion), the third would
> either require building a database of name -> conversion mappings or
> just pass names to iconv, assuming iconv knows that conversion.
Where are these variants? In QForeign? The interface for
CString that was under discussion here didn't say anything
about conversions?
> They make no sense when there is never any conversion, so I haven't
> talked about them yet when discussing MarshalCString or equivalents.
Fair enough. So, basically what I am saying, then, is that
as long as it is not clear how the conversion interface
looks in detail, let's talk about two different conversions
that we definitely know we are going to need: the
toLatin1/fromLatin1 conversion and the localOut/localIn
conversion. I want the standard FFI CString interface to
support these two. The rest we can add later, but I don't
want to be restricted to the std conversion only.
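
A minimal sketch of what such a three-way interface could look like. It is written against the string marshalling functions of today's `base` library (`withCAString`, `withCString`, `GHC.Foreign.withCString`, `mkTextEncoding`), which postdate this thread, so it is an illustration only and not the interface under discussion. The names `withCStringVia` and `marshalledBytes` are hypothetical, and treating the `CustomCConv` string as an encoding name is an assumption.

```haskell
import Foreign.C.String (CString, peekCAString, withCAString, withCString)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (mkTextEncoding)

-- The three-way choice proposed in the thread.
data CConv
  = NoCConv             -- handle as 8bit chars
  | StdCConv            -- standard (locale) conversion
  | CustomCConv String  -- conversion identified by name

-- Hypothetical name: marshal a String using the chosen conversion.
withCStringVia :: CConv -> String -> (CString -> IO a) -> IO a
withCStringVia NoCConv s act = withCAString s act
withCStringVia StdCConv s act = withCString s act
withCStringVia (CustomCConv name) s act = do
  enc <- mkTextEncoding name  -- throws an IOError if the name is unknown
  GHC.withCString enc s act

-- Hypothetical helper: read back the bytes the C side would see,
-- one Char per byte.
marshalledBytes :: CConv -> String -> IO String
marshalledBytes conv s = withCStringVia conv s peekCAString
```

With this shape the caller picks the cost: `NoCConv` never touches an encoder, while the other two constructors go through one.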
BTW, how efficient is the code for toLatin1/fromLatin1?
> If you know that data is always ASCII, you can use toLatin1 as the
> conversion.
With the interface that we discussed so far, I can't.
> > Then, it is up to the programmer to decide whether to use
> > conversion.
>
> There is always a conversion. Char and CChar are not the same type.
> But it can be as trivial as toLatin1.
Ok, by "whether there is a conversion" I meant "whether
there is more than castCharToCChar lifted to Strings".
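
For reference, "castCharToCChar lifted to Strings" amounts to something like the sketch below. `castCharToCChar` and `castCCharToChar` are the functions from `Foreign.C.String` in current `base`; the names `toCChars` and `fromCChars` are hypothetical. The cast is only safe for the first 256 characters, which is what makes it the trivial Latin-1 style conversion.

```haskell
import Foreign.C.String (castCCharToChar, castCharToCChar)
import Foreign.C.Types (CChar)

-- Hypothetical name: the per-character cast lifted to whole Strings.
-- Only safe for characters with code points below 256.
toCChars :: String -> [CChar]
toCChars = map castCharToCChar

-- Hypothetical name: the inverse direction, one Char per C character.
fromCChars :: [CChar] -> String
fromCChars = map castCCharToChar
```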
> > The idea of the last variant would be that in your conversion
> > library, I can give conversions a name and identify them by
> > that name. This way the CString wouldn't depend on the exact
> > conversion interface,
>
> Textual names are not enough. Conversions can be constructed on the
> fly, e.g. by improving an existing conversion by substituting some
> strings for characters which can't be handled by it.
>
> Currently the type of the additional withCStringConv's argument is
> IO (Conv Char Byte). It's IO because this is how "not started yet
> conversions" are expressed. Conv Char Byte itself is an abstract type
> of a stateful conversion which is taking place.
I didn't want to imply that this admittedly very simple
data type CConv can actually represent all possible
conversions. However, we might have a very simple
conversion interface in CString, and then, the real fully
fledged conversion interface in the libraries that you are
designing.
However, as I said, I want at least to be able to distinguish
to/fromLatin1 and localIn/localOut [Why not call it
toLocal/fromLocal?]. If we have the distinction anyway,
making it a little more flexible and adding CustomCConv
seems sensible. We could have conversions like
CustomCConv "Latin2"
and
CustomCConv "ja_JP.euc"
Maybe your conversion library could, then, have a function
like
registerConv :: String -> IO (Conv Char Byte) -> IO ()
which allows me to give symbolic names to conversions. We
could then predefine names for a set of commonly used
conversions. This way many users might be able to use
these conversions more easily.
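
One way such a name table could be sketched is shown below. Since the `Conv Char Byte` type lives in the conversion library and is not reproduced here, the sketch is polymorphic in the conversion type and passes the table explicitly instead of using a global one. All names (`ConvRegistry`, `newRegistry`, `registerConvIn`, `lookupConvIn`) are hypothetical.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical registry: symbolic names mapped to "not yet started"
-- conversions, i.e. IO actions that create a fresh conversion.
newtype ConvRegistry conv = ConvRegistry (IORef [(String, IO conv)])

newRegistry :: IO (ConvRegistry conv)
newRegistry = ConvRegistry <$> newIORef []

-- Counterpart of the proposed registerConv, with an explicit registry.
-- A later registration under the same name shadows an earlier one.
registerConvIn :: ConvRegistry conv -> String -> IO conv -> IO ()
registerConvIn (ConvRegistry ref) name start =
  modifyIORef' ref ((name, start) :)

-- Look a name up and, if it is known, start a fresh conversion.
lookupConvIn :: ConvRegistry conv -> String -> IO (Maybe conv)
lookupConvIn (ConvRegistry ref) name = do
  table <- readIORef ref
  case lookup name table of
    Nothing    -> return Nothing
    Just start -> Just <$> start
```

A `CustomCConv "Latin2"` marshalling call would then reduce to a `lookupConvIn` followed by running the conversion it returns.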
> > Routines like mallocCString and pokeCString would only make sense
> > for `NoCConv', then.
>
> We already have mallocArray0 and pokeArray0. You only have to cast
> characters to [CChar].
Sure - but why not have these as predefined functions in
CString? That's all I am proposing.
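
A sketch of what those predefined functions might look like on top of `mallocArray0` and `pokeArray0`. The signatures chosen for `mallocCString` and `pokeCString` are assumptions, since the thread does not spell them out, and `roundTrip` is a hypothetical helper added only to show usage.

```haskell
import Foreign.C.String (CString, castCharToCChar, peekCAString)
import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (mallocArray0, pokeArray0)

-- Allocate room for n characters plus the terminating NUL.
mallocCString :: Int -> IO CString
mallocCString = mallocArray0

-- Write a String into an existing buffer as 8-bit characters,
-- followed by a NUL. The buffer must hold length s + 1 characters.
pokeCString :: CString -> String -> IO ()
pokeCString buf s = pokeArray0 0 buf (map castCharToCChar s)

-- Hypothetical usage: write a String out and read it back unconverted.
roundTrip :: String -> IO String
roundTrip s = do
  buf <- mallocCString (length s)
  pokeCString buf s
  r <- peekCAString buf
  free buf
  return r
```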
> > Another example is configuration management in libraries
> > like the Gnome library. A program can dump its session data
> > into an ASCII file using these libraries, so that it doesn't
> > have to maintain its own preferences and resource files. Do
> > we really want all this stuff to go through the converter?
>
> In what encoding are natural language texts in this dump?
> If it's ASCII, use fromLatin1.
>
> I have yet to benchmark conversions.
That's exactly where my concern is. We are designing an
interface which requires all string marshalling to go
through a procedure whose speed we have no idea about,
although it seems clear - from what you said
so far and briefly looking at the QForeign code - that it
won't be really cheap. I don't like that. I want a
backdoor to a really cheap and dirty to/fromLatin1
conversion in the standard FFI interface.
> > [2] Mojibake is the Japanese term for Japanese text
> > displayed through software that cannot handle it.
> > Mojibake is written as "^[$BJ8;z2=$1^[(B" in Japanese and if
> > your mail reader can't handle Japanese, you'll see just
> > that ;-)
>
> Unfortunately I don't see that. But if I used a (not yet existing)
> newsreader and editor written in Haskell which made nontrivial use
> of the conversion machinery, I would probably see that :-)
Just use XEmacs with Mule support as your mail reader and
you can enjoy that already today :-) It can even render
different scripts in one buffer (like the UTF-8-patched
xterm), eg, I can write Japanese/German vocabulary cheat
sheets having Japanese characters and German umlauts.
BTW, do you know Pango <http://www.pango.org/>?
Cheers,
Manuel