Let's get this finished

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Mon Jan 8 03:17:36 EST 2001


Mon, 08 Jan 2001 14:55:31 +1100, Manuel M. T. Chakravarty <chak at cse.unsw.edu.au> writes:

> How about having an interface where the String marshalling
> functions take an additional argument
> 
>   data CConv = NoCConv                  -- handle as 8bit chars
>            | StdCConv                   -- standard conversion
>            | CustomCConv String         -- special conversions

There are already string marshalling variants which take the conversion
as an argument. The first is equivalent to toLatin1 / fromLatin1;
the second to localOut / localIn, or to using the functions with no
explicit conversion (names are of course subject to discussion). The third
would either require building a database of name -> conversion mappings
or just passing names to iconv, assuming iconv knows that conversion.

They make no sense when there is never any conversion, so I haven't
talked about them yet when discussing MarshalCString or equivalents.

If you know that data is always ASCII, you can use toLatin1 as the
conversion.

> Then, it is up to the programmer to decide whether to use
> conversion.

There is always a conversion. Char and CChar are not the same type.
But it can be as trivial as toLatin1.
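
To illustrate how trivial such a conversion can be, here is a minimal
sketch of a Latin-1 mapping between Char and CChar. The names
charToLatin1, latin1ToChar and stringToLatin1 are made up for this
example; they are not the actual toLatin1 / fromLatin1, whose types
may well differ.

```haskell
import Data.Char (chr, ord)
import Foreign.C.Types (CChar)

-- Hypothetical stand-in for toLatin1 on a single character:
-- code points 0..255 map to one byte, anything else is rejected.
charToLatin1 :: Char -> Maybe CChar
charToLatin1 c
  | ord c < 256 = Just (fromIntegral (ord c))
  | otherwise   = Nothing

-- Hypothetical stand-in for fromLatin1 on a single byte.
-- CChar may be signed, so the value is normalised into 0..255 first.
latin1ToChar :: CChar -> Char
latin1ToChar b = chr (fromIntegral b `mod` 256)

-- Convert a whole string; fails if any character is outside Latin-1.
stringToLatin1 :: String -> Maybe [CChar]
stringToLatin1 = mapM charToLatin1
```

For pure ASCII data this never fails, and the only work done is the
change of type.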

> The idea of the last variant would be that in your conversion
> library, I can give conversions a name and identify them by
> that name.  This way the CString wouldn't depend on the exact
> conversion interface,

Textual names are not enough. Conversions can be constructed on the
fly, e.g. by improving an existing conversion by substituting some
strings for characters which can't be handled by it.
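
A rough sketch of what I mean by improving a conversion, using a
simplified stateless model rather than the real conversion type.
SimpleConv, latin1, withSubstitutes and convertString are names
invented for this example only.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Assumed simplified model: a conversion maps one character to its
-- bytes, or fails if it cannot represent that character.
type SimpleConv = Char -> Maybe [Word8]

-- A base conversion which handles only code points 0..255.
latin1 :: SimpleConv
latin1 c
  | ord c < 256 = Just [fromIntegral (ord c)]
  | otherwise   = Nothing

-- Build a new conversion on the fly: characters which the base
-- conversion rejects are replaced by a substitute string, which is
-- then converted with the base conversion itself.
withSubstitutes :: [(Char, String)] -> SimpleConv -> SimpleConv
withSubstitutes table base c =
  case base c of
    Just bytes -> Just bytes
    Nothing    -> lookup c table >>= fmap concat . mapM base

-- Apply a conversion to a whole string.
convertString :: SimpleConv -> String -> Maybe [Word8]
convertString conv = fmap concat . mapM conv
```

A conversion such as withSubstitutes [('\8364', "EUR")] latin1 has no
sensible textual name, which is why names alone are not enough.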

Currently the type of withCStringConv's additional argument is
IO (Conv Char Byte). It's IO because this is how conversions which
have not started yet are expressed. Conv Char Byte itself is an abstract
type of a stateful conversion which is in progress.
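
A toy model of the idea, not the real definition: the Conv type below,
its feed field and the ucs2WithBom conversion are all assumptions made
up for illustration. Running the IO action allocates fresh state, so
each run yields an independent conversion which is at its beginning.

```haskell
import Data.Char (ord)
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Word (Word8)

type Byte = Word8

-- Hypothetical stand-in for the abstract type: a running conversion
-- is something you can feed chunks of input to.
newtype Conv a b = Conv { feed :: [a] -> IO [b] }

-- A conversion which has not started yet. The toy encoding writes two
-- big-endian bytes per character, preceded by a byte order mark on the
-- first chunk only. Code points above 0xFFFF are replaced by '?'.
ucs2WithBom :: IO (Conv Char Byte)
ucs2WithBom = do
  started <- newIORef False
  return $ Conv $ \cs -> do
    seen <- readIORef started
    writeIORef started True
    let prefix = if seen then [] else [0xFE, 0xFF]
    return (prefix ++ concatMap encode cs)
  where
    encode c =
      let n = if ord c > 0xFFFF then ord '?' else ord c
      in [fromIntegral (n `div` 256), fromIntegral (n `mod` 256)]
```

Feeding "A" and then "B" to one conversion gives the mark only once,
while a second run of ucs2WithBom starts over with its own mark.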

> Routines like mallocCString and pokeCString would only make sense
> for `NoCConv', then.

We already have mallocArray0 and pokeArray0. You only have to cast
characters to [CChar].
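
For example, something along these lines. This is only a sketch: the
function names are those of the Foreign.Marshal.Array and
Foreign.C.String modules in GHC's base library, and it is an assumption
that they match the draft being discussed here. mallocRawCString and
peekRawCString are names made up for the example.

```haskell
import Foreign.C.String (castCCharToChar, castCharToCChar)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (mallocArray0, peekArray0, pokeArray0)
import Foreign.Ptr (Ptr)

-- Allocate a NUL-terminated buffer and fill it with the characters
-- cast to CChar, with no real conversion. Only meaningful when every
-- character fits in one byte. The caller must free the buffer.
mallocRawCString :: String -> IO (Ptr CChar)
mallocRawCString s = do
  buf <- mallocArray0 (length s)
  pokeArray0 0 buf (map castCharToCChar s)
  return buf

-- Read the buffer back up to (not including) the terminating NUL.
peekRawCString :: Ptr CChar -> IO String
peekRawCString p = fmap (map castCCharToChar) (peekArray0 0 p)
```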

> Another example is configuration management in libraries
> like the Gnome library.  A program can dump its session data
> into an ASCII file using these libraries, so that it doesn't
> have to maintain its own preferences and resource files.  Do
> we really want all this stuff to go through the converter?

In what encoding are natural language texts in this dump?
If it's ASCII, use fromLatin1.

I have yet to benchmark conversions.

> Furthermore, to be honest, I am not really sure why we have
> to do the conversion anyway.  When I am having a Haskell
> program like [1]
> 
>   main = putStrLn "今日は"
> 
> then, there are two possibilities.  Either I have a system
> configured with the locale jp_JP and I happen to run this
> Haskell program in kterm or an Mule/(X)Emacs subshell, or I
> will get mojibake[2] anyway.

You can have another terminal capable of displaying Japanese, e.g.
UTF-8-patched xterm (I don't know how well it works in practice
for Japanese, but they are using it for harder scripts like Arabic,
so I guess it's OK). Then the same compiled program will
display correctly (as long as the locale is set appropriately for
the terminal).

You can have Japanese texts in the source in any encoding, as long as
the Haskell compiler is able to understand various source encodings,
and the right encoding was specified in some way.

UTF-8-patched xterm has an advantage over Mojibake that it displays
not only Japanese, but just about any charset which can be displayed
in a fixed-width font (including double-width characters).

> [2] Mojibake is the Japanese term for Japanese text
>     displayed through software that cannot handle it.
>     Mojibake is written as "^[$BJ8;z2=$1^[(B" in Japanese and if
>     your mail reader can't handle Japanese, you'll see just
>     that ;-)

Unfortunately I don't see that. But if I used a (not yet existing)
newsreader and editor written in Haskell which made nontrivial use
of the conversion machinery, I would probably see that :-)

-- 
 __("<  Marcin Kowalczyk * qrczak at knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK




