UTF-8 library

Sven Moritz Hallberg pesco@gmx.de
09 Aug 2002 10:17:21 +0200


On Thu, 2002-08-08 at 18:26, anatoli wrote:
> Having a locale associated with each individual stream is much more
> convenient.

I argue _strongly_ against associating some sort of locale state with
handles.

1) In agreement with Ashley's statements, file IO should use octets,
because that's what's in a file.

2) If you need to decode those octets to characters, or vice versa,
compose a (de)serialization function with the raw octet IO.

3) A "best shot" character reading (or writing, for that matter)
function will be convenient. This should probably use your current
locale, because when writing a character, you'll probably want to be
able to write your own language's characters correctly.

4) For decoding, we'll need some parsing functionality, as someone
already mentioned. With that we can have functions like parseUTF8.
"Associating a locale with a stream", as you put it, then becomes a
matter of plain function application: if f is the raw Word8 stream,
then g = parseUTF8 f is the Char stream, obtained by parsing the octets
of f as UTF-8-encoded characters.
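
To make point 4 concrete, here is a rough sketch of what such a
parseUTF8 could look like. The type [Word8] -> String, and the choice to
replace malformed input with U+FFFD instead of reporting an error, are
only assumptions for illustration; this is not an existing library
function.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical sketch: decode a (possibly lazy) list of octets as UTF-8.
-- Each malformed, overlong, or truncated sequence yields U+FFFD, and
-- decoding resumes at the octet following the offending lead octet.
parseUTF8 :: [Word8] -> String
parseUTF8 [] = []
parseUTF8 (b : bs)
  | b < 0x80           = chr (fromIntegral b) : parseUTF8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) 0x80     -- 2-octet form
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) 0x800    -- 3-octet form
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) 0x10000  -- 4-octet form
  | otherwise          = replacement : parseUTF8 bs
  where
    replacement = '\xFFFD'

    -- n continuation octets are expected after the lead octet; minCode
    -- is the smallest code point that may use this form, so overlong
    -- encodings are rejected.
    multi n lead minCode =
      let (conts, rest) = splitAt n bs
          step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
          code = foldl step (fromIntegral lead) conts :: Int
          valid = length conts == n
               && all (\c -> c .&. 0xC0 == 0x80) conts
               && code >= minCode
               && code <= 0x10FFFF
               && not (code >= 0xD800 && code <= 0xDFFF)  -- no surrogates
      in if valid
           then chr code : parseUTF8 rest
           else replacement : parseUTF8 bs
```

Since the result is produced lazily, g = parseUTF8 f works on an
unbounded f as well, and no state needs to live in the handle that
supplied the octets.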


Sven Moritz