Text in Haskell: A PROPOSAL
Wolfgang Jeltsch
wolfgang@jeltsch.net
08 Aug 2002 14:28:56 +0200
On Thursday, 2002-08-08, 13:05, CEST, Ketil Z. Malde wrote:
> I wonder if anybody are actually *using* non-octet based encodings
> (e.g. UTF-16/UCS-2) in files or in sockets (without wrapping the
> encoded content in a higher level protocol, like MIME)? Even if
> various standards support them, we might be better off with less
> complexity and handling the *useful* cases, if it turns out the
> complex cases aren't real world.
I would say that dealing with a character encoding _scheme_*) like
UTF-16LE or UTF-16BE is no more complex than dealing with any other
encoding scheme. And since we may assume that files and sockets work
with octets, it makes no sense to support non-octet-based encoding
_forms_ like UTF-16 directly in this area. All one has to provide for
such forms is, IMHO, some conversion functions/parsers.
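To illustrate what I mean by conversion functions, here is a minimal sketch that keeps the encoding form (characters to 16-bit code units) apart from the byte serialization that turns it into an encoding scheme. All names in it (toUtf16, ByteOrder, serialize, encodeUtf16) are made up for this example and are not part of any existing library or of the proposal itself. It also does not reject unpaired surrogate code points, which a real implementation would have to do.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8, Word16)

-- Encoding form: map one character to one or two 16-bit code units.
-- (Hypothetical helper; surrogate code points are not checked here.)
toUtf16 :: Char -> [Word16]
toUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]  -- low surrogate
  where
    n = ord c
    m = n - 0x10000

-- Byte serialization: the only thing that differs between
-- UTF-16BE and UTF-16LE. (Hypothetical type for this sketch.)
data ByteOrder = BigEndian | LittleEndian

serialize :: ByteOrder -> Word16 -> [Word8]
serialize BigEndian    w = [hiByte w, loByte w]
serialize LittleEndian w = [loByte w, hiByte w]

hiByte, loByte :: Word16 -> Word8
hiByte w = fromIntegral (w `shiftR` 8)
loByte w = fromIntegral (w .&. 0xFF)

-- Encoding scheme = encoding form + byte serialization.
encodeUtf16 :: ByteOrder -> String -> [Word8]
encodeUtf16 order = concatMap (serialize order) . concatMap toUtf16
```

With these definitions, encodeUtf16 BigEndian would be what a file or socket layer uses for UTF-16BE, and encodeUtf16 LittleEndian for UTF-16LE. The I/O layer itself only ever sees octets.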
> [...]
Wolfgang
*) The Unicode Standard (at least 3.0) makes a distinction between
character encoding forms and character encoding schemes.
    Character encoding forms specify the representation of characters as
    actual data in a computer. The Unicode Standard uses two encoding
    forms: 16-bit and 8-bit [i.e. UTF-16 and UTF-8].
    --- The Unicode Standard 3.0, section 2.3
    A character encoding scheme consists of an encoding form plus byte
    serialization.
    --- The Unicode Standard 3.0, section 2.3