Text in Haskell: A PROPOSAL

Wolfgang Jeltsch wolfgang@jeltsch.net
08 Aug 2002 14:28:56 +0200


On Thursday, 2002-08-08, 13:05, CEST, Ketil Z. Malde wrote:
> I wonder if anybody are actually *using* non-octet based encodings
> (e.g. UTF-16/UCS-2) in files or in sockets (without wrapping the
> encoded content in a higher level protocol, like MIME)?  Even if
> various standards support them, we might be better off with less
> complexity and handling the *useful* cases, if it turns out the
> complex cases aren't real world.

I would say that dealing with a character encoding _scheme_*) like
UTF-16LE or UTF-16BE is as complex as dealing with any other encoding
scheme. And since we may assume that files and sockets work with octets,
it makes no sense to provide support for non-octet-based encoding _forms_
like UTF-16 in this area. All one has to provide for such forms are,
IMHO, some conversion functions/parsers.
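
A rough sketch of what such conversion functions might look like, just
to illustrate the form/scheme split. All names here (encodeUtf16,
ByteOrder, serialize, encodeUtf16LE, encodeUtf16BE) are hypothetical and
not taken from any existing library; lists are used only for brevity.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16, Word8)

-- Encoding form: characters to 16-bit code units (UTF-16).
-- Code points above U+FFFF become a surrogate pair.
-- Assumption: the input contains no lone surrogate code points;
-- if it does, they are passed through unchanged.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap unit
  where
    unit c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        n = ord c
        m = n - 0x10000

-- Byte serialization: the step that turns a form into a scheme.
data ByteOrder = LittleEndian | BigEndian

serialize :: ByteOrder -> [Word16] -> [Word8]
serialize order = concatMap split
  where
    split w =
      let hi = fromIntegral (w `shiftR` 8)   -- upper octet
          lo = fromIntegral (w .&. 0xFF)     -- lower octet
      in case order of
           BigEndian    -> [hi, lo]
           LittleEndian -> [lo, hi]

-- Encoding schemes: form plus byte serialization.
encodeUtf16LE, encodeUtf16BE :: String -> [Word8]
encodeUtf16LE = serialize LittleEndian . encodeUtf16
encodeUtf16BE = serialize BigEndian . encodeUtf16
```

Only the last two functions produce octets, so only they would be
relevant for files and sockets; encodeUtf16 alone is a plain conversion
function.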

> [...]

Wolfgang

*) The Unicode Standard (at least version 3.0) distinguishes between
character encoding forms and character encoding schemes:
    Character encoding forms specify the representation of characters as
    actual data in a computer. The Unicode Standard uses two encoding
    forms: 16-bit and 8-bit [i.e. UTF-16 and UTF-8].
    --- The Unicode Standard 3.0, section 2.3

    A character encoding scheme consists of an encoding form plus byte
    serialization.
    --- The Unicode Standard 3.0, section 2.3