[Haskell-cafe] invalid character encoding

Glynn Clements glynn at gclements.plus.com
Sat Mar 19 09:32:01 EST 2005


Einar Karttunen wrote:

> > In what way is ISO-2022 non-reversible? Is it possible that an ISO-2022
> > file name that is converted to Unicode cannot be converted back any 
> > more (assuming you know for sure that it was ISO-2022 in the first 
> > place)?
> 
> I am no expert on ISO-2022, so the following may contain errors;
> please correct me if it is wrong.
> 
> ISO-2022 -> Unicode is always possible.
> Unicode -> ISO-2022 should also always be possible, but it is a
> relation, not a function. This means there are infinitely many (?) ways
> of encoding a particular Unicode string in ISO-2022.
> 
> ISO-2022 works by providing escape sequences to switch between different
> character sets. One can use these escapes in almost any way one
> wishes.

Exactly.

Moreover, while there is an infinite number of equivalent
representations in theory (you can add as many redundant switching
sequences as you wish), there are still multiple "plausible" equivalent
representations in practice.
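
As a rough illustration of both points, here is a small Python sketch
using the standard library's "iso2022_jp" codec. The helper names
(decode, canonical) and the particular byte strings are my own choices
for the example, not anything from the standard, and the "plausible"
variants shown are just two that this codec happens to accept.

```python
# Sketch: several ISO-2022-JP byte strings that decode to the same text.
# Helper names and sample byte strings are illustrative assumptions.

ESC = b"\x1b"
TO_ASCII = ESC + b"(B"           # designate ASCII
TO_JIS_ROMAN = ESC + b"(J"       # designate JIS X 0201 Roman
TO_JISX0208_1978 = ESC + b"$@"   # designate JIS X 0208-1978
TO_JISX0208_1983 = ESC + b"$B"   # designate JIS X 0208-1983


def decode(raw: bytes) -> str:
    """Decode ISO-2022-JP bytes to a Unicode string."""
    return raw.decode("iso2022_jp")


def canonical(raw: bytes) -> bytes:
    """Round-trip through Unicode; yields whatever form the encoder picks."""
    return decode(raw).encode("iso2022_jp")


# Redundant switching sequences: you can keep adding these forever.
redundant = [
    b"A",
    TO_ASCII + b"A",
    TO_ASCII * 2 + b"A",
    TO_JISX0208_1983 + TO_ASCII + b"A",
]

# "Plausible" variants: a letter via ASCII or via JIS X 0201 Roman,
# and a hiragana character via the 1978 or the 1983 designator.
plausible_latin = [b"A", TO_JIS_ROMAN + b"A" + TO_ASCII]
plausible_kana = [
    TO_JISX0208_1978 + b'$"' + TO_ASCII,
    TO_JISX0208_1983 + b'$"' + TO_ASCII,
]

if __name__ == "__main__":
    for group in (redundant, plausible_latin, plausible_kana):
        for raw in group:
            print(raw, "->", repr(decode(raw)), "->", canonical(raw))
```

Every byte string within a group decodes to the same Unicode string, but
the round trip returns only the one form the encoder chooses, so the
original bytes of a file name cannot in general be recovered.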

-- 
Glynn Clements <glynn at gclements.plus.com>

