[Haskell-cafe] invalid character encoding
Glynn Clements
glynn at gclements.plus.com
Sat Mar 19 09:32:01 EST 2005
Einar Karttunen wrote:
> > In what way is ISO-2022 non-reversible? Is it possible that an ISO-2022
> > file name that is converted to Unicode cannot be converted back any
> > more (assuming you know for sure that it was ISO-2022 in the first
> > place)?
>
> I am no expert on ISO-2022, so the following may contain errors;
> please correct me if it is wrong.
>
> ISO-2022 -> Unicode is always possible.
> Unicode -> ISO-2022 should also always be possible, but it is a relation,
> not a function. This means there are infinitely many (?) ways of encoding
> a particular Unicode string in ISO-2022.
>
> ISO-2022 works by providing escape sequences to switch between different
> character sets. One can use these escapes in almost any way one wishes.

Exactly.

Moreover, while there are infinitely many equivalent representations in
theory (you can add as many redundant switching sequences as you wish),
there are also several "plausible" equivalent representations in
practice, so a decode/re-encode round trip need not give back the
original bytes.
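
To make this concrete, here is a minimal sketch in Python. It assumes CPython's bundled "iso2022_jp" codec; the three byte strings and their names are made up for illustration. Each one is a way that some encoder might plausibly write the same two hiragana characters:

```python
# Sketch only: relies on CPython's built-in "iso2022_jp" codec.
ESC = b"\x1b"
TO_JISX0208 = ESC + b"$B"   # escape sequence designating JIS X 0208-1983
TO_ASCII = ESC + b"(B"      # escape sequence designating ASCII

A = b'$"'   # bytes 0x24 0x22: HIRAGANA LETTER A (U+3042) in JIS X 0208
I = b"$$"   # bytes 0x24 0x24: HIRAGANA LETTER I (U+3044) in JIS X 0208

# Switch once, emit both characters, switch back to ASCII.
compact = TO_JISX0208 + A + I + TO_ASCII
# Return to ASCII after every character: more escapes, same text.
verbose = TO_JISX0208 + A + TO_ASCII + TO_JISX0208 + I + TO_ASCII
# Redundant leading switch to ASCII, which is already the initial state.
padded = TO_ASCII + compact

candidates = (compact, verbose, padded)
decoded = [b.decode("iso2022_jp") for b in candidates]
reencoded = decoded[0].encode("iso2022_jp")

# All three byte strings decode to the same Unicode string ...
print(decoded[0] == decoded[1] == decoded[2])
# ... but the encoder can only produce one byte string for it.
print([reencoded == b for b in candidates])
```

Whichever byte string the encoder chooses, it can match at most one of the three inputs, so the other two do not survive the round trip. For a file name, that means the re-encoded name may no longer refer to the original file.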
--
Glynn Clements <glynn at gclements.plus.com>