[Haskell-cafe] Re: File path programme

Glynn Clements glynn at gclements.plus.com
Sun Jan 30 10:18:54 EST 2005


Marcin 'Qrczak' Kowalczyk wrote:

> >> The various UTF encodings do not have this particular problem; if a UTF
> >> string is valid, then it is a unique representation of a unicode string.
> >> However, decoding is still a partial function and can fail.
> >
> > And while it is partly true, it is qualified by the problems relative to
> > canonicalization (an "é" in Unicode can both be represented as "é" or as two
> > chars (an e and an accent) and they should (ideally) compare equal).
>
> In what sense "equal"? They are supposed to be equivalent as far
> as the semantics of the text is concerned, but representations are
> clearly different and most programs distinguish them. In particular
> they are different filenames on both Unix and Windows. AFAIK MacOS
> normalizes filenames, but using a slightly different algorithm than
> Unicode (perhaps just an older version).
>
> IMHO it makes no sense to pretend that they are exactly the same when
> strings consist of code points or lower level units (and I don't
> believe another choice for the default string type would be practical).

Well, at least you and I agree on that.

Once you start down the "semantic equivalence" route, you will quickly
run into issues like "ß" == "ss", and it only gets worse from there
on.
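
As a minimal sketch of the point about code points (the names below are
hypothetical, chosen only for illustration, and the default String type is
assumed), the two spellings of "é" and the pair "ß"/"ss" are simply
different lists of Char, so plain (==) distinguishes them:

```haskell
-- Strings here are Haskell's default String: plain lists of code points.

-- U+00E9 LATIN SMALL LETTER E WITH ACUTE, as a single code point.
precomposed :: String
precomposed = "\x00E9"

-- 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed :: String
decomposed = "e\x0301"

-- U+00DF LATIN SMALL LETTER SHARP S versus the two-letter spelling.
sharpS, doubleS :: String
sharpS  = "\x00DF"
doubleS = "ss"

-- Code-point equality is just (==) on String; no normalization and no
-- notion of "semantic equivalence" is applied. (Hypothetical helper.)
sameCodePoints :: String -> String -> Bool
sameCodePoints = (==)
```

Both sameCodePoints precomposed decomposed and sameCodePoints sharpS doubleS
evaluate to False. Treating either pair as equal would need an explicit
normalization or folding step layered on top of the string type.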

-- 
Glynn Clements <glynn at gclements.plus.com>

