[Haskell-cafe] File path programme

robert dockins robdockins at fastmail.fm
Thu Jan 27 16:31:21 EST 2005


> I don't pretend to fully understand various unicode standard but it
> seems to me that these problems are deeper than file path library. The
> equation (decode . encode)
> /= id seems confusing for me. Can you give me an example when this
> happen? 

I am pretty sure that ISO 2022 encoded strings can have multiple ways to 
express the same Unicode characters.  For example, an escape sequence 
that re-designates the character set already in use changes the bytes 
but not the decoded text.  This means that any sensible relation 
between ISO 2022 strings and Unicode strings maps more than one ISO 2022 
string onto the same Unicode string.  The inverse is therefore not a 
function.  To make it a function, one of the possibly several encodings 
of the Unicode string has to be chosen.  So suppose an ISO 2022 string A 
is decoded to a Unicode string U, and we reencode U to an ISO 2022 
string B.  It may be that A /= B.  That is the problem.
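To make the round-trip failure concrete, here is a toy Haskell model.  It 
is an assumption for illustration, not a real ISO 2022 codec: the only 
character set is 7-bit ASCII, and the escape sequence ESC ( B (which 
designates ASCII in ISO 2022) may appear redundantly.  The names 
`decodeToy` and `encodeToy` are hypothetical.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Toy model (hypothetical, NOT a real ISO 2022 codec): bytes are 7-bit
-- ASCII, and ESC ( B, which designates ASCII, may appear any number of
-- times.  Since ASCII is the only set, such an escape changes nothing.
escASCII :: [Word8]
escASCII = [0x1B, 0x28, 0x42]

-- Decoding is partial: bytes >= 0x80, or a stray ESC, are rejected.
decodeToy :: [Word8] -> Maybe String
decodeToy [] = Just []
decodeToy (0x1B : 0x28 : 0x42 : rest) = decodeToy rest  -- redundant escape
decodeToy (b : rest)
  | b < 0x80 && b /= 0x1B = (chr (fromIntegral b) :) <$> decodeToy rest
  | otherwise = Nothing

-- Encoding must pick one representative; this one never emits escapes.
encodeToy :: String -> Maybe [Word8]
encodeToy = traverse enc
  where
    enc c
      | ord c < 0x80 && c /= '\ESC' = Just (fromIntegral (ord c))
      | otherwise = Nothing

main :: IO ()
main = do
  let a = escASCII ++ [0x61]                       -- A = ESC ( B 'a'
  print (decodeToy a)                              -- Just "a"
  print (decodeToy a >>= encodeToy)                -- Just [97]
  print ((decodeToy a >>= encodeToy) == Just a)    -- False
```

Both ESC ( B 'a' and a bare 'a' decode to "a", so the encoder has to 
choose one of them, and reencoding U does not give back A.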

The various UTF encodings do not have this particular problem; if a UTF 
string is valid, then it is a unique representation of a Unicode string.
However, decoding is still a partial function and can fail: some byte 
sequences, such as any containing the byte 0xFF, are not valid UTF-8 at 
all.
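A minimal strict UTF-8 decoder shows both halves of that claim.  It is a 
sketch written for this note (`utf8Decode` is a hypothetical name, not a 
library function).  Rejecting overlong forms is what makes each valid 
byte string the unique encoding of its characters, and every `Nothing` 
is a place where decoding fails.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Strict UTF-8 decoder (sketch).  Returns Nothing on ill-formed input.
utf8Decode :: [Word8] -> Maybe String
utf8Decode [] = Just []
utf8Decode (b0 : rest)
  | b0 < 0x80                = (chr (fromIntegral b0) :) <$> utf8Decode rest
  | b0 >= 0xC2 && b0 <= 0xDF = multi 1 (fromIntegral b0 .&. 0x1F) 0x80
  | b0 >= 0xE0 && b0 <= 0xEF = multi 2 (fromIntegral b0 .&. 0x0F) 0x800
  | b0 >= 0xF0 && b0 <= 0xF4 = multi 3 (fromIntegral b0 .&. 0x07) 0x10000
  | otherwise                = Nothing  -- 0x80..0xC1, 0xF5..0xFF never start a sequence
  where
    -- Read n continuation bytes (10xxxxxx), then reject overlong forms
    -- (cp < minCp), UTF-16 surrogates, and code points above U+10FFFF.
    multi :: Int -> Int -> Int -> Maybe String
    multi n lead minCp = do
      let (conts, rest') = splitAt n rest
      if length conts /= n || any (\c -> c .&. 0xC0 /= 0x80) conts
        then Nothing
        else do
          let cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) lead conts
          if cp < minCp || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF
            then Nothing
            else (chr cp :) <$> utf8Decode rest'

main :: IO ()
main = do
  print (utf8Decode [0x68, 0xC3, 0xA9])  -- Just "h\233"
  print (utf8Decode [0x68, 0xFF])        -- Nothing
  print (utf8Decode [0xC0, 0xAF])        -- Nothing (overlong encoding of '/')
```

The last case matters for file paths in particular: a lenient decoder 
that accepted the overlong C0 AF would map two different byte strings 
onto the same '/'.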

A discussion about this problem floated around on this list several 
months ago.

> What can we do when the file name is passed as command line
> argument to program? We need to convert String to FilePath after all.

Then we can use the decoded Unicode string and hope that nothing bad 
happens; most of the time, we will be OK.  Or we can make the RTS allow 
access to the raw bytes of the program arguments, environment variables, 
etc., and actually do the right thing.
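For what it is worth, later versions of GHC's `unix` package offer 
roughly the second option on POSIX systems: 
`System.Posix.Env.ByteString.getArgs` returns the arguments as raw 
`ByteString`s, and `System.Posix.Files.ByteString` accepts a 
`RawFilePath` (a `ByteString`).  A minimal sketch, assuming that package 
is available and a POSIX platform:

```haskell
import qualified Data.ByteString as B
import System.Posix.Env.ByteString (getArgs)
import System.Posix.Files.ByteString (fileExist)

-- Arguments arrive exactly as the OS passed them, with no decoding, so a
-- file name that is not valid in the locale's encoding is still usable.
main :: IO ()
main = do
  args <- getArgs              -- IO [ByteString]
  mapM_ report args
  where
    report path = do
      exists <- fileExist path -- RawFilePath is a ByteString
      -- Print the raw bytes, since they may not decode to valid text.
      putStrLn (show (B.unpack path) ++ (if exists then " exists" else " not found"))
```

No String ever sits between the command line and the file system call, 
so no encoding round trip can change the name.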


