Proposal #3456: Add FilePath -> String decoder
Judah Jacobson
judah.jacobson at gmail.com
Tue Sep 1 01:37:43 EDT 2009
On Mon, Aug 31, 2009 at 12:28 AM, Ketil Malde <ketil at malde.org> wrote:
> Duncan Coutts <duncan.coutts at worc.ox.ac.uk> writes:
>
>> Presumably on POSIX we will follow the glib approach of using '?'
>> replacement chars, since the conversion to string is aimed at human
>> consumption. Doing this makes the function total but lossy.
>
> If the FilePath is not a valid UTF-8, there's a private area in
> Unicode that can be used for encoding byte values. Wikipedia's UTF-8
> entry suggests "U+DCxx where xx is the byte's value".
>
> This would make us "non conformant" as per the bureaucracy, but on
> the other hand, it would work (with some ugliness for non-ASCII-based
> encodings) for any encoding, and these would be the expected identity:
Taking a step back, there are (at least) three separate issues at play here:
1) The FilePath type must be able to represent arbitrary byte
sequences on POSIX systems, but the current one-byte-per-Char is
suboptimal.
2) Much existing code probably relies on FilePath==String.
3) We need to be able to display FilePaths in a readable form to the user.
The U+DCxx method is a way to fix #1 without affecting #2. However, I
don't think this will solve issue #3 (which is what my proposal is
intended to address). Probably a FilePath->String display function
should explicitly replace the problem bytes with either "?" or "%xx".
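To make the display idea concrete, here is a minimal sketch of such a function, assuming the path is available as a list of raw bytes and that UTF-8 is the intended encoding. The names displayPath and Replacement are hypothetical and not part of any existing library; the point is only that every undecodable byte becomes "?" or "%xx", which makes the function total but lossy.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import Numeric (showHex)

-- | How to render a byte that is not part of a valid UTF-8 sequence.
-- (Hypothetical type, for illustration only.)
data Replacement = QuestionMark | PercentHex

-- | Hypothetical display function: decode UTF-8 and replace each
-- undecodable byte. Total, but lossy.
displayPath :: Replacement -> [Word8] -> String
displayPath r = go
  where
    go [] = []
    go (b : bs)
      | b < 0x80 = chr (fromIntegral b) : go bs
      | b >= 0xC2 && b <= 0xDF = multi 1 0x80 (fromIntegral b .&. 0x1F)
      | b >= 0xE0 && b <= 0xEF = multi 2 0x800 (fromIntegral b .&. 0x0F)
      | b >= 0xF0 && b <= 0xF4 = multi 3 0x10000 (fromIntegral b .&. 0x07)
      | otherwise = bad b ++ go bs
      where
        -- n continuation bytes expected; lo is the smallest code point
        -- allowed for this length (rejects overlong forms).
        multi n lo lead =
          let (cont, rest) = splitAt n bs
              cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) lead cont
              ok =
                length cont == n
                  && all isCont cont
                  && cp >= lo
                  && cp <= 0x10FFFF
                  && not (cp >= 0xD800 && cp <= 0xDFFF)
           in if ok then chr cp : go rest else bad b ++ go bs
    isCont c = c .&. 0xC0 == 0x80
    bad x = case r of
      QuestionMark -> "?"
      PercentHex -> '%' : hex2 x
    -- Two lowercase hex digits, zero-padded.
    hex2 x = let s = showHex x "" in if length s < 2 then '0' : s else s
```

When a multi-byte sequence is invalid, only its lead byte is replaced and decoding resumes at the next byte, so a stray continuation byte is reported separately.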
> Can FilePath be defined differently on different systems?
>
> I.e. could it be:
>
> type FilePath = [Word8] -- Posix
> type FilePath = [Word16] -- Windows, etc
>
> It'd also be nice if overloaded string literals are used (and extended
> to these), so that I could specify filenames directly with no need
> for wrappers.
Yes, this would solve #1 nicely, but it runs into #2, so it looks
unlikely to happen anytime soon.
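For reference, a rough sketch of what the POSIX half of that suggestion might look like with overloaded string literals. PosixPath and encodeChar are hypothetical names, and the choice to UTF-8 encode literals is an assumption of this sketch, not something the base library does.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.String (IsString (..))
import Data.Word (Word8)

-- | Hypothetical POSIX path type: a raw byte sequence.
newtype PosixPath = PosixPath [Word8]
  deriving (Eq, Show)

-- | Encode one Char as UTF-8 bytes. Surrogate code points are not
-- rejected here; a real implementation would have to decide what to
-- do with them.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80 = [fromIntegral n]
  | n < 0x800 = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    hi s = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- | String literals become UTF-8 encoded paths (assumption of this sketch).
instance IsString PosixPath where
  fromString = PosixPath . concatMap encodeChar

-- A literal used directly as a path, with no wrapper function.
examplePath :: PosixPath
examplePath = "/tmp/caf\233"
```

Code that pattern matches on a FilePath as a String would still break under such a type, which is exactly issue #2.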
>> In principle I guess it'd be ok to add versions in the
>> System.FilePath.Posix module that take an extra encoding parameter
>
> I think these belong in Text.Encodings or some such.
>
> Either you use the default, simple and pure interface (which is UTF-8
> on Posix), or you'll have to do some more work, and do something like
>
> mydecoder <- filePathToStringWith =<< getLocaleEncoding
Sure; though I'd expect that a TextEncoding would convert bytes
to/from Chars-as-Unicode, which isn't really useful on Windows. I
guess on Windows filePathToStringWith would just completely ignore the
encoding parameter. (But I do think it's important to have such a
function for portability.)
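A small sketch of the behaviour I have in mind, with both platforms modelled in one type so that it compiles anywhere. Everything here is hypothetical: Decoder stands in for a real text encoding, NativePath stands in for a platform-specific FilePath, and latin1 is just an example decoder.

```haskell
import Data.Char (chr)
import Data.Word (Word16, Word8)

-- | Hypothetical stand-in for a text encoding: bytes to Unicode Chars.
type Decoder = [Word8] -> String

-- | Hypothetical native path: bytes on POSIX, UTF-16 code units on Windows.
data NativePath
  = PosixBytes [Word8]
  | WindowsUnits [Word16]

-- | On POSIX the supplied decoder is applied; on Windows it is ignored
-- and the code units are decoded as UTF-16.
filePathToStringWith :: Decoder -> NativePath -> String
filePathToStringWith decode (PosixBytes bs) = decode bs
filePathToStringWith _ (WindowsUnits us) = decodeUtf16 us

-- | Decode UTF-16, replacing unpaired surrogates with '?'.
decodeUtf16 :: [Word16] -> String
decodeUtf16 (h : l : rest)
  | isHigh h && isLow l =
      chr (0x10000 + (fromIntegral h - 0xD800) * 0x400 + (fromIntegral l - 0xDC00))
        : decodeUtf16 rest
decodeUtf16 (u : rest)
  | isHigh u || isLow u = '?' : decodeUtf16 rest
  | otherwise = chr (fromIntegral u) : decodeUtf16 rest
decodeUtf16 [] = []

isHigh, isLow :: Word16 -> Bool
isHigh u = u >= 0xD800 && u <= 0xDBFF
isLow u = u >= 0xDC00 && u <= 0xDFFF

-- | Example decoder: every byte maps to the code point of the same value.
latin1 :: Decoder
latin1 = map (chr . fromIntegral)
```

The same call site then works on both platforms, which is the portability argument for keeping the encoding parameter even where it has no effect.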
-Judah