Proposal #3456: Add FilePath -> String decoder
duncan.coutts at worc.ox.ac.uk
Fri Aug 28 18:50:27 EDT 2009
On Wed, 2009-08-26 at 16:14 +0300, Yitzchak Gale wrote:
> Johan Tibell wrote:
> > Perhaps the only solution is to have
> > System.FilePath.Posix.toString and System.FilePath.Windows.toString
> > with different type signatures.
> I'm not sure there's any point. As Duncan pointed out,
> we are not just talking about the file system, we are
> talking about interaction between the file system and
> a user interface - how file paths should appear to
> users. So it also depends on what UI you are using.
Mmm, this stuff is complex :-(
In general I like the idea of the proposal that we have functions for
converting between String and FilePath. As it says in the proposal, it
gets us closer to being able to treat FilePath as abstract.
Of course the devil is in the detail. Getting it right, and making it
portable and usable is hard.
> I am now beginning to lean towards Ketil's suggestion
> that on POSIX platforms we should always use
> UTF-8. We then need a prominent warning in the
> documentation that if you need something else,
> like the current locale, decode it yourself.
That's nice in that it makes the function pure, or equivalently so that
it does not need a locale parameter.
> Note that it is becoming increasingly rare for people
> to use non-UTF-8 locales anywhere in the world,
> and even then it's likely ignored by many UIs.
> So I'm inclined against cluttering the API with
> convenience functions for other encodings, as Johan
> is suggesting.
> As a way forward - I propose:
> 1. Accept Judah's patch, modified always to use UTF-8.
If we don't have the locale stuff then doesn't the API become a lot
simpler? Instead of
filePathToString :: FilePath -> IO String
getFilePathToStringFunc :: IO (FilePath -> String)
we can just have
filePathToString :: FilePath -> String
Presumably on POSIX we will follow the glib approach of using '?'
replacement chars, since the conversion to string is aimed at human
consumption. Doing this makes the function total but lossy.
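As a rough illustration of the pure, UTF-8-only version, here is a minimal sketch of a POSIX `filePathToString`. It is not the code from Judah's patch. It assumes (hypothetically) that a FilePath stores one raw byte per Char, and it uses the '?' replacement described above, so it is total but lossy.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr, ord)

-- Sketch of a pure POSIX filePathToString.
-- Assumption: a FilePath holds one raw byte per Char (ord c < 256).
-- Bytes are decoded as UTF-8; any byte that cannot start a valid sequence
-- becomes '?', so the function is total but lossy.
filePathToString :: FilePath -> String
filePathToString = go . map ord
  where
    go :: [Int] -> String
    go [] = []
    go (b : bs)
      | b < 0x80               = chr b : go bs
      | b >= 0xC2 && b <= 0xDF = multi 1 0x80    (b .&. 0x1F) bs
      | b >= 0xE0 && b <= 0xEF = multi 2 0x800   (b .&. 0x0F) bs
      | b >= 0xF0 && b <= 0xF4 = multi 3 0x10000 (b .&. 0x07) bs
      | otherwise              = '?' : go bs  -- stray continuation or invalid lead byte

    -- Read n continuation bytes after a lead byte. If they are missing, or the
    -- code point is overlong, a surrogate or above U+10FFFF, emit '?' for the
    -- lead byte and carry on with the byte after it.
    multi :: Int -> Int -> Int -> [Int] -> String
    multi n minVal acc bs =
      case continuations n acc bs of
        Just (v, rest)
          | v >= minVal && v <= 0x10FFFF && (v < 0xD800 || v > 0xDFFF) -> chr v : go rest
        _ -> '?' : go bs

    continuations :: Int -> Int -> [Int] -> Maybe (Int, [Int])
    continuations 0 acc rest = Just (acc, rest)
    continuations k acc (c : rest)
      | c .&. 0xC0 == 0x80 = continuations (k - 1) ((acc `shiftL` 6) .|. (c .&. 0x3F)) rest
    continuations _ _ _ = Nothing
```

With this sketch the bytes C3 A9 decode to 'é', whereas a Latin-1 file name containing the single byte E9 comes out with a '?' in its place.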
And I didn't notice anything in the proposal about the other direction,
converting String to FilePath. Surely we need both.
stringToFilePath :: String -> FilePath
A nice thing about using UTF-8 on POSIX is that we know this function cannot
fail, unlike conversions into a locale encoding. Presumably on POSIX
this does not do any kind of Unicode canonicalisation, while on OSX and
Windows it would do the appropriate kind.
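A minimal sketch of the reverse direction on POSIX, assuming (hypothetically) that a FilePath stores one raw byte per Char: plain UTF-8 encoding with no Unicode canonicalisation. It always produces a result. The one choice made here that is not from the thread is to turn lone surrogate code points, which Haskell's Char permits but UTF-8 cannot represent, into '?'.

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Char (chr, ord)

-- Sketch of a POSIX stringToFilePath.
-- Assumption: a FilePath holds one raw byte per Char.
-- Every Char is UTF-8 encoded, so the function is total; no Unicode
-- canonicalisation is applied.
stringToFilePath :: String -> FilePath
stringToFilePath = map chr . concatMap (encode . ord)
  where
    encode :: Int -> [Int]
    encode c
      | c < 0x80                   = [c]
      | c < 0x800                  = [0xC0 .|. (c `shiftR` 6), cont 0]
      | c >= 0xD800 && c <= 0xDFFF = [ord '?']  -- lone surrogate: no UTF-8 form
      | c < 0x10000                = [0xE0 .|. (c `shiftR` 12), cont 6, cont 0]
      | otherwise                  = [0xF0 .|. (c `shiftR` 18), cont 12, cont 6, cont 0]
      where
        -- continuation byte carrying six bits of c, starting at bit s
        cont s = 0x80 .|. ((c `shiftR` s) .&. 0x3F)
```

Because every Char maps to some byte sequence, there is no failure case to report, unlike encoding into an arbitrary locale.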
At this point I expect Johan to jump up and down and say these should
be called toString and fromString and imported qualified:
import qualified System.FilePath as FilePath
FilePath.toString :: FilePath -> String
FilePath.fromString :: String -> FilePath
In principle I guess it'd be ok to add versions in the
System.FilePath.Posix module that take an extra encoding parameter, but
it can't be the portable version since the encoding is fixed for OSX and
Windows. It's also jolly inconvenient and, as you've pointed out, of
limited use.
> 2. Add strident warnings in the documentation that:
> o If you need a different encoding on POSIX, do it yourself.
> o If FilePath does not come from the file
> system, it may not match the actual file path used
> in the file system due to Unicode canonicalization.
Similar points apply to trying to round-trip via
toString . fromString :: String -> String
fromString . toString :: FilePath -> FilePath
The String -> String transform would do some Unicode canonicalisation on
Windows and OSX.
The FilePath -> FilePath transform would be the identity on Windows and
OSX for paths coming from the file system. On POSIX, however, we can get
UTF-8 decoding errors, which will give us replacement chars.
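To make the round-trip point concrete, here is a sketch of the POSIX pair using the toString/fromString names from above as hypothetical stand-ins. Assumptions: a FilePath holds one raw byte per Char, the encoding is always UTF-8, undecodable bytes become '?', and no canonicalisation is done. Valid UTF-8 survives fromString . toString, but a byte that is not valid UTF-8 does not, so the re-encoded path names a different file.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)

-- Hypothetical POSIX versions of the proposed FilePath.toString/fromString.
-- Assumptions: one raw byte per Char in a FilePath; always UTF-8;
-- undecodable bytes become '?'; no Unicode canonicalisation.

toString :: FilePath -> String
toString = go . map ord
  where
    go :: [Int] -> String
    go [] = []
    go (b : bs)
      | b < 0x80               = chr b : go bs
      | b >= 0xC2 && b <= 0xDF = multi 1 0x80 (b .&. 0x1F) bs
      | b >= 0xE0 && b <= 0xEF = multi 2 0x800 (b .&. 0x0F) bs
      | b >= 0xF0 && b <= 0xF4 = multi 3 0x10000 (b .&. 0x07) bs
      | otherwise              = '?' : go bs
    -- decode one multi-byte sequence, or emit '?' for its lead byte
    multi :: Int -> Int -> Int -> [Int] -> String
    multi n minVal acc bs = case conts n acc bs of
      Just (v, rest)
        | v >= minVal && v <= 0x10FFFF && (v < 0xD800 || v > 0xDFFF) -> chr v : go rest
      _ -> '?' : go bs
    conts :: Int -> Int -> [Int] -> Maybe (Int, [Int])
    conts 0 acc rest = Just (acc, rest)
    conts k acc (c : rest)
      | c .&. 0xC0 == 0x80 = conts (k - 1) ((acc `shiftL` 6) .|. (c .&. 0x3F)) rest
    conts _ _ _ = Nothing

fromString :: String -> FilePath
fromString = map chr . concatMap (encode . ord)
  where
    encode :: Int -> [Int]
    encode c
      | c < 0x80                   = [c]
      | c < 0x800                  = [0xC0 .|. (c `shiftR` 6), cont 0]
      | c >= 0xD800 && c <= 0xDFFF = [ord '?']  -- lone surrogate
      | c < 0x10000                = [0xE0 .|. (c `shiftR` 12), cont 6, cont 0]
      | otherwise                  = [0xF0 .|. (c `shiftR` 18), cont 12, cont 6, cont 0]
      where
        cont s = 0x80 .|. ((c `shiftR` s) .&. 0x3F)
```

For example, the UTF-8 path "caf\xC3\xA9" round-trips unchanged, while a Latin-1 path "caf\xE9" comes back as "caf?". In the other direction, toString . fromString is the identity on any String without lone surrogates, since this sketch does no canonicalisation.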
So the advice in this section of the documentation should probably be
similar to the glib docs, which say that in some circumstances you
should keep both forms: the String so that you can present the file
name to the user through a graphical or command-line UI, and the
FilePath so that you can still access the same file later (eg to save
it). Especially in document-oriented GUI apps, it's very annoying if you
open, edit and save, but saving either fails because it cannot re-encode
the name, or ends up writing a different file (different in Unicode
canonicalisation or having replacement chars).
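A sketch of the keep-both-forms advice: a hypothetical Document record stores the raw FilePath for I/O next to a display String. The displayName function is a deliberately crude stand-in (assumption: keep ASCII, replace everything else with '?') for a real FilePath -> String decoder. The point is that saving uses docPath, never a path rebuilt from the display name.

```haskell
import Data.Char (ord)

-- Hypothetical document handle keeping both forms of the file name.
data Document = Document
  { docPath        :: FilePath  -- exact path as obtained from the file system
  , docDisplayName :: String    -- lossy, human-readable form for the UI only
  , docContents    :: String
  }

-- Stand-in decoder (assumption): keep ASCII, replace every other Char
-- (i.e. every non-ASCII byte) with '?'.
displayName :: FilePath -> String
displayName = map (\c -> if ord c < 0x80 then c else '?')

openDocument :: FilePath -> IO Document
openDocument path = do
  contents <- readFile path
  -- Force the whole file so the read handle is closed before a later save.
  length contents `seq` pure (Document path (displayName path) contents)

-- Save to the path we opened, not to one rebuilt from docDisplayName.
saveDocument :: Document -> IO ()
saveDocument doc = writeFile (docPath doc) (docContents doc)
```

Even when docDisplayName contains '?' characters, open, edit and save still writes back to the original file.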