[Haskell-cafe] RE: ANN: System.FilePath 0.9

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Wed Jul 26 15:27:02 EDT 2006


On Wed, 2006-07-26 at 19:41 +0200, Udo Stenzel wrote:
> Duncan Coutts wrote:
> > On Wed, 2006-07-26 at 15:29 +0200, Udo Stenzel wrote:
> > 
> > > Exactly.  I believe, a FilePath should be an algebraic datatype.
> > 
> > We've had this discussion before. The main problem is that all the
> > current IO functions (readFile, etc) use the FilePath type, which is
> > just a String.
> 
> So what's better?
> 
> - use an ADT (correct and portable by construction), convert to String
>   when calling the IO library
> 
> - fumble with Strings, use an unholy mix of specialized and general
>   functions, trip over a corner case

In practice, in the short term, the choice is between each application
fumbling with strings in different incorrect ways or a library that
fumbles with strings in a rather more considered and portable way.

> > So a new path ADT is fine if at the same time we provide
> > a new IO library.
> 
> We should just wrap the old API, filePathToString any parameters and
> liftIO the function while we're at it.

Try proposing something concrete and see if you can get it generally
accepted. Perhaps you can get it accepted for the next major release of
various Haskell implementations or for Haskell-prime.
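
For what it's worth, the wrapper being suggested might look roughly like
the sketch below. The Path type and readFileP are hypothetical names made
up for illustration; filePathToString and liftIO come from the suggestion
above, and MonadIO is assumed to be the class from Control.Monad.IO.Class
in a recent base. It is a sketch under those assumptions, not a proposal.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.List (intercalate)

-- Hypothetical path ADT: a list of relative components, nothing more.
newtype Path = Path [String]
  deriving (Eq, Show)

-- Convert to the String that the existing IO functions expect.
-- Assumes '/' as the separator; a real library would choose per platform.
filePathToString :: Path -> FilePath
filePathToString (Path comps) = intercalate "/" comps

-- Wrap the Haskell98 readFile: convert the argument, then lift the action.
readFileP :: MonadIO m => Path -> m String
readFileP = liftIO . readFile . filePathToString
```

Every other function taking a FilePath (writeFile, appendFile, the
directory functions and so on) would need the same treatment, which is
where the "new IO library" comes in.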

> > That's another portability headache - file name string encodings.
> > Windows and OSX use encodings of Unicode. Unix uses strings of bytes.
> 
> Indeed.  There are two ways out:
> 
> - declare that Unix uses Unicode too, take the appropriate conversion
>   from the locale

Sadly this does not work. For one thing you don't know that the locale
you're using now was the locale of the program that wrote the file. This
happens on multi-user systems where different users use different
languages.

Then there is the fact that converting from Unicode back to the file
name is not guaranteed to give the same sequence of bytes.

For example, see the section "File Name Encodings" in the GLib API docs:
http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html
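
The round-trip problem can be shown with a small sketch. It assumes the
bytestring and text packages, and it uses a lenient UTF-8 decoder as a
stand-in for whatever the locale conversion does with bytes it cannot
decode; the file name bytes are an invented example.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- Decode file name bytes as UTF-8 (replacing anything undecodable),
-- then encode the resulting text back to bytes.
roundTrip :: B.ByteString -> B.ByteString
roundTrip = TE.encodeUtf8 . TE.decodeUtf8With lenientDecode

-- Invented example: "caf" followed by 0xE9, which is e-acute in
-- ISO-8859-1 but not a valid UTF-8 sequence on its own.
latin1Name :: B.ByteString
latin1Name = B.pack [0x63, 0x61, 0x66, 0xE9]

-- True only if the bytes come back unchanged.
survivesRoundTrip :: B.ByteString -> Bool
survivesRoundTrip bytes = roundTrip bytes == bytes
```

Here survivesRoundTrip latin1Name is False: the name written by a user
in a Latin-1 locale comes back as different bytes, so the file can no
longer be opened under the converted name.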

> - parameterize the FilePath ADT on the character type, you get (FilePath
>   Word16) on Windows (which uses UCS-2, not UCS-4 and not UTF-16) and
>   (FilePath Word8) on Unix; provide conversions from/to (FilePath
>   String).

> I tend towards the second option.  It at least doesn't make anything
> worse than it already is.  It's also irrelevant, since pretending the
> issue doesn't exist works equally well with an ADT.

Yeah, keeping it in the native format and doing no change of encoding is
almost certainly the way to go. It doesn't address the issue of
converting file names to/from displayable strings, but perhaps that's
reasonable.
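
A minimal sketch of what "native format, no change of encoding" could
mean for the parameterised ADT follows. All the names here (Path,
UnixPath, WindowsPath, parseUnix, renderUnix, append) are hypothetical,
and only the Unix side is filled in.

```haskell
import Data.List (intercalate)
import Data.Word (Word8, Word16)

-- Hypothetical ADT, parameterised on the platform's native code unit.
data Path c = Path
  { isAbsolute :: Bool
  , components :: [[c]]
  } deriving (Eq, Show)

type UnixPath    = Path Word8   -- raw bytes, no encoding assumed
type WindowsPath = Path Word16  -- 16-bit code units

slash :: Word8
slash = 0x2F

-- Split raw bytes on '/', dropping empty components. The bytes inside
-- each component are kept exactly as they were.
parseUnix :: [Word8] -> UnixPath
parseUnix bytes =
  Path (take 1 bytes == [slash]) (filter (not . null) (splitOn slash bytes))

-- Render back to raw bytes for the system call.
renderUnix :: UnixPath -> [Word8]
renderUnix (Path absolute comps) =
  (if absolute then [slash] else []) ++ intercalate [slash] comps

splitOn :: Eq a => a -> [a] -> [[a]]
splitOn sep xs = case break (== sep) xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Joining works on structure; an absolute right-hand side wins.
append :: Path c -> Path c -> Path c
append _ p@(Path True _)           = p
append (Path a xs) (Path False ys) = Path a (xs ++ ys)
```

Note that this normalises repeated separators, so renderUnix . parseUnix
is not the identity on every input, and it says nothing about how to
show a UnixPath to the user.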

> > My point is it's not quite as simple as "just making an ADT".
> 
> Mine is that it is :)  Moreover, a path already has internal structure.
> Those string manipulating functions either reconstruct the structure,
> then operate on that, then encode it back into a string or implement an
> approximation to that.  The latter leads to surprises and making the
> former explicit can never hurt.  Heck, NO library fumbles with strings,
> neither parsers nor pretty printers nor Network... why should a FilePath
> be different?

For compatibility with the Haskell98 IO library. There's also the issue
here that adding in lots of conversions ADT <-> String means that people
will not bother to use it and will continue to do things like:
readFile (path ++ "/" ++ file)
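
To make the corner cases concrete, here is a small comparison. It assumes
the (</>) operator as exported by System.FilePath.Posix in current
versions of the filepath package; naiveJoin is a name made up for the
string concatenation above.

```haskell
import System.FilePath.Posix ((</>))

-- The ad hoc version from above.
naiveJoin :: String -> String -> String
naiveJoin path file = path ++ "/" ++ file

-- Pairs of (naive result, library result) for the same inputs.
cornerCases :: [(String, String)]
cornerCases =
  [ (naiveJoin ""     "notes.txt", ""     </> "notes.txt")
    -- naive: "/notes.txt" (now absolute), library: "notes.txt"
  , (naiveJoin "dir/" "notes.txt", "dir/" </> "notes.txt")
    -- naive: "dir//notes.txt", library: "dir/notes.txt"
  , (naiveJoin "dir"  "notes.txt", "dir"  </> "notes.txt")
    -- both: "dir/notes.txt"
  ]
```

The first case is the nasty one: an empty directory silently turns a
relative file name into one rooted at "/".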

If anyone can actually design and implement an ADT that addresses most
of these problems and can get it to work nicely with whatever is the
popular IO system of the time then that'd be great. I think you'll find
that it's not quite as simple as it looks. There was a discussion on a
path ADT on the libraries list a while ago that's probably worth
reading. I don't think it reached consensus.

Duncan


