[Haskell-cafe] File path programme

Wed Jan 26 22:01:37 EST 2005

> I would say that all paths are relative to something, whether it's the 
> Unix root, or the current directory, or whatever. Therefore I would call 
> this something like PathStart, and add:
> 
>     | CurrentDirectory
>     | CurrentDirectoryOfWindowsDrive Char
>     | RootOfCurrentWindowsDrive

This is true in a sense, but I think making the distinction explicit is
helpful for a number of the operations we want to do.  For example, what
is the parent of the relative path "."?  Answer is "..".  What is the
parent of "/." on unix?  Answer is "/.".  I would also argue that it
only makes sense to append a relative path on the right (i.e., we can't
append "/tmp/foo" onto "/usr/local", but we can append "tmp/foo").
Relative paths can refer to different things in the filesystem depending
on process-local state, whereas absolute paths will always refer to the
same thing (until the filesystem changes, or if you do something
esoteric like "chroot").  Relative paths are really "path fragments."
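
To make the distinction concrete, here is a minimal sketch.  The types
RelPath and AbsPath and the functions relParent, absParent and appendRel
are hypothetical names made up for illustration, not a proposed API, and
the sketch assumes ".." is resolved purely lexically.

```haskell
-- Hypothetical types, for illustration only.
-- A relative path: a count of leading ".." steps, then component names.
data RelPath = RelPath Int [String]
  deriving (Eq, Show)

-- An absolute Unix path: component names below the root.
newtype AbsPath = AbsPath [String]
  deriving (Eq, Show)

-- Parent of a relative path: "." becomes "..", "a/b" becomes "a".
relParent :: RelPath -> RelPath
relParent (RelPath ups []) = RelPath (ups + 1) []
relParent (RelPath ups ns) = RelPath ups (init ns)

-- Parent of an absolute path: the root is its own parent.
absParent :: AbsPath -> AbsPath
absParent (AbsPath []) = AbsPath []
absParent (AbsPath ns) = AbsPath (init ns)

-- Only a relative path is accepted on the right-hand side of an append.
-- Leading ".." steps are resolved lexically and stop at the root.
appendRel :: AbsPath -> RelPath -> AbsPath
appendRel (AbsPath ns) (RelPath ups ms) = AbsPath (dropEnd ups ns ++ ms)
  where
    dropEnd k xs = take (length xs - k) xs
```

With these types, appending "/tmp/foo" onto "/usr/local" is a type error
rather than a runtime question.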

> On Unix, there are two nodes we can name directly, the "root" and the 
> "current directory". On Windows, there are 26 roots and 26 current 
> directories which we can name directly; additionally we can name the 
> root or current directory of the current drive, which is one of those 
> 26, and there are an arbitrary number of network share roots, and \\.\, 
> and perhaps some other stuff I don't know about.

There are a few others.  I took a look at MSDN earlier and was
astounded.

> Whether we're talking about the final node or the final edge depends on 
> the OS call; this is the usual pointer-vs-pointee confusion that's also 
> found in most programming languages outside the ML family. Probably we 
> can ignore it, with the exception of the "/foo" vs "/foo/" distinction, 
> which we must preserve.

I've solved that as you suggested where "foo/" goes to "foo/."

>  > class (Show p) => Path p where
> Okay, I'm not convinced that a Path class is the right approach. 

I'm not convinced either, but it feels natural to me.

> I'm tentatively opposed to (B), since I think that the only interesting 
> difference between Win32 and Posix paths is in the set of starting 
> points you can name. (The path separator isn't very interesting.) But 
> maybe it does make sense to have separate starting-point ADTs for each 
> operating system. Then of course there's the issue that Win32 edge 
> labels are Unicode, while Posix edge labels are [Word8]. Hmm.

I think these differences make separate implementations worthwhile.  The
question then is whether to abstract them via a type class, or with a
datatype like:

data FilePath
   = POSIXFilePath POSIXPath
   | WinFilePath   WinPath

Disadvantage here is that the datatype is closed.  Advantage is that
pattern matching tells you statically what kind of path you have.
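
A rough side-by-side of the two options, as a sketch only.  The
representations of POSIXPath and WinPath are placeholders, the functions
separator and pathSeparator are invented for the example, and the sum
type is called AnyPath here only to avoid a clash with the Prelude's
FilePath synonym.

```haskell
-- Placeholder representations; real ones would carry more structure.
newtype POSIXPath = POSIXPath [String] deriving (Eq, Show)
newtype WinPath   = WinPath   [String] deriving (Eq, Show)

-- Option 1: a closed datatype.  Adding a new platform means editing
-- this declaration, but a pattern match reveals the kind of path.
data AnyPath
  = POSIXFilePath POSIXPath
  | WinFilePath   WinPath
  deriving (Eq, Show)

separator :: AnyPath -> Char
separator (POSIXFilePath _) = '/'
separator (WinFilePath _)   = '\\'

-- Option 2: a type class.  New instances can be added elsewhere, but
-- code that is polymorphic in p cannot inspect which platform it has.
class Show p => Path p where
  pathSeparator :: p -> Char

instance Path POSIXPath where
  pathSeparator _ = '/'

instance Path WinPath where
  pathSeparator _ = '\\'
```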

>  > pathCleanup :: p -> p           -- remove .. and suchlike
> 
> This can't be done safely except in a few special cases (e.g. "/.." -> 
> "/"). I'm not sure it should be here.

More than you would think, if you follow the conventions of modern Unix
shells.  E.g., "foo/.." is always equal to ".", "foo/bar/../../.." is
equal to "..", and "foo///bar" is equal to "foo/bar".  This is the
behavior that "cd" gives in modern POSIX shells (rather than doing a
chdir on the ".." hardlink, which does strange things in the presence of
symlinks).  The operation is sufficiently useful that I think it should
be included.  It lets us know, for example, that "/bar/../foo/tmp" and
"/foo/tmp" refer to the same file, without resorting to any IO
operations.
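
A minimal sketch of such a purely lexical cleanup, working on plain
Strings for brevity.  The names splitSlash and cleanup are made up for
this example; it assumes Unix-style "/" separators and deliberately
ignores symlinks, encodings, and the "foo" vs "foo/" distinction.

```haskell
import Data.List (intercalate)

-- Split on '/', dropping empty components, so "foo///bar" yields
-- ["foo", "bar"].
splitSlash :: String -> [String]
splitSlash s = case break (== '/') s of
  (c, [])       -> [c | not (null c)]
  (c, _ : rest) -> [c | not (null c)] ++ splitSlash rest

-- Lexical cleanup following the shell convention: no IO is performed,
-- so symlinks are not taken into account.
cleanup :: String -> String
cleanup path
  | isAbs      = '/' : intercalate "/" comps
  | null comps = "."
  | otherwise  = intercalate "/" comps
  where
    isAbs = take 1 path == "/"
    -- The accumulator holds the components seen so far, in reverse.
    comps = reverse (foldl step [] (splitSlash path))
    step acc "." = acc                       -- "." changes nothing
    step (prev : acc) ".." | prev /= ".." = acc  -- "x/.." cancels out
    step [] ".." | isAbs = []                -- "/.." is still "/"
    step acc c = c : acc                     -- keep names and leading ".."
```

Under these assumptions, cleanup "foo/bar/../../.." gives ".." and
cleanup "/bar/../foo/tmp" gives "/foo/tmp".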

>  > hasExtension :: p -> String -> Bool

> This is really an operation on a single component of the path. I think 
> it would make more sense to make it an ordinary function with type 
> String -> String -> Bool and use the basename method to get the 
> appropriate path component.

The problem is that String doesn't faithfully capture the representation
of path edges.  For POSIX it is a sequence of Word8 (excluding 0x2F and
0x00).  In my implementation of UnixPaths, each path carries along an
encoding component, which (theoretically) tells you how to do
[Word8] <-> [Char]
translations.  Eventually we will get a real IO layer complete with
character encodings and this will be meaningful.  The comparison needs
to be done with encodings in mind.
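
As an illustration of comparing with encodings in mind, here is a small
sketch.  The Edge synonym, the ASCII-only encoder encodeAscii and the
function hasExtensionBytes are hypothetical stand-ins, not the actual
UnixPaths code; a real encoding component would replace encodeAscii.

```haskell
import Data.Char (ord)
import Data.List (isSuffixOf)
import Data.Word (Word8)

-- A POSIX path edge as raw bytes (hypothetical representation).
type Edge = [Word8]

-- Stand-in encoder that only understands ASCII; it fails on anything
-- else instead of guessing.
encodeAscii :: String -> Maybe [Word8]
encodeAscii = mapM enc
  where
    enc c
      | ord c < 128 = Just (fromIntegral (ord c))
      | otherwise   = Nothing

-- Encode the extension with the path's own encoder, then compare bytes.
-- The edge must be strictly longer than ".ext", so an edge that is
-- exactly ".hs" does not count as having the extension "hs".
hasExtensionBytes :: (String -> Maybe [Word8]) -> Edge -> String -> Bool
hasExtensionBytes encode edge ext =
  case encode ('.' : ext) of
    Just suffix -> suffix `isSuffixOf` edge && length suffix < length edge
    Nothing     -> False
```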

>  > pathToForeign :: p -> IO (Ptr CChar)
>  > pathFromForeign :: Ptr CChar -> IO p
> 
> This interface is problematic. Is the pointer returned by pathToForeign 
> a heap pointer which the caller is supposed to free? If so, a Ptr CChar 
> instance would have to copy the pathname every time. And I don't 
> understand exactly what pathFromForeign is supposed to do.

Agreed, I like the withCPath interface better.  pathFromForeign takes a
path representation directly from C land, without going through String
first (again with encoding issues in mind).  Although it should perhaps
be:

pathFromForeign :: Ptr () -> IO p

instead (might be wide chars).
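
For the narrow-character case, a bracket-style interface could look
roughly like the sketch below.  BytePath is a hypothetical stand-in for
a real path type, and this withCPath is only my guess at the shape of
the interface; the marshalling functions are the standard ones from
Foreign.Marshal.Array.

```haskell
import Data.Word (Word8)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (peekArray0, withArray0)
import Foreign.Ptr (Ptr, castPtr)

-- Hypothetical path type holding the raw bytes of the path.
newtype BytePath = BytePath [Word8]
  deriving (Eq, Show)

-- Marshal the path into a temporary NUL-terminated buffer that is only
-- valid inside the callback, so the caller never has to free anything.
withCPath :: BytePath -> (Ptr CChar -> IO a) -> IO a
withCPath (BytePath bytes) action = withArray0 0 bytes (action . castPtr)

-- Read a NUL-terminated C string straight into bytes, without going
-- through String.
pathFromForeign :: Ptr CChar -> IO BytePath
pathFromForeign p = BytePath <$> peekArray0 0 (castPtr p :: Ptr Word8)
```

A wide-character variant would need a different element type and
terminator, which is why the Ptr () signature may be the better fit.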