[Haskell-cafe] File path programme
Robert Dockins
robdockins at fastmail.fm
Wed Jan 26 22:01:37 EST 2005
> I would say that all paths are relative to something, whether it's the
> Unix root, or the current directory, or whatever. Therefore I would call
> this something like PathStart, and add:
>
> | CurrentDirectory
> | CurrentDirectoryOfWindowsDrive Char
> | RootOfCurrentWindowsDrive
This is true in a sense, but I think making the distinction explicit is
helpful for a number of the operations we want to do. For example, what
is the parent of the relative path "."? The answer is "..". What is the
parent of "/." on Unix? The answer is "/.". I would also argue that it
only makes sense to append a relative path on the right (i.e., we can't
append "/tmp/foo" onto "/usr/local", but we can append "tmp/foo").
Relative paths can refer to different things in the filesystem depending
on process-local state, whereas absolute paths always refer to the
same thing (until the filesystem changes, or unless you do something
esoteric like "chroot"). Relative paths are really "path fragments."
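To make the absolute/relative distinction concrete, here is a minimal sketch that gives the two kinds of path separate types. The names (RelPath, AbsPath, appendAbs, parentRel, and so on) are hypothetical, edges are plain Strings only for brevity, and the edge lists are assumed to contain no "." edges (and, for AbsPath, no "..").

```haskell
import Data.List (intercalate)

-- Hypothetical types: edges are Strings for brevity. Assumed invariant:
-- no "." edges anywhere, and no ".." edges in an AbsPath.
newtype RelPath = RelPath [String] deriving (Eq, Show)
newtype AbsPath = AbsPath [String] deriving (Eq, Show)

-- Appending is only offered with a relative path on the right; there is
-- deliberately no function that appends an AbsPath onto anything.
-- (A fuller version would clean up any ".." edges the RelPath brings in.)
appendAbs :: AbsPath -> RelPath -> AbsPath
appendAbs (AbsPath xs) (RelPath ys) = AbsPath (xs ++ ys)

appendRel :: RelPath -> RelPath -> RelPath
appendRel (RelPath xs) (RelPath ys) = RelPath (xs ++ ys)

-- A relative path may have to climb above its starting point,
-- so the parent of "." (the empty edge list) is "..".
parentRel :: RelPath -> RelPath
parentRel (RelPath xs)
  | null xs || last xs == ".." = RelPath (xs ++ [".."])
  | otherwise = RelPath (init xs)

-- The root is its own parent, so an absolute path never needs "..".
parentAbs :: AbsPath -> AbsPath
parentAbs (AbsPath xs)
  | null xs = AbsPath []
  | otherwise = AbsPath (init xs)

renderRel :: RelPath -> String
renderRel (RelPath []) = "."
renderRel (RelPath xs) = intercalate "/" xs

renderAbs :: AbsPath -> String
renderAbs (AbsPath xs) = "/" ++ intercalate "/" xs
```

With these types, appending "/tmp/foo" onto "/usr/local" is simply a type error, while appendAbs (AbsPath ["usr", "local"]) (RelPath ["tmp", "foo"]) renders as "/usr/local/tmp/foo".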
> On Unix, there are two nodes we can name directly, the "root" and the
> "current directory". On Windows, there are 26 roots and 26 current
> directories which we can name directly; additionally we can name the
> root or current directory of the current drive, which is one of those
> 26, and there are an arbitrary number of network share roots, and \\.\,
> and perhaps some other stuff I don't know about.
There are a few others. I took a look at MSDN earlier and was
astounded.
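One way to encode the Win32 starting points listed in the quoted text is a separate ADT per OS. This is only a sketch: WinStart, PosixStart and renderWinStart are hypothetical names, and the constructor list is deliberately not exhaustive (MSDN documents more forms than these).

```haskell
-- Hypothetical starting points a Win32 path can name; not exhaustive.
data WinStart
  = DriveRoot Char          -- "C:\"  root of a given drive
  | DriveCurrentDir Char    -- "C:"   current directory of a given drive
  | CurrentDriveRoot        -- "\"    root of the current drive
  | CurrentDirectory        -- no prefix at all
  | UNCShare String String  -- "\\server\share\"  a network share root
  | DeviceNamespace         -- "\\.\"
  deriving (Eq, Show)

-- On POSIX there are only two nameable starting points.
data PosixStart = PosixRoot | PosixCurrentDirectory
  deriving (Eq, Show)

-- Render a starting point as the prefix it corresponds to.
renderWinStart :: WinStart -> String
renderWinStart (DriveRoot d) = [d, ':', '\\']
renderWinStart (DriveCurrentDir d) = [d, ':']
renderWinStart CurrentDriveRoot = "\\"
renderWinStart CurrentDirectory = ""
renderWinStart (UNCShare server share) = "\\\\" ++ server ++ "\\" ++ share ++ "\\"
renderWinStart DeviceNamespace = "\\\\.\\"
```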
> Whether we're talking about the final node or the final edge depends on
> the OS call; this is the usual pointer-vs-pointee confusion that's also
> found in most programming languages outside the ML family. Probably we
> can ignore it, with the exception of the "/foo" vs "/foo/" distinction,
> which we must preserve.
I've solved that as you suggested: "foo/" is rewritten to "foo/.".
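A minimal sketch of that rewrite at parse time, assuming POSIX "/" separators; toEdges and fromEdges are hypothetical helper names. A leading empty edge marks an absolute path, and a trailing separator becomes an explicit final "." edge, so "foo/" and "foo" stay distinct.

```haskell
import Data.List (intercalate)

-- Split a path string on '/'. A trailing '/' turns into a final "."
-- edge; a leading '/' shows up as an initial empty edge.
toEdges :: String -> [String]
toEdges = go
  where
    go str = case break (== '/') str of
      (edge, []) -> [edge]
      (edge, _ : rest) -> edge : (if null rest then ["."] else go rest)

-- Join edges back together with '/'.
fromEdges :: [String] -> String
fromEdges = intercalate "/"
```

For example, toEdges "foo/" gives ["foo", "."], and fromEdges turns that back into "foo/.".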
> > class (Show p) => Path p where
> Okay, I'm not convinced that a Path class is the right approach.
I'm not convinced either, but it feels natural to me.
> I'm tentatively opposed to (B), since I think that the only interesting
> difference between Win32 and Posix paths is in the set of starting
> points you can name. (The path separator isn't very interesting.) But
> maybe it does make sense to have separate starting-point ADTs for each
> operating system. Then of course there's the issue that Win32 edge
> labels are Unicode, while Posix edge labels are [Word8]. Hmm.
I think these differences make separate implementations worthwhile. The
question then is whether to abstract over them with a type class, or with a
datatype like:
data FilePath
  = POSIXFilePath POSIXPath
  | WinFilePath WinPath
The disadvantage here is that the datatype is closed. The advantage is that
pattern matching tells you statically what kind of path you have.
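A sketch of how the closed datatype might be consumed. POSIXPath and WinPath are given placeholder representations here (lists of String edges) purely for illustration, and separator, edges and render are hypothetical functions. Prelude's FilePath synonym has to be hidden so the new type can reuse the name.

```haskell
-- Hide Prelude's FilePath synonym so this type can reuse the name.
import Prelude hiding (FilePath)
import Data.List (intercalate)

-- Placeholder representations, assumed for illustration only.
newtype POSIXPath = POSIXPath [String] deriving (Eq, Show)
newtype WinPath = WinPath [String] deriving (Eq, Show)

data FilePath
  = POSIXFilePath POSIXPath
  | WinFilePath WinPath
  deriving (Eq, Show)

-- Each function matches on the constructor, so within each branch the
-- platform is known and its conventions can be applied.
separator :: FilePath -> Char
separator (POSIXFilePath _) = '/'
separator (WinFilePath _) = '\\'

edges :: FilePath -> [String]
edges (POSIXFilePath (POSIXPath es)) = es
edges (WinFilePath (WinPath es)) = es

render :: FilePath -> String
render p = intercalate [separator p] (edges p)
```

The flip side of closedness is visible here too: supporting a third platform would mean adding a constructor and editing every function that matches on FilePath.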
> > pathCleanup :: p -> p -- remove .. and suchlike
>
> This can't be done safely except in a few special cases (e.g. "/.." ->
> "/"). I'm not sure it should be here.
More than you would think, if you follow the conventions of modern Unix
shells. For example, "foo/.." is always equal to ".", "foo/bar/../../.." is
equal to "..", and "foo///bar" is equal to "foo/bar". This is the
behavior that "cd" gives on modern POSIX shells (rather than doing a
chdir on the ".." hard link, which does strange things in the presence of
symlinks). The operation is sufficiently useful that I think it should
be included. It lets us know, for example, that "/bar/../foo/tmp" and
"/foo/tmp" refer to the same file (under these shell conventions) without
resorting to any IO operations.
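Here is a minimal sketch of such a purely lexical cleanup, following the shell-style rules above. It works on plain Strings with "/" separators (an assumption for brevity), does no IO and resolves no symlinks, and for brevity it does not preserve the trailing-"/" distinction discussed earlier. splitOn is a small hypothetical helper, not a library function.

```haskell
import Data.List (intercalate)

-- Lexical cleanup in the style of a shell's logical "cd": empty edges
-- and "." are dropped, "x/.." cancels out, and ".." at the root of an
-- absolute path stays at the root ("/.." becomes "/").
pathCleanup :: String -> String
pathCleanup path = render (foldl step [] (splitOn '/' path))
  where
    absolute = take 1 path == "/"
    -- The accumulator holds the kept edges in reverse order.
    step acc "" = acc
    step acc "." = acc
    step acc ".."
      | (x : rest) <- acc, x /= ".." = rest
      | absolute = acc
      | otherwise = ".." : acc
    step acc e = e : acc
    render acc
      | absolute = "/" ++ intercalate "/" (reverse acc)
      | null acc = "."
      | otherwise = intercalate "/" (reverse acc)

-- Split a string on every occurrence of a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (a, []) -> [a]
  (a, _ : rest) -> a : splitOn c rest
```

With this, pathCleanup "/bar/../foo/tmp" and pathCleanup "/foo/tmp" both yield "/foo/tmp", so the two paths compare equal with no filesystem access.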
> > hasExtension :: p -> String -> Bool
> This is really an operation on a single component of the path. I think
> it would make more sense to make it an ordinary function with type
> String -> String -> Bool and use the basename method to get the
> appropriate path component.
The problem is that String doesn't faithfully capture the representation
of path edges. For POSIX, an edge is a sequence of Word8 (any byte except
0x2F, '/', and 0x00, NUL). In my implementation of UnixPaths, each path
carries along an encoding component, which (theoretically) tells you how
to do [Word8] <-> [Char] translations. Eventually we will get a real IO
layer complete with character encodings, and this will be meaningful. The
comparison needs to be done with encodings in mind.
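To illustrate why the comparison depends on the encoding, here is a sketch in which an edge carries its raw bytes together with a decoder. Encoding, Edge, latin1, ascii and this hasExtension are hypothetical stand-ins, not the UnixPaths implementation; only Latin-1 and strict ASCII decoders are modelled.

```haskell
import Data.Char (chr)
import Data.List (isSuffixOf)
import Data.Word (Word8)

-- Hypothetical encoding: a partial decoder from raw bytes to Chars.
newtype Encoding = Encoding { decodeBytes :: [Word8] -> Maybe String }

-- Latin-1: every byte maps to the code point with the same value.
latin1 :: Encoding
latin1 = Encoding (Just . map (chr . fromIntegral))

-- Strict ASCII: bytes above 0x7F cannot be decoded.
ascii :: Encoding
ascii = Encoding (mapM decodeByte)
  where
    decodeByte b
      | b < 0x80 = Just (chr (fromIntegral b))
      | otherwise = Nothing

-- A POSIX path edge: raw bytes (assumed never 0x2F or 0x00) plus the
-- encoding the path carries along.
data Edge = Edge Encoding [Word8]

-- The extension test runs on decoded text; an edge that cannot be
-- decoded is reported as not having the extension.
hasExtension :: Edge -> String -> Bool
hasExtension (Edge enc bytes) ext =
  case decodeBytes enc bytes of
    Just name -> ('.' : ext) `isSuffixOf` name
    Nothing -> False
```

The same bytes [0xE9, 0x2E, 0x68, 0x73] have the extension "hs" under latin1 ("é.hs") but cannot even be decoded under ascii, which is the kind of difference a String -> String -> Bool function cannot see.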
> > pathToForeign :: p -> IO (Ptr CChar)
> > pathFromForeign :: Ptr CChar -> IO p
>
> This interface is problematic. Is the pointer returned by pathToForeign
> a heap pointer which the caller is supposed to free? If so, a Ptr CChar
> instance would have to copy the pathname every time. And I don't
> understand exactly what pathFromForeign is supposed to do.
Agreed, I like the withCPath interface better. pathFromForeign takes a
path representation directly from C land, without going through String
first (again with encoding issues in mind). Although perhaps it should
be:
pathFromForeign :: Ptr () -> IO p
instead (the data might be wide chars).
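A sketch of the bracket-style interface for a POSIX path held as raw bytes, built on withArray0 and peekArray0 from Foreign.Marshal.Array. PosixPath is a hypothetical type, and the path bytes are assumed to contain no NUL. A wide-char Win32 variant would need a different element type, which this sketch does not cover.

```haskell
import Data.Word (Word8)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (peekArray0, withArray0)
import Foreign.Ptr (Ptr, castPtr)

-- Hypothetical POSIX path kept as raw bytes, never passed through String.
newtype PosixPath = PosixPath [Word8] deriving (Eq, Show)

-- The NUL-terminated buffer exists only for the duration of the callback
-- and is freed automatically afterwards, so the caller never has to
-- decide who frees it.
withCPath :: PosixPath -> (Ptr CChar -> IO a) -> IO a
withCPath (PosixPath bytes) action =
  withArray0 0 bytes (action . castPtr)

-- Read a NUL-terminated byte string from C directly into a path,
-- without decoding it to String first.
pathFromForeign :: Ptr CChar -> IO PosixPath
pathFromForeign p = PosixPath <$> peekArray0 0 (castPtr p :: Ptr Word8)
```

Because the buffer is scoped to the callback, this avoids both problems raised about pathToForeign: there is no ambiguity about ownership, and the copy is made once per foreign call rather than being cached on the path.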