[Haskell-cafe] File path programme
Ben Rudiak-Gould
Benjamin.Rudiak-Gould at cl.cam.ac.uk
Wed Jan 26 18:52:10 EST 2005
robert dockins wrote:
> After the discussion about file paths over the last several days I
went home and put together a quick trial implementation for unix file
paths, with the idea of adding windows, SMB and maybe VMS (why not?) paths.
This is great. Comments below.
> data PathRoot
> = UnixFilesystemRoot
> | WindowsDrive Char
> | SMBShare String String
> | VMSDevice String
> | ... -- whatever else we need
I would say that all paths are relative to something, whether it's the
Unix root, or the current directory, or whatever. Therefore I would call
this something like PathStart, and add:
| CurrentDirectory
| CurrentDirectoryOfWindowsDrive Char
| RootOfCurrentWindowsDrive
What is a pathname, broadly speaking? Answer: it's a description of a
path in a directed graph with labeled edges. It consists of a single
node designator (the starting point) and a sequence of edge designators,
i.e.
data Pathname =
Pathname {
pathStart :: PathStart,
pathEdges :: [String]
}
Most of the time all we care about is either the final node or the final
edge that we reach by following the path. The only reason we specify the
rest of the path is that there are only a few nodes that we can name
directly; to refer to any other location on the graph we have to give
"driving directions" from one of those nodes. There's no reason the OS
couldn't make nodes and edges first-class entities--it would solve a
multitude of problems--but most don't, so forget that.
On Unix, there are two nodes we can name directly, the "root" and the
"current directory". On Windows, there are 26 roots and 26 current
directories which we can name directly; additionally we can name the
root or current directory of the current drive, which is one of those
26, and there are an arbitrary number of network share roots, and \\.\,
and perhaps some other stuff I don't know about.
Symbolic links complicate things a bit, since they are followed like
edges but are actually paths (so they may be affected by seemingly
unrelated changes to the graph). They're rather like VPNs, actually,
though I'm not sure how far I want to push that analogy.
Whether we're talking about the final node or the final edge depends on
the OS call; this is the usual pointer-vs-pointee confusion that's also
found in most programming languages outside the ML family. Probably we
can ignore it, with the exception of the "/foo" vs "/foo/" distinction,
which we must preserve. This can probably be handled by parsing the
latter as Pathname { pathStart = UnixFilesystemRoot, pathEdges =
["foo","."] }.
> class (Show p) => Path p where
Okay, I'm not convinced that a Path class is the right approach. For the
reasons given above, I think I'd rather have a single Path datatype,
probably with its data constructors exported. What do we gain with the
class approach? Well...
(A) Functions that accept paths can be polymorphic on the path type
(where String is a path type).
(B) We can have different datatypes for the paths of different
operating systems.
It seems like these are two very different problems which should be
solved with different typeclasses, if they're to be solved with
typeclasses at all. I think (A) can be solved very simply, and
independently of the specification of a path-handling library:
class IsPath a where
withCPath :: a -> (Ptr CChar -> IO b) -> IO b
instance IsPath String where
withCPath = withCString -- tricky i18n issues!
instance IsPath [CChar] where
withCPath = withArray0 0
instance IsPath PathADT where
withCPath = withCString . pathToString
instance IsPath (Ptr CChar) where
withCPath = flip ($)
openFile :: (IsPath p) => p -> ...
I'm tentatively opposed to (B), since I think that the only interesting
difference between Win32 and Posix paths is in the set of starting
points you can name. (The path separator isn't very interesting.) But
maybe it does make sense to have separate starting-point ADTs for each
operating system. Then of course there's the issue that Win32 edge
labels are Unicode, while Posix edge labels are [Word8]. Hmm.
> isAbsolute :: p -> Bool
Definition: a path is absolute if its meaning is independent of (Posix:
the current directory) (Win32: all current directories and the current
drive).
> pathCleanup :: p -> p -- remove .. and suchlike
This can't be done safely except in a few special cases (e.g. "/.." ->
"/"). I'm not sure it should be here.
> hasExtension :: p -> String -> Bool
This is really an operation on a single component of the path. I think
it would make more sense to make it an ordinary function with type
String -> String -> Bool and use the basename method to get the
appropriate path component.
> pathToForeign :: p -> IO (Ptr CChar)
> pathFromForeign :: Ptr CChar -> IO p
This interface is problematic. Is the pointer returned by pathToForeign
a heap pointer which the caller is supposed to free? If so, a Ptr CChar
instance would have to copy the pathname every time. And I don't
understand exactly what pathFromForeign is supposed to do.
-- Ben
More information about the Haskell-Cafe
mailing list