[Haskell-cafe] File path programme

Wed Jan 26 18:52:10 EST 2005

robert dockins wrote:

 > After the discussion about file paths over the last several days I 
went home and put together a quick trial implementation for unix file 
paths, with the idea of adding windows, SMB and maybe VMS (why not?) paths.

This is great. Comments below.

 > data PathRoot
 >    = UnixFilesystemRoot
 >    | WindowsDrive Char
 >    | SMBShare String String
 >    | VMSDevice String
 >    | ... -- whatever else we need

I would say that all paths are relative to something, whether it's the 
Unix root, or the current directory, or whatever. Therefore I would call 
this something like PathStart, and add:

    | CurrentDirectory
    | CurrentDirectoryOfWindowsDrive Char
    | RootOfCurrentWindowsDrive

What is a pathname, broadly speaking? Answer: it's a description of a 
path in a directed graph with labeled edges. It consists of a single 
node designator (the starting point) and a sequence of edge designators, 
i.e.

    data Pathname =
      Pathname {
        pathStart :: PathStart,
        pathEdges :: [String]
      }

Most of the time all we care about is either the final node or the final 
edge that we reach by following the path. The only reason we specify the 
rest of the path is that there are only a few nodes that we can name 
directly; to refer to any other location on the graph we have to give 
"driving directions" from one of those nodes.  There's no reason the OS 
couldn't make nodes and edges first-class entities--it would solve a 
multitude of problems--but most don't, so forget that.

On Unix, there are two nodes we can name directly, the "root" and the 
"current directory". On Windows, there are 26 roots and 26 current 
directories which we can name directly; additionally we can name the 
root or current directory of the current drive, which is one of those 
26, and there are an arbitrary number of network share roots, and \\.\, 
and perhaps some other stuff I don't know about.

Symbolic links complicate things a bit, since they are followed like 
edges but are actually paths (so they may be affected by seemingly 
unrelated changes to the graph). They're rather like VPNs, actually, 
though I'm not sure how far I want to push that analogy.

Whether we're talking about the final node or the final edge depends on 
the OS call; this is the usual pointer-vs-pointee confusion that's also 
found in most programming languages outside the ML family. Probably we 
can ignore it, with the exception of the "/foo" vs "/foo/" distinction, 
which we must preserve. This can probably be handled by parsing the 
latter as Pathname { pathStart = UnixFilesystemRoot, pathEdges = 
["foo","."] }.

 > class (Show p) => Path p where

Okay, I'm not convinced that a Path class is the right approach. For the 
reasons given above, I think I'd rather have a single Path datatype, 
probably with its data constructors exported. What do we gain with the 
class approach? Well...

  (A) Functions that accept paths can be polymorphic on the path type 
(where String is a path type).
  (B) We can have different datatypes for the paths of different 
operating systems.

It seems like these are two very different problems which should be 
solved with different typeclasses, if they're to be solved with 
typeclasses at all. I think (A) can be solved very simply, and 
independently of the specification of a path-handling library:

    class IsPath a where
      withCPath :: a -> (Ptr CChar -> IO b) -> IO b

    instance IsPath String where
      withCPath = withCString   -- tricky i18n issues!

    instance IsPath [CChar] where
      withCPath = withArray0 0

    instance IsPath PathADT where
      withCPath = withCString . pathToString

    instance IsPath (Ptr CChar) where
      withCPath = flip ($)

    openFile :: (IsPath p) => p -> ...

I'm tentatively opposed to (B), since I think that the only interesting 
difference between Win32 and Posix paths is in the set of starting 
points you can name. (The path separator isn't very interesting.) But 
maybe it does make sense to have separate starting-point ADTs for each 
operating system. Then of course there's the issue that Win32 edge 
labels are Unicode, while Posix edge labels are [Word8]. Hmm.

 >    isAbsolute :: p -> Bool

Definition: a path is absolute if its meaning is independent of (Posix: 
the current directory) (Win32: all current directories and the current 
drive).

 > pathCleanup :: p -> p           -- remove .. and suchlike

This can't be done safely except in a few special cases (e.g. "/.." -> 
"/"). I'm not sure it should be here.

 > hasExtension :: p -> String -> Bool

This is really an operation on a single component of the path. I think 
it would make more sense to make it an ordinary function with type 
String -> String -> Bool and use the basename method to get the 
appropriate path component.

 > pathToForeign :: p -> IO (Ptr CChar)
 > pathFromForeign :: Ptr CChar -> IO p

This interface is problematic. Is the pointer returned by pathToForeign 
a heap pointer which the caller is supposed to free? If so, a Ptr CChar 
instance would have to copy the pathname every time. And I don't 
understand exactly what pathFromForeign is supposed to do.

-- Ben