[Haskell] Reading a directory tree

Glynn Clements glynn.clements at virgin.net
Tue Jun 22 19:31:02 EDT 2004


Duncan Coutts wrote:

> > > BTW, one other caveat (which applies to all of the examples so far):
> > > doesDirectoryExist doesn't distinguish between directories and
> > > symlinks to directories. Consequently, any directory-traversal
> > > algorithm which uses doesDirectoryExist to identify directories will
> > > behave incorrectly in the presence of symlinks. In the worst case, you
> > > can get into an infinite loop.
> > 
> > symlinks aren't necessary to give an infinite loop: you can have
> > upwards hard links as well (at least on *nix).  You have to keep a
> > list of inodes you've already visited (per filesystem, of course).
> 
> I believe hard linked directories are banned to avoid this problem.

Most modern Unices disallow the creation of additional hard links to
directories, so the only "upward" links are the "." entry and the ".." 
entries for each subdirectory (which I specifically filtered in my
example[1]).

[1] Actually, I filtered anything beginning with a "."; I should have
pointed that out. To change that, replace:
	let names' = filter ((/= '.') . head) names
with:
	let names' = filter (`notElem` [".", ".."]) names
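
To make the difference between the two filters concrete, here is a minimal sketch of both as standalone functions. The names dropDotNames and dropSpecialEntries are made up for illustration and do not appear in the earlier example; isPrefixOf is used in place of head so that the first filter is also total.

```haskell
import Data.List (isPrefixOf)

-- Behaviour of the original filter: drops every name beginning with '.',
-- which removes hidden files and directories as well as "." and "..".
-- (Hypothetical helper name.)
dropDotNames :: [FilePath] -> [FilePath]
dropDotNames = filter (not . isPrefixOf ".")

-- Behaviour of the replacement filter: drops only the two special entries.
-- (Hypothetical helper name.)
dropSpecialEntries :: [FilePath] -> [FilePath]
dropSpecialEntries = filter (`notElem` [".", ".."])
```

Given [".", "..", ".bashrc", "src"], the first keeps only "src", while the second keeps ".bashrc" and "src".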

Unices which do allow the creation of hard links to directories only
allow this for root, and programs which perform directory traversal
frequently fall down on such cases; tracking visited directories tends
to be the exception rather than the norm. The usual solution is to
assume that root knows what they're doing (and if they don't, tough)
and ignore the issue.
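
For completeness, tracking visited directories might look roughly like the sketch below. It assumes a POSIX system and the unix, directory, filepath and containers packages; walkOnce and DirId are hypothetical names, and error handling (dangling symlinks, unreadable directories) is omitted. A directory is keyed on its (device, inode) pair, since inode numbers are only unique within one filesystem.

```haskell
import Control.Monad (foldM)
import qualified Data.Set as Set
import System.Directory (getDirectoryContents)
import System.FilePath ((</>))
import System.Posix.Files (deviceID, fileID, getFileStatus, isDirectory)
import System.Posix.Types (DeviceID, FileID)

-- Identity of a directory: device number plus inode number.
type DirId = (DeviceID, FileID)

-- Walk the tree under `top`, following links, but enter each distinct
-- directory at most once. Returns the directories entered, in visit order.
-- (Hypothetical helper; exceptions from stat or readdir propagate.)
walkOnce :: FilePath -> IO [FilePath]
walkOnce top = reverse . snd <$> go (Set.empty, []) top
  where
    go :: (Set.Set DirId, [FilePath]) -> FilePath
       -> IO (Set.Set DirId, [FilePath])
    go acc@(seen, dirs) path = do
      st <- getFileStatus path            -- stat(): follows symlinks
      let key = (deviceID st, fileID st)
      if not (isDirectory st) || key `Set.member` seen
        then return acc                   -- not a directory, or already seen
        else do
          names <- getDirectoryContents path
          let children = [path </> n | n <- names, n `notElem` [".", ".."]]
          foldM go (Set.insert key seen, path : dirs) children
```

Because the set is threaded through the whole walk, a link that leads back to an ancestor is stat'ed, recognised by its key, and skipped rather than re-entered.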

OTOH, directory-traversal code which follows symlinks will fall down
far more readily, as symlinks to directories can be created on all
platforms which allow symlinks, and require no special privileges.

[On early Linux distributions, it was quite common for /etc/inet to be
a symlink to /etc, as programs often disagreed as to whether certain
configuration files belonged in /etc or /etc/inet. Attempting to
back up the Linux filesystem via Samba with a Windows backup tool
would result in an "end of tape" error somewhere around
/etc/inet/inet/inet/inet/....]
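
One way to avoid following symlinks is to test entries with lstat() rather than stat(). The sketch below assumes a POSIX system with the unix, directory and filepath packages; listTreeNoFollow is a hypothetical name, not the function from the earlier example. System.Posix.Files.getSymbolicLinkStatus reports on the link itself, so isDirectory is False for a symlink that points to a directory.

```haskell
import Control.Monad (forM)
import System.Directory (getDirectoryContents)
import System.FilePath ((</>))
import System.Posix.Files (getSymbolicLinkStatus, isDirectory)

-- List every path below `top` without descending into symlinks.
-- (Hypothetical helper; exceptions from lstat or readdir propagate.)
listTreeNoFollow :: FilePath -> IO [FilePath]
listTreeNoFollow top = do
  names <- getDirectoryContents top
  let names' = filter (`notElem` [".", ".."]) names
  paths <- forM names' $ \name -> do
    let path = top </> name
    st <- getSymbolicLinkStatus path      -- lstat(): does not follow links
    if isDirectory st
      then (path :) <$> listTreeNoFollow path  -- real directory: recurse
      else return [path]                       -- file or symlink: list only
  return (concat paths)
```

With this approach a link such as /etc/inet -> /etc would be listed once as an ordinary entry and never entered.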

> This is true on Linux at least, I don't know what POSIX specifies.

The Single UNIX Specification, Version 3 (which is also IEEE Std
1003.1-2001, i.e. POSIX) says:

[http://www.opengroup.org/onlinepubs/009695399/functions/link.html]

  DESCRIPTION
  ...
    If path1 names a directory, link() shall fail unless the process has
    appropriate privileges and the implementation supports using link()
    on directories.
  ...
  ERRORS
  ...
    [EPERM]
        The file named by path1 is a directory and either the calling
        process does not have appropriate privileges or the
        implementation prohibits using link() on directories.
  ...
  RATIONALE

    Linking to a directory is restricted to the superuser in most
    historical implementations because this capability may produce
    loops in the file hierarchy or otherwise corrupt the file system. 
    This volume of IEEE Std 1003.1-2001 continues that philosophy by
    prohibiting link() and unlink() from doing this. Other functions
    could do it if the implementor designed such an extension.

There isn't a macro or sysconf/pathconf option to query whether the
platform allows links to directories.

-- 
Glynn Clements <glynn.clements at virgin.net>

