[Haskell-cafe] Path names (again?)

Glynn Clements glynn.clements at virgin.net
Fri Feb 6 02:07:42 EST 2004


Vincenzo aka Nick Name wrote:

> > I was going to ask what you meant here but, AFAICT, Haskell (at
> > least, GHC 5.04) doesn't appear to recognise the existence of
> > symlinks. So, whatever you meant, the answer is probably "no".
> 
> I currently use module System.Posix from ghc6, there are stat and lstat 
> equivalents, what I want is to get the true file pointed to by a symlink 
> after having found out that it is a symlink, this can be done with 
> recursion of course, and it's trivial; I was just wondering if it was 
> implemented somewhere else, because I am not so expert in working with 
> filesystems and could make some mistake (e.g. I realized only recently 
> that using a hashtable of already visited files is necessary to avoid 
> cyclic links; also, without getting the canonical path, I could visit a 
> file twice).

Again, you probably want a binding for realpath(). However, note that
realpath() implementations don't generally keep a history of visited
paths. They just count the symlinks followed; in the event of a cycle,
the counter will eventually hit its limit and the call fails with ELOOP.
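
As a rough illustration of the counter approach, here is a minimal
sketch using System.Posix.Files from the unix package. The names
followLinks and maxHops and the limit of 40 are made up for the
example, and it only resolves the final path component (a real
realpath() also resolves every intermediate directory), so treat it
as a sketch rather than a replacement for a proper binding:

```haskell
import System.FilePath (takeDirectory, (</>))
import System.Posix.Files
  (getSymbolicLinkStatus, isSymbolicLink, readSymbolicLink)

-- | Hypothetical hop limit; real systems have their own (e.g. SYMLOOP_MAX).
maxHops :: Int
maxHops = 40

-- | Follow the final component of a path until it is no longer a symlink.
-- Returns Nothing if more than maxHops links had to be followed, which is
-- what happens on a cycle. No history of visited paths is kept.
-- A dangling link makes getSymbolicLinkStatus throw an IOError.
followLinks :: FilePath -> IO (Maybe FilePath)
followLinks = go maxHops
  where
    go :: Int -> FilePath -> IO (Maybe FilePath)
    go hopsLeft path = do
      st <- getSymbolicLinkStatus path          -- lstat(): does not follow links
      if not (isSymbolicLink st)
        then return (Just path)
        else if hopsLeft == 0
          then return Nothing
          else do
            target <- readSymbolicLink path
            -- A relative target is relative to the link's own directory;
            -- (</>) returns an absolute target unchanged.
            go (hopsLeft - 1) (takeDirectory path </> target)
```
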

> > > - find all the files in a directory (yes, that's what I need :))
> >
> > Define "file" (e.g. "regular file", "anything other than a
> > directory", "directory entry" etc). Also, define "in"; i.e. are you
> > talking about a recursive search (like "find")?
> >
> 
> Yes, I forgot to say "recursively". I have an ocaml implementation but 
> it's prone to errors because of missing "canonicalization", so I did 
> not want to translate that into Haskell with the same problem. Currently I 
> have worked around all this by forking "find", but it's prone to errors too 
> because I have no way to distinguish between newlines ending a file 
> name and newlines in the middle of a file name. I should put something 
> like "///" with "find -printf" at the end of each file name, and then 
> parse that,

Use "find ... -print0", which NUL-terminates each filename. Having
said that, the only real-world scenario in which you are likely to
encounter filenames which contain embedded newlines is if someone
created them with malicious intent. More on malicious intent below.
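
If you stay with forking "find", splitting its -print0 output is
straightforward. The following is only a sketch: splitNul and
findPrint0 are names invented here, and readProcess (from the process
package) decodes the output using the current locale, so I am assuming
the filenames are valid in that encoding:

```haskell
import System.Process (readProcess)

-- | Split NUL-terminated records. Embedded newlines are kept as part of
-- the name, since only NUL acts as a terminator.
splitNul :: String -> [FilePath]
splitNul s = case break (== '\NUL') s of
  (name, [])       -> [name | not (null name)]  -- trailing unterminated text
  (name, _ : rest) -> name : splitNul rest

-- | Run "find DIR -print0" and return the names it reports.
findPrint0 :: FilePath -> IO [FilePath]
findPrint0 dir = splitNul <$> readProcess "find" [dir, "-print0"] ""
```
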

> but it would really be preferable to code a Haskell 
> library function equivalent to Unix find.

For recursive directory scanning, you don't need full
canonicalisation; you just need to be able to distinguish actual
directories from symlinks to directories (i.e. lstat()). Just ensure
that all symlinks are treated as leaves, along with "." and "..", and
you have a strict tree structure.
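
A minimal sketch of such a scan, assuming the directory, filepath and
unix packages; the name walk is made up for the example. It relies on
listDirectory already omitting "." and "..", and uses
getSymbolicLinkStatus (i.e. lstat()) so that a symlink to a directory
is reported but never entered:

```haskell
import System.Directory (listDirectory)
import System.FilePath ((</>))
import System.Posix.Files (getSymbolicLinkStatus, isDirectory)

-- | List everything below a directory, treating symlinks as leaves.
walk :: FilePath -> IO [FilePath]
walk dir = do
  names <- listDirectory dir               -- omits "." and ".."
  concat <$> mapM visit names
  where
    visit name = do
      let path = dir </> name
      st <- getSymbolicLinkStatus path     -- lstat(): a symlink is not a directory
      if isDirectory st
        then (path :) <$> walk path        -- real directory: recurse into it
        else return [path]                 -- file, symlink, device, etc.: a leaf
```

This does nothing about unreadable directories or about the race
described below; it only shows the tree-walking part.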

FWIW, this assumes that the OS doesn't allow hard links to be made to
directories. However, AFAIK, that's true of every version of Unix
which is still in use outside of a computer museum. Even on the ones
which did allow hard links to directories, directory-recursion tended
to exhibit undesirable (but entirely predictable) behaviour if you
actually created one.

Also, if you are concerned about security issues, you need to consider
the possibility of symlink races; i.e. where an attacker does:

	chdir("foo");
	mkdir("bar", mode);
	/* "find" lstat()s "bar" and decides that it's a directory */
	rename("bar", "_bar");
	symlink("/etc", "bar");
	/* "find" ends up chdir()ing into /etc  */

To deal with that situation, each call to chdir() needs to be followed
by a check that the process ended up where it expected to be, e.g. by
comparing the device:inode pair for "." with the values obtained from
the lstat() on the directory entry, or by comparing the device:inode
pair for ".." with those for the previous directory.

-- 
Glynn Clements <glynn.clements at virgin.net>

