Proposal for a new I/O library design

Mon, 28 Jul 2003 14:01:47 -0700 (PDT)

On Mon, 28 Jul 2003, Simon Marlow wrote:

> I'm concerned about one implementation difficulty.  Your File type is
> independent of the filesystem.  That is, on Unix it corresponds to an
> inode.  Creating a File must correspond to "opening" it (in Unix speak).
> Creating a stream corresponds to duplicating the file descriptor (you
> could probably avoid too many unnecessary dups by being clever).
> There's a potential implementation difficulty, though:
> lookupFileByPathname must open the file, without knowing whether the
> file will be used for reading or writing in the future.

I know; I'm hoping against hope that this isn't an insurmountable problem.

If the OS provides a "reopen" function which is like open except that it
takes a file handle instead of a pathname, then I think implementation is
straightforward: a File contains a handle with minimal access permissions
and maximal sharing permissions, and when a read or write operation is
attempted we open a second handle based on the first with additional
permissions. There's a Win32 function called "ReOpenFile" with this
functionality, but it's only in Windows Server 2003. Sigh.

If there's a way to open files by unique ID instead of pathname, that
would also work. I think the NT API might provide something like this, but
looking through the online documentation just now I can't find anything of
the sort.

If it's not possible to provide a guarantee of File identity then we
should probably drop the whole idea of File values. See my comments under
Directory below.

> So I would suggest that operations which create a value of type File
> take a read/write flag too.

This would break the conceptual identity between a File value and a file,
since read-only access is not a property of a file. (Well, it can be, but
it isn't in this case.) Functions which allowed access rights to be
specified would have to return a FileAccessPath instead of a File, and a
FileAccessPath is basically a handle, so we're back where we started.

All we need here is a way to change the access and sharing rights on an
already-open handle. I find it hard to believe that after decades of use
by millions of people, the UNIX file API provides no way to do this
safely. Maybe there's an fcntl or something?

> > > type FilePos = Word64
> > > type BlockLength = Int
> 
> FilePos should be Integer.

Seems reasonable.

> > > fCheckRead  :: File -> FilePos -> BlockLength -> IO Bool
> > > fCheckWrite :: File -> FilePos -> BlockLength -> IO Bool
> 
> What do these do?  If they're supposed to return True if the required
> data can be read/written without blocking, then I suspect that they are
> not useful.

They're supposed to return True if the data can be read/written
successfully, the idea being that this is how you check whether you have
read/write access to the file. Probably I should have omitted the second
and third arguments.

> I'd use the traditional 'isEOF' way of detecting end of file.

Seems reasonable. (Should be "EOS", for end of stream, though, I think.)

> On naming: it's probably not a good idea to use the 'is' prefix, since
> it is already used for predicates (meaning literally 'is' rather than an
> abbreviation for 'InputStream').

I agree completely. Come up with something better and I'll second it. :-)

(How about renaming Streams to Channels? Then we could use "ic" and "oc".)

> You will also want a way to get back from an InputStream to the
> underlying object, eg. the (File,FilePos) pair if one exists.

Agreed.

> It's not pretty, but you certainly want a way to close a stream.
> Finalizers aren't reliable enough.

What are the practical problems with relying on finalizers? As far as I
can see, the "no more filehandles available" problem is completely solved
by forcing a major GC and trying again when it occurs. The only other
issue I see is leaving other processes unable to access the file for an
indeterminate period of time. The right solution to this, if it can be
implemented, is something like

   withExclusiveWriteAccess :: File -> IO a -> IO a

with write access being non-exclusive (or even disallowed?) otherwise.
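
As a rough sketch of the scoped-access idea, assuming a stand-in File
record that carries an in-process lock (the record, its fields, and
newFile are hypothetical; the real File type is abstract, and real
exclusion between processes would need OS support that this does not
provide):

```haskell
import Control.Concurrent.MVar (MVar, newMVar, takeMVar, putMVar)
import Control.Exception (bracket_)

-- Hypothetical stand-in for the proposal's abstract File type.
-- The MVar models an exclusive-write lock held within this process only.
data File = File
  { fileLabel :: String
  , writeLock :: MVar ()
  }

-- Hypothetical constructor, for the demo only.
newFile :: String -> IO File
newFile label = File label <$> newMVar ()

-- Run an action while holding exclusive write access. bracket_ releases
-- the lock when the action returns or throws, so no finalizer is involved.
withExclusiveWriteAccess :: File -> IO a -> IO a
withExclusiveWriteAccess f =
  bracket_ (takeMVar (writeLock f)) (putMVar (writeLock f) ())

main :: IO ()
main = do
  f <- newFile "example"
  n <- withExclusiveWriteAccess f $ do
    putStrLn ("writing to " ++ fileLabel f)
    return (42 :: Int)
  print n
```

The point of the shape is that the exclusive period has a lexical
extent, so other users of the file are locked out for a bounded time
rather than until the next GC.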

> How did you intend text encodings to work?  I see several possibilities:
> 
>    textDecode :: TextEncoding -> [Octet] -> [Char]
> 
> or
>   
>    decodeInputStream :: TextEncoding -> InputStream -> TextInputStream
>    getChar :: TextInputStream -> IO Char
>    etc.
> 
> or
>   
>    setInputStreamCoding :: InputStream -> TextEncoding -> IO ()
>    getChar :: InputStream -> IO Char

I was thinking of the second. It could easily be implemented as the third
under the hood. But I was hoping someone else would worry about it. :-)
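
For concreteness, a minimal sketch of the second option, assuming
streams are modelled as records holding a read action. The
representations, the names readOctet, readChar and fromOctets, and the
single Latin1 encoding are all made up for illustration; the proposal
leaves these types abstract:

```haskell
import Data.Char (chr)
import Data.IORef (newIORef, atomicModifyIORef')
import Data.Word (Word8)

-- Hypothetical representations: each stream is just a read action
-- that yields Nothing at end of stream.
newtype InputStream     = InputStream     { readOctet :: IO (Maybe Word8) }
newtype TextInputStream = TextInputStream { readChar  :: IO (Maybe Char) }

-- Only one encoding in this sketch; a real TextEncoding would have more.
data TextEncoding = Latin1

-- Hypothetical helper: an in-memory octet stream for the demo.
fromOctets :: [Word8] -> IO InputStream
fromOctets xs = do
  ref <- newIORef xs
  return $ InputStream $ atomicModifyIORef' ref $ \s ->
    case s of
      []       -> ([], Nothing)
      (o : os) -> (os, Just o)

-- Wrap an octet stream as a character stream. Latin-1 maps each octet
-- directly to the code point with the same number.
decodeInputStream :: TextEncoding -> InputStream -> TextInputStream
decodeInputStream Latin1 (InputStream get) =
  TextInputStream (fmap (chr . fromIntegral) <$> get)

main :: IO ()
main = do
  s <- fromOctets [72, 105, 233]
  let t = decodeInputStream Latin1 s
  cs <- mapM (const (readChar t)) [1 .. 4 :: Int]
  print cs
```

A multi-byte encoding would need decoder state inside the wrapper,
which is where implementing it as the third option under the hood
would come in.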

> > > data Directory	-- abstract
> 
> I don't see a reason for changing the existing Directory support
> (System.Directory).  Could you give some motivation here?  Is the idea
> to abstract away from the syntax of pathnames on the platform (eg.
> directory separator characters)?  If so, I'm not sure it's worthwhile.
> There are lots of differences between pathname conventions: case
> sensitivity, arbitrary limits on the length of filenames, filename
> extensions, and so on.

Basically, the usual interface encourages programmers to treat pathnames
as file/directory identifiers, even though they aren't. This is the root
cause of a whole class of security vulnerabilities (not to mention some
everyday annoyances). I want to avoid those vulnerabilities in the Haskell
model by providing values that *really are* file and directory
identifiers. Pathnames have one good property: they're human-readable and
-writable. That's their only good property. Within an application, they
should be converted immediately to a more secure internal representation.
(And the conversion should be done exactly once -- any more and you're
opening yourself to security exploits.) This is why I really don't want to
use the File concept unless we can guarantee file-File identity. A system
that appears to be secure but actually isn't is even worse than one which
is obviously insecure.
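
To illustrate the "convert exactly once" rule with today's System.IO,
here is a small sketch in which the ordinary Handle stands in for the
proposed File value (an approximation only, since a Handle also fixes
an access mode). The function name sizeAndContents is made up. The
pathname is resolved once, and on Unix-like systems both later
operations then refer to the same open file:

```haskell
import System.Environment (getArgs)
import System.IO

-- Resolve the pathname exactly once. The size query and the read both go
-- through the same Handle, instead of looking the pathname up twice.
sizeAndContents :: FilePath -> IO (Integer, String)
sizeAndContents path =
  withFile path ReadMode $ \h -> do
    hSetBinaryMode h True
    size <- hFileSize h
    body <- hGetContents h
    -- Force the lazy read before withFile closes the handle.
    length body `seq` return (size, body)

main :: IO ()
main = do
  paths <- getArgs
  mapM_ report paths
  where
    report p = do
      (n, _) <- sizeAndContents p
      putStrLn (p ++ ": " ++ show n ++ " bytes")
```

The vulnerable pattern is the one where the size check and the read
each take the pathname, so the name can be re-bound to a different
file between the two calls.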

This idea isn't complete unless the model also supports persistence of
File and Directory values, but I didn't even bother drawing up an API for
this because I'm sure it's impossible to implement. Any sane OS would
provide support for this, but I don't think any widespread OS does. There
should also be a DirectoryEntry type, but, again, I'm pretty sure that
this can't be implemented.

On reflection I think the (Directory, Maybe String) return value is a
mistake. The intent was to support creating a new file or directory by
pathname, but that's probably better done by functions like those you
propose below. (The return value was originally supposed to be
Either DirectoryEntry (Directory,String), which made more sense.)

> > > lookupFileByPathname :: String -> IO File
> 
> Here, I suggest we need
> 
>   lookupFileByPathname :: FilePath -> IOMode -> IO File

If so, it should be called something like "newFileAccessPath" or at least
"lookupFileAccessPath".

> > > lookupInputStreamByPathname :: String -> IO InputStream
> > >	-- at least as likely to succeed as lookupFileByPathname
> 
> and similarly
> 
>   createFileOutputStream :: FilePath -> IO OutputStream
>   appendFile :: FilePath -> IO OutputStream

Definitely.

-- Ben