[Haskell-cafe] I/O interface

Ben Rudiak-Gould Benjamin.Rudiak-Gould at cl.cam.ac.uk
Mon Jan 17 19:27:26 EST 2005


Marcin 'Qrczak' Kowalczyk wrote:

 >Convenience. I'm worried that it uses separate types for various
 >kinds of streams: files, pipes, arrays (private memory), and sockets.
 >Haskell is statically typed and lacks subsumption. This means that
 >even though streams are unified by using a class, code which uses
 >a stream of an unknown kind must be either polymorphic or use
 >existential quantification.

Yes, this is a problem. In my original proposal InputStream and 
OutputStream were types, but I enthusiastically embraced Simon M's idea 
of turning them into classes. As you say, it's not without its 
disadvantages.

I see several possibilities here.

    * We could adopt Avery Lee's suggestion (from the discussion in 
2003) to use field labels instead of methods. Advantages: InputStream 
and OutputStream behave more like their OOP equivalents, with no loss of 
extensibility. Disadvantages: potentially less efficient (no 
specialization possible); loses some static type information.

    * We could use a single type for all input and output streams in the 
standard library, but retain the type classes also.

    * We could provide existential wrappers:

          data IStream = forall a. InputStream a => MkIStream !a
          instance InputStream IStream where ...
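For comparison, the first option (field labels instead of methods) might look roughly like this. Here InputStream is a record of IO actions rather than a class. The field names, `listInputStream` and `readAll` are hypothetical and only illustrate the shape of the idea; they are not part of any proposed interface.

```haskell
import Data.IORef
import Data.Word (Word8)

-- Hypothetical record-of-functions stream: each "method" is a field.
data InputStream = InputStream
  { isReadByte :: IO (Maybe Word8)  -- Nothing signals end of stream
  , isClose    :: IO ()
  }

-- Hypothetical constructor: a stream over an in-memory list of bytes.
listInputStream :: [Word8] -> IO InputStream
listInputStream bytes = do
  ref <- newIORef bytes
  return InputStream
    { isReadByte = do
        rest <- readIORef ref
        case rest of
          []       -> return Nothing
          (b : bs) -> writeIORef ref bs >> return (Just b)
    , isClose = writeIORef ref []  -- closing discards any unread bytes
    }

-- Works for every kind of stream with no class constraint or existential.
readAll :: InputStream -> IO [Word8]
readAll s = do
  mb <- isReadByte s
  case mb of
    Nothing -> return []
    Just b  -> (b :) <$> readAll s
```

Because every stream has the same type, `readAll` needs neither polymorphism nor an existential wrapper; the cost is that calls go through closures that the compiler cannot specialize to a particular stream implementation.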

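A nice thing about the last approach is that it supports dynamic 
downcasting, provided the wrapper also packages a Typeable dictionary 
(for instance by making Typeable a superclass of InputStream):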

    case (s :: IStream) of
      MkIStream inner ->
        case (Data.Typeable.cast inner :: Maybe UArrayInputStream) of
          Just arr -> (getUArray arr, getCurrentIndex arr)
          Nothing -> ...
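A self-contained version of the wrapper and the downcast might look like the following sketch. The one-method `InputStream` class, the list-backed `ListInputStream` (standing in for `UArrayInputStream`) and `currentIndex` are all hypothetical simplifications; the existential captures `Typeable` so that `cast` from Data.Typeable can be applied to the hidden stream.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.IORef
import Data.Typeable (Typeable, cast)
import Data.Word (Word8)

-- Hypothetical minimal stream class (the real proposal has more methods).
class InputStream s where
  readByte :: s -> IO (Maybe Word8)

-- Hypothetical array-like stream: a fixed list of bytes plus a read index.
data ListInputStream = ListInputStream [Word8] (IORef Int)

instance InputStream ListInputStream where
  readByte (ListInputStream bytes ref) = do
    i <- readIORef ref
    if i < length bytes
      then writeIORef ref (i + 1) >> return (Just (bytes !! i))
      else return Nothing

-- Existential wrapper; the Typeable dictionary is what makes downcasting work.
data IStream = forall s. (InputStream s, Typeable s) => MkIStream !s

instance InputStream IStream where
  readByte (MkIStream s) = readByte s

-- Downcast: report the read index if the hidden stream is a ListInputStream.
currentIndex :: IStream -> IO (Maybe Int)
currentIndex (MkIStream s) =
  case (cast s :: Maybe ListInputStream) of
    Just (ListInputStream _ ref) -> Just <$> readIORef ref
    Nothing                      -> return Nothing
```

Generic code only ever sees an `IStream` and calls `readByte`; code that knows about one particular representation can try `cast` and fall back to the generic path when it returns `Nothing`.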

 >Completeness. Unless File{Input,Output}Stream uses {read,write}()
 >rather than file{Read,Write}, openFile provides only a subset of
 >the functionality of open(): it works only with seekable files,
 >e.g. not with "/dev/tty".
 >
 >What is the type of stdin/stdout? They may be devices or pipes
 >(not seekable), regular files (seekable), sockets...

Simon M's current interface is incomplete, but the concept is fine.

Again, to try to avoid confusion, what you call a "seekable file" the 
library calls a "file", and what you call a "file" I would call a "Posix 
filehandle". Roughly. It's hard to be precise because "file" is such a 
heavily overloaded term. (For example, is "/dev/tty" a file? Is the 
(major,minor) device number it might correspond to on a particular 
filesystem at a particular moment a file? Is the integer that's returned 
from open("/dev/tty", ...) a file? Is the tty device itself a file? I 
think you've used "file" in all four senses.)

When I talk about a stream, I mean one end of a unidirectional pneumatic 
tube. If it's the ingoing end, you stick some data in the tube and it's 
carried away. If it's the outgoing end, you wait for some data to arrive 
and then take it. Tubes all look the same. No pneumatic tube is a 
storage device, but you may happen to know that it leads to a Frobozz 
Magic Storage Device at the other end.

By the same token, stdin is never a file, but the data which appears 
through stdin may ultimately be coming from a file, and it's sometimes 
useful, in that case, to bypass stdin and access the file directly. The 
way to handle this is to have a separate stdinFile :: Maybe File.
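Today's System.IO has no separate File type, so the following sketch only approximates the idea with a Handle: the hypothetical `stdinFile` returns stdin's handle when `hIsSeekable` says it can be treated as a file (with GHC, typically when input is redirected from a regular file, as in `prog < data.txt`), and `stdinFileSize` is a hypothetical example of using it to bypass the stream view.

```haskell
import System.IO

-- Hypothetical approximation of "stdinFile :: Maybe File": give back stdin's
-- Handle only when it is seekable, i.e. when file operations make sense on it.
stdinFile :: IO (Maybe Handle)
stdinFile = do
  seekable <- hIsSeekable stdin
  return (if seekable then Just stdin else Nothing)

-- Example use: ask the underlying file for its size, which a pure stream
-- (pipe, terminal) cannot answer.
stdinFileSize :: IO (Maybe Integer)
stdinFileSize = do
  mf <- stdinFile
  case mf of
    Just h  -> Just <$> hFileSize h
    Nothing -> return Nothing
```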

As for openFile: in the context of a certain filesystem at a certain 
time, a certain pathname may refer to

  * Nothing
  * A directory
  * A file (in the library sense); this might include things like 
/dev/hda and /dev/kmem
  * Both ends of a (named) pipe
  * A data source and a data sink which are related in some qualitative 
way (for example, keyboard and screen, or stdin and stdout)
  * A data source only
  * A data sink only
  * ...

How to provide an interface to this zoo?

The dynamic-typing approach is to return some sort of Thing with a 
complicated interface which is approximately the union of the interfaces 
for each thing in the above list. Unsupported methods fail when called. 
This is roughly what Posix does, except that directories are a special 
case, and Nothing is very special (as perhaps it should be, but I'm not 
sure).
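On a Posix system the first, partial answer to this question comes from stat(): it reports the file type (directory, regular file, named pipe, device, socket), but says nothing about whether a device is a source, a sink or both. A sketch using System.Posix.Files from the `unix` package follows; `PathKind` and `classifyPath` are hypothetical names.

```haskell
import Control.Exception (IOException, try)
import System.Posix.Files

-- What stat() can tell us about a path: its file type, not its direction.
data PathKind
  = NoSuchPath
  | DirectoryPath
  | RegularFilePath
  | NamedPipePath
  | CharDevicePath
  | BlockDevicePath
  | SocketPath
  | OtherPath
  deriving (Eq, Show)

-- Any stat() failure (including permission errors) is reported as NoSuchPath
-- here, which is a simplification.
classifyPath :: FilePath -> IO PathKind
classifyPath p = do
  r <- try (getFileStatus p) :: IO (Either IOException FileStatus)
  return $ case r of
    Left _ -> NoSuchPath
    Right st
      | isDirectory st       -> DirectoryPath
      | isRegularFile st     -> RegularFilePath
      | isNamedPipe st       -> NamedPipePath
      | isCharacterDevice st -> CharDevicePath
      | isBlockDevice st     -> BlockDevicePath
      | isSocket st          -> SocketPath
      | otherwise            -> OtherPath
```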

The Haskell approach is, I guess, to use an algebraic datatype, e.g.

    data FilesystemObject
      = Directory Directory
      | File File
      | InputOutput PosixInputStream PosixOutputStream
      | Input PosixInputStream
      | Output PosixOutputStream

Here I'm using "Posix*Stream" for all streams backed by Posix 
filehandles. I'm unsure whether NoSuchPath should be in there too.

You might say that this is annoyingly complicated. My first reaction is 
"tough--it's exactly as complicated as the reality it models". But there 
should presumably be helper functions of types FilesystemObject->IStream 
and FilesystemObject->OStream.
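The helper functions might take roughly the following shape. In this sketch the component types are hypothetical placeholders, `IStream`/`OStream` are simplified to closed sums rather than the existential wrappers discussed earlier, and `Maybe` stands in for whatever failure behaviour the real helpers would have.

```haskell
import Data.Maybe (isJust)

-- Hypothetical placeholder types standing in for the proposal's real ones.
newtype Directory = MkDirectory FilePath deriving Show
newtype File = MkFile FilePath deriving Show
newtype PosixInputStream = PosixInputStream Int deriving Show   -- e.g. an fd
newtype PosixOutputStream = PosixOutputStream Int deriving Show

-- The datatype from the message (plus deriving Show).
data FilesystemObject
  = Directory Directory
  | File File
  | InputOutput PosixInputStream PosixOutputStream
  | Input PosixInputStream
  | Output PosixOutputStream
  deriving Show

-- Simplified, closed stand-ins for IStream and OStream.
data IStream = FileIn File | PosixIn PosixInputStream deriving Show
data OStream = FileOut File | PosixOut PosixOutputStream deriving Show

-- Nothing means "this object has no input (or output) side".
toIStream :: FilesystemObject -> Maybe IStream
toIStream obj = case obj of
  Directory _     -> Nothing
  File f          -> Just (FileIn f)
  InputOutput i _ -> Just (PosixIn i)
  Input i         -> Just (PosixIn i)
  Output _        -> Nothing

toOStream :: FilesystemObject -> Maybe OStream
toOStream obj = case obj of
  Directory _     -> Nothing
  File f          -> Just (FileOut f)
  InputOutput _ o -> Just (PosixOut o)
  Input _         -> Nothing
  Output o        -> Just (PosixOut o)

isReadable, isWritable :: FilesystemObject -> Bool
isReadable = isJust . toIStream
isWritable = isJust . toOStream
```

The pattern matches make the complexity explicit, and the compiler's exhaustiveness checking flags any helper that forgets a case if a constructor such as NoSuchPath is later added.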

The other complication is that Posix makes you specify an access mode 
(read-only, write-only or read-write) when you look up a path in the 
filesystem. This makes no sense, but it's something we have to live with.

So I'd argue for replacing openFile with something like

    data FilesystemObject = ...

    openPath :: FilePath -> IOMode -> IO FilesystemObject

    filesystemInputStream :: FilesystemObject -> (IO?) IStream

    data OutputMode = Append | Overstrike | Replace
    filesystemOutputStream :: FilesystemObject -> OutputMode -> (IO?) OStream
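To make the three output modes concrete, here is one possible mapping onto today's System.IO modes. The mapping, `toIOMode` and `writeWithMode` are assumptions of this sketch, not part of the proposal: Replace truncates first, Append writes at the end, and Overstrike opens without truncating and overwrites from the start.

```haskell
import System.IO

data OutputMode = Append | Overstrike | Replace deriving (Eq, Show)

-- Assumed mapping onto System.IO's IOMode (hypothetical helper).
toIOMode :: OutputMode -> IOMode
toIOMode Append     = AppendMode     -- every write goes to the end of the file
toIOMode Overstrike = ReadWriteMode  -- no truncation; writing starts at offset 0
toIOMode Replace    = WriteMode      -- the file is truncated before writing

-- Hypothetical helper: open a path in the given mode and write a string.
writeWithMode :: FilePath -> OutputMode -> String -> IO ()
writeWithMode path mode s = withFile path (toIOMode mode) (\h -> hPutStr h s)
```

For example, writing "abcdef" with Replace and then "XY" with Overstrike leaves the file containing "XYcdef".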

 >Note that even when they are regular files, emulating stream I/O
 >in terms of either pread/pwrite or mmap does not yield the correct
 >semantics of sharing the file pointer between processes.

You're right, and the solution is to have two kinds of file I/O streams, 
one based on File (File*Stream) and one based on the Posix file pointer 
(Posix*Stream).
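The difference between the two kinds of stream can be sketched as follows. A File*Stream keeps its own private offset and reads through a positional interface, so two streams over one File do not disturb each other; a Posix*Stream would instead use the kernel file pointer, which is shared by everyone holding the same open file description. In this sketch `File`, `fileRead`, `FileInputStream` and `readStream` are hypothetical, and `fileRead` emulates pread with `hSeek` plus `hGet`, which unlike pread is not atomic.

```haskell
import Data.IORef
import System.IO
import qualified Data.ByteString.Char8 as BC

-- Hypothetical File: a seekable object supporting positional reads.
newtype File = File Handle

-- Emulated pread: seek to an absolute offset, then read up to n bytes.
fileRead :: File -> Integer -> Int -> IO BC.ByteString
fileRead (File h) off n = do
  hSeek h AbsoluteSeek off
  BC.hGet h n

-- Hypothetical FileInputStream: a File plus a private current offset.
data FileInputStream = FileInputStream File (IORef Integer)

newFileInputStream :: File -> IO FileInputStream
newFileInputStream f = FileInputStream f <$> newIORef 0

-- Read up to n bytes and advance only this stream's offset.
readStream :: FileInputStream -> Int -> IO BC.ByteString
readStream (FileInputStream f ref) n = do
  off <- readIORef ref
  bs <- fileRead f off n
  writeIORef ref (off + fromIntegral (BC.length bs))
  return bs
```

Two FileInputStreams opened on the same File both start at offset 0 and advance independently, which is exactly the behaviour a shared Posix file pointer does not give you.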

-- Ben


