Raw I/O library proposal, second (more pragmatic) draft

Fri, 1 Aug 2003 14:47:42 -0700 (PDT)

On Fri, 1 Aug 2003, Simon Marlow wrote:

> I wanted to float a generalisation of this scheme, though.  I'm
> wondering whether it might be a good idea to make InputStream and
> OutputStream into type classes, the advantage being that this makes
> streams more extensible - one example is that memory-mapped files fit
> neatly into this framework.  I already have 6 examples of things that
> can have streams layered on top (or *are* streams), and there are almost
> certainly more.

I think this is unambiguously superior to my design because it's
user-extensible. I can easily imagine a user wanting to put a text reader
on top of a user-defined instance of InputStream, for example. It also
allows particular kinds of streams to expose additional structure, which
is good.

My only concern is that the additional structure might not be known at
type-check time. In particular, the lookupXputStream functions can't
return any particular type of stream, as far as I can tell -- certainly
not a FileXputStream.
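
A minimal sketch of the concern, under stated assumptions: the class is
cut down to streamGet alone, and ListInputStream, AnyInputStream and
lookupInputStream are hypothetical names invented for illustration (the
last standing in for one of the lookupXputStream functions). One way for
a lookup function to return "some stream" is an existential wrapper, and
the wrapper hides exactly the additional structure in question.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Cut-down version of the quoted class; only streamGet is kept.
class InputStream s where
  streamGet :: s -> IO Word8

-- Hypothetical in-memory stream, used only for illustration.
newtype ListInputStream = ListInputStream (IORef [Word8])

instance InputStream ListInputStream where
  streamGet (ListInputStream ref) = do
    bytes <- readIORef ref
    case bytes of
      []       -> ioError (userError "end of stream")
      (b : bs) -> writeIORef ref bs >> return b

-- Existential wrapper: the concrete stream type is forgotten, so callers
-- can use only the InputStream methods.
data AnyInputStream = forall s. InputStream s => AnyInputStream s

instance InputStream AnyInputStream where
  streamGet (AnyInputStream s) = streamGet s

-- Hypothetical stand-in for a lookupXputStream function. The result type
-- cannot name ListInputStream (or a FileXputStream) without committing
-- every caller to that one kind of stream.
lookupInputStream :: String -> IO AnyInputStream
lookupInputStream _name = do
  ref <- newIORef [104, 105]
  return (AnyInputStream (ListInputStream ref))
```

A caller of lookupInputStream can read octets, but has no type-safe way
to recover file-specific operations from the result.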

> Here's some signatures for you to peruse:
> 
> class Stream s where
>       closeStream        :: s -> IO ()

I guess "open" and "close" do make sense for streams.

>       streamSetBuffering :: s -> BufferMode -> IO ()

This isn't an objection to the design, but not all kinds of buffering make
sense for all kinds of streams (line buffering doesn't seem sensible for
file streams, and any buffering on a memory array is pointless). The
requested buffer mode should presumably be treated only as a suggestion.
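
A small sketch of what "only a suggestion" could look like, assuming
BufferMode is the existing System.IO type and using a hypothetical
ArrayStream type as the memory-backed stream. The instance accepts any
request but always reports the mode it actually uses.

```haskell
import System.IO (BufferMode (..))

-- Only the two buffering methods of the quoted class are reproduced.
class Stream s where
  streamSetBuffering :: s -> BufferMode -> IO ()
  streamGetBuffering :: s -> IO BufferMode

-- Hypothetical memory-backed stream; its contents are omitted because
-- only the buffering behaviour matters here.
data ArrayStream = ArrayStream

instance Stream ArrayStream where
  -- Buffering a memory array is pointless, so the request is ignored.
  streamSetBuffering _ _ = return ()
  -- The stream reports what it really does, not what was asked for.
  streamGetBuffering _ = return NoBuffering
```

Under this reading, code that cares about the effective mode has to call
streamGetBuffering after streamSetBuffering rather than assume the
request was honoured.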

>       streamGetBuffering :: s -> IO BufferMode
>       streamFlush        :: s -> IO ()

Does streamFlush make sense for input streams? In the case of a file
stream it could discard buffered data, but for other streams I'm not sure
what it would do.

>       isEOS              :: s -> IO Bool

This has a clear meaning for input streams (no more data), but for output
streams it could mean many different things (connection closed by
listener, no more disk space, no more memory buffer space), and, more
seriously, these conditions can't in general be detected synchronously
unless the stream happens to be unbuffered.
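
One possible arrangement, sketched under the assumption that isEOS moves
out of Stream and into InputStream, where "no more data" is well defined.
ListInputStream, newListInputStream and drain are hypothetical names for
illustration only; output-side failures would have to be reported some
other way (for example by an exception from a put or flush).

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Assumption: isEOS is an InputStream method rather than a Stream method.
class InputStream s where
  streamGet :: s -> IO Word8
  isEOS     :: s -> IO Bool

-- Hypothetical in-memory input stream.
newtype ListInputStream = ListInputStream (IORef [Word8])

newListInputStream :: [Word8] -> IO ListInputStream
newListInputStream bytes = fmap ListInputStream (newIORef bytes)

instance InputStream ListInputStream where
  streamGet (ListInputStream ref) = do
    bytes <- readIORef ref
    case bytes of
      []       -> ioError (userError "end of stream")
      (b : bs) -> writeIORef ref bs >> return b
  -- End of stream means exactly "no more data to get".
  isEOS (ListInputStream ref) = fmap null (readIORef ref)

-- Read every remaining octet, using isEOS as the loop condition.
drain :: InputStream s => s -> IO [Word8]
drain s = do
  done <- isEOS s
  if done
    then return []
    else do
      b    <- streamGet s
      rest <- drain s
      return (b : rest)
```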

> class InputStream s where
>       streamGet         :: s -> IO Word8
>       streamReadBuffer  :: s -> Integer -> Buffer -> IO ()

I used "read" and "write" exclusively for files and "get" and "put"
exclusively for streams to emphasize that these are completely different
operations. Writing a file is like writing on a piece of paper; you know
where your data is going and how to get it back with a read. But output
streams are like pneumatic tubes that whisk your octets away to parts
unknown. I would even go so far as to use names like push/pull or
send/receive or speak/listen for streams.

>       streamReadBuffer  :: s -> Integer -> Buffer -> IO ()
>       streamGetBuffer   :: s -> Integer -> IO ImmutableBuffer

This brings up (again) an important issue: what's the most practical way
of providing a memory buffer for file/stream operations? There doesn't
seem to be a clean answer to this in Haskell. It seems like we'll need
more variants than just these two.

[snip]

> data MappedFileInputStream   -- instance Stream, InputStream
> data MappedFileOutputStream  -- instance Stream, OutputStream

I don't think these are necessary; you can use ArrayXputStream.

[snip]

> -- Pipes
> data Pipe  -- a pipe with a read and a write end
> instance Stream Pipe
> instance InputStream Pipe
> instance OutputStream Pipe
> createPipe :: IO Pipe
> closePipe  :: Pipe -> IO ()

I strongly believe that createPipe should return an
(InputStream,OutputStream) pair, not a single object supporting both
interfaces. The streams associated with a pipe represent the ends of the
pipe, not the pipe itself. This is true conceptually and also in practice:
pipes are only useful if you separate the two ends and give them to two
different threads.
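
A rough sketch of the pair-returning shape, with several assumptions:
streamPut is a guessed name for the OutputStream method (that class was
snipped above), PipeReadEnd and PipeWriteEnd are hypothetical types, and
a Chan stands in for a real OS pipe. The point is only that each end gets
its own type with one direction, and the two ends can be handed to
different threads.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (replicateM)
import Data.Word (Word8)

class InputStream s where
  streamGet :: s -> IO Word8

-- Assumption: the snipped OutputStream class has a method like this.
class OutputStream s where
  streamPut :: s -> Word8 -> IO ()

-- Hypothetical end types: both wrap the same channel, but each exposes
-- only one direction.
newtype PipeReadEnd  = PipeReadEnd  (Chan Word8)
newtype PipeWriteEnd = PipeWriteEnd (Chan Word8)

instance InputStream PipeReadEnd where
  streamGet (PipeReadEnd ch) = readChan ch

instance OutputStream PipeWriteEnd where
  streamPut (PipeWriteEnd ch) = writeChan ch

-- Returns the two ends of the pipe rather than a single Pipe object.
createPipe :: IO (PipeReadEnd, PipeWriteEnd)
createPipe = do
  ch <- newChan
  return (PipeReadEnd ch, PipeWriteEnd ch)

-- Usage: the write end goes to a forked thread, the read end stays here.
demo :: IO [Word8]
demo = do
  (r, w) <- createPipe
  _ <- forkIO (mapM_ (streamPut w) [1, 2, 3])
  replicateM 3 (streamGet r)
```

With this shape the writer thread cannot accidentally read from the pipe,
and the reader cannot write to it, because neither end has the other
instance.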

> -- Sockets:
> data Socket
> instance Stream Socket
> instance InputStream Socket
> instance OutputStream Socket

Same objection here, although the reason is a bit different. Each TCP
connection consists of two independent unidirectional channels; they're
only created together for reasons of efficiency (and security?). There are
a total of four ends, of which you get two and the remote host gets the
other two. I admit that in this case a natural analogy with a telephone
handset suggests that the two streams should be kept together; but that's
what tuples are for.

The only object I can think of that could legitimately be an instance of
both InputStream and OutputStream is a LIFO buffer, assuming there's any
use for such a thing.

-- Ben