Potential Network SIG

Simon Marlow marlowsd at gmail.com
Wed Aug 26 04:13:30 EDT 2009


On 25/08/2009 21:23, Johan Tibell wrote:
> On Tue, Aug 25, 2009 at 2:03 PM, Simon Marlow<marlowsd at gmail.com>  wrote:
>> On 22/08/2009 05:49, Thomas DuBuisson wrote:
>>> 3) Use ByteStrings (and have corresponding .Lazy modules) for efficiency.
>>> As in network-bytestring, any new API should be performance conscious
>>> enough to avoid String.
>>
>> Ideologically speaking, this is not a choice you should make in the network
>> library.  The network library should deal with setting up sockets, and
>> delegate the actual I/O to the I/O library.
>>
>> Right now, that means making Handles from Sockets (which is something the
>> current network library provides).  And then you use the bytestring library
>> to write bytestrings to the Handle.  In the future we'll have a way to write
>> text to a Handle too.
>>
>> Now, I wouldn't be surprised if this doesn't cover all the use cases. Maybe
>> people want to use the low-level send/recv.  But I expect that for most
>> applications, going via Handle will be the right thing, and we should look
>> at how to accommodate the other use cases.
>
> In my mind an improved I/O library would look something like this:
>
>> -- At the very bottom is a type class 'RawIO' which represents a
>> -- variety of stream-like types.
>> class RawIO a where
>>      readInto :: a -> Ptr Word8 -> Int -> IO Int  -- returns bytes read
>>      write :: a -> ByteString -> IO ()
>>
>>      read :: a -> Int -> IO ByteString
>>      read dev n = ByteString.createAndTrim n (\p -> readInto dev p n)

This is quite similar to the class of the same name in the GHC I/O library:

-- | A low-level I/O provider where the data is bytes in memory.
class RawIO a where
   read                :: a -> Ptr Word8 -> Int -> IO Int
   readNonBlocking     :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
   write               :: a -> Ptr Word8 -> Int -> IO ()
   writeNonBlocking    :: a -> Ptr Word8 -> Int -> IO Int

I think the ByteString API should be a layer on top of this.
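
To make the layering concrete, here is a minimal sketch of a ByteString
API sitting on top of a pointer-based class. It is an illustration under
assumptions, not GHC's code: the class below is a local, cut-down copy of
the one quoted above (blocking operations only), and readBS, writeBS,
MemDevice and newMemDevice are hypothetical names introduced for the
example.

```haskell
import Prelude hiding (read)
import Control.Monad (when)
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import qualified Data.ByteString.Unsafe as BU
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')
import Data.Word (Word8)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, castPtr)

-- Local, cut-down copy of the class quoted above (an assumption for this
-- sketch; the real class lives in the GHC.* hierarchy).
class RawIO a where
  read  :: a -> Ptr Word8 -> Int -> IO Int
  write :: a -> Ptr Word8 -> Int -> IO ()

-- ByteString layer: allocate n bytes, let the device fill them, then trim
-- the result to the number of bytes actually read.
readBS :: RawIO a => a -> Int -> IO B.ByteString
readBS dev n = BI.createAndTrim n (\p -> read dev p n)

-- ByteString layer: hand the ByteString's own memory to the device.
writeBS :: RawIO a => a -> B.ByteString -> IO ()
writeBS dev bs =
  BU.unsafeUseAsCStringLen bs (\(p, len) -> write dev (castPtr p) len)

-- Hypothetical in-memory device: writes are appended, reads consume from
-- the front. Only here so the layer above can be exercised.
newtype MemDevice = MemDevice (IORef B.ByteString)

newMemDevice :: IO MemDevice
newMemDevice = MemDevice <$> newIORef B.empty

instance RawIO MemDevice where
  read (MemDevice ref) p n = do
    pending <- readIORef ref
    let (chunk, rest) = B.splitAt n pending
    writeIORef ref rest
    -- Copy the consumed bytes into the caller's buffer.
    BU.unsafeUseAsCStringLen chunk $ \(src, len) ->
      when (len > 0) (copyBytes p (castPtr src) len)
    return (B.length chunk)
  write (MemDevice ref) p n = do
    chunk <- B.packCStringLen (castPtr p, n)
    modifyIORef' ref (`B.append` chunk)
```

With this in place the ByteString functions never need to know what kind
of device they are talking to; any new RawIO instance gets readBS and
writeBS for free.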

> This definition is very minimal and most likely needs to be expanded
> with operations such as 'close' and perhaps also 'seek'.

close/seek etc. are methods of the IODevice class in GHC's IO library.

See http://darcs.haskell.org/packages/base/GHC/IO/Device.hs

We have implementations of these classes for file descriptors, and it is 
my intention to have other implementations too: memory-mapped files, 
Windows HANDLEs, ByteString (for testing), and Chan Word8 (for testing 
again: you write to the Handle, and the decoded bytes come out of the Chan).
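
As a rough idea of what a Chan-backed test device could look like, the
sketch below implements only the write half of a raw device and skips the
Handle layer entirely. The class RawWrite, the type ChanDevice and the
helper roundTrip are all hypothetical names made up for this example;
they are not part of GHC's I/O library.

```haskell
import Control.Concurrent.Chan (Chan, newChan, readChan, writeList2Chan)
import Control.Monad (replicateM)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)

-- Write half of a raw device, reduced to a single method (an assumption
-- for this sketch).
class RawWrite a where
  write :: a -> Ptr Word8 -> Int -> IO ()

-- Hypothetical test device: every byte written is pushed onto a Chan.
newtype ChanDevice = ChanDevice (Chan Word8)

instance RawWrite ChanDevice where
  write (ChanDevice chan) p n = peekArray n p >>= writeList2Chan chan

-- Write some bytes through the device and collect them from the Chan
-- again, the way a test would observe the output.
roundTrip :: [Word8] -> IO [Word8]
roundTrip bytes = do
  chan <- newChan
  withArrayLen bytes (\n p -> write (ChanDevice chan) p n)
  replicateM (length bytes) (readChan chan)
```

A test can then compare the bytes read from the Chan against what it
expected the upper layers to produce.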

These APIs aren't currently "public", in the sense that they are 
exported by modules in the GHC.* hierarchy.  I hope they'll help as a 
concrete start to the discussion of where the I/O library should be 
going, though.

> We can now layer buffering on top.
>
>> -- Buffers for reading and writing are kept in a data type 'BufferedIO'.
>> -- This data type need not be exposed.
>> data BufferedIO = forall a. RawIO a =>  BufferedIO Buffer Buffer a
>>
>> instance RawIO BufferedIO where
>>      readInto = readFromBufferInto  -- Calls RawIO.readInto if needed
>>      write = writeToBuffer  -- Calls RawIO.write if needed
>>
>> -- Allocates buffers and returns a BufferedIO
>> buffered :: RawIO a =>  a ->  a
>> buffered = ...

This is where things get a bit hairy.  The upper layers often want to 
know about the buffer, for instance when it needs to be flushed, or for 
performance reasons - e.g. encoding/decoding needs to have direct access 
to both buffers.  So in GHC's I/O library buffering is a new class 
BufferedIO, consumed by the higher layer, and you can make a BufferedIO 
instance trivially given a RawIO instance.  Not everything has a RawIO 
instance though: memory-mapped files just appear as buffers.
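
The split between the two layers can be sketched roughly as follows. This
is a heavily simplified illustration and not GHC's BufferedIO class: the
buffer is modelled as a plain ByteString chunk, only reading is covered,
and every name here (RawRead, BufferedRead, fillFromRaw, Countdown,
Mapped, readAll) is hypothetical. The point is that the higher layer
depends only on the buffer-level class, that a raw device gets an
instance in one line, and that a device with no raw interface can still
take part.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)

-- Raw layer: fill caller-supplied memory, return the number of bytes read.
class RawRead a where
  rawRead :: a -> Ptr Word8 -> Int -> IO Int

-- Buffer layer, consumed by the higher layer. Returns the next chunk of
-- at most n bytes; an empty chunk means end of input.
class BufferedRead a where
  fillBuffer :: a -> Int -> IO B.ByteString

-- Generic route from a raw device to the buffer layer.
fillFromRaw :: RawRead a => a -> Int -> IO B.ByteString
fillFromRaw dev n = BI.createAndTrim n (\p -> rawRead dev p n)

-- Hypothetical raw device: produces the byte 7 a fixed number of times.
newtype Countdown = Countdown (IORef Int)

instance RawRead Countdown where
  rawRead (Countdown ref) p n = do
    k <- atomicModifyIORef' ref
           (\left -> let took = min n left in (left - took, took))
    mapM_ (\i -> pokeByteOff p i (7 :: Word8)) [0 .. k - 1]
    return k

-- The "trivial" buffered instance for a raw device.
instance BufferedRead Countdown where
  fillBuffer = fillFromRaw

-- Stand-in for a memory-mapped file: no raw interface, the data is
-- already a buffer and chunks are just slices of it.
newtype Mapped = Mapped (IORef B.ByteString)

instance BufferedRead Mapped where
  fillBuffer (Mapped ref) n =
    atomicModifyIORef' ref (\b -> let (h, t) = B.splitAt n b in (t, h))

-- Higher layer: works against the buffer class only.
readAll :: BufferedRead a => a -> IO B.ByteString
readAll dev = go []
  where
    go acc = do
      chunk <- fillBuffer dev 4
      if B.null chunk
        then return (B.concat (reverse acc))
        else go (chunk : acc)
```

A real design would also have to expose the buffer itself to the upper
layer (for flushing and for in-place encoding and decoding), which this
sketch leaves out.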

Incidentally, there have been various designs around this theme in the 
past, e.g. http://www.haskell.org/haskellwiki/Library/Streams (with 
various problems IMO, but there are some good ideas there).

Cheers,
	Simon

