[Haskell] installing streams library

Bulat Ziganshin bulat.ziganshin at gmail.com
Wed May 24 08:19:35 EDT 2006


Hello Chad,

Saturday, May 20, 2006, 11:00:48 AM, you wrote:

> Is there any indication what fast IO approach might work its way into
> the standard libraries? It would be nice for idiomatic Haskell to be
> really fast by default, and I'd love to be able to show off the language
> shootout implications to coworkers.

first, about the speed. i've just implemented ByteString I/O and it
shows speeds of 20-100 MB/s on my box. so, as i've previously said,
the speed of text I/O using ByteStrings+Streams will be comparable to
the speed of the disk itself. that means that my "quest for text I/O
speed" is now finished :)

second - i developed Streams to be usable both as an add-on lib for
existing hugs/ghc versions and as a future part of the base libraries
that will replace the current Handle-based I/O. in particular, the new
version will include a module that emulates the System.IO types and
functions. afaik, Simon Marlow supports my plans. but the current
version still lacks some functionality that System.IO provides:

network I/O via sockets
overlapping I/O in multi-threaded programs
non-blocking I/O
correctness with asynchronous interrupts and 'trace (putStr ..)'
LineBuffering/NoBuffering
large number of functions, such as hUngetChar

the last three things just need my attention, i know how to do them.
the first three things, though, are from areas where i don't have
any experience. so i think that i should now publish the library in its
current state and get other developers interested in working on it. as
you understand, we can't talk about replacing System.IO without a full
implementation of its features, and it's no good to have two I/O
libraries in the base package :)


and the third thing i want to say is that Streams isn't only a "fast
I/O library", it's a general-purpose I/O library that includes support
for binary serialization, Char encoding, I/O to memory buffers and
strings, memory-mapped files and so on. i think that its main selling
points are modularity and extensibility, which greatly simplify the
addition of new features and maintenance, and improve code readability.
i believe that everyone is able to understand the library structure and
add the new features they need or replace old modules with better new
ones



in February i wrote to Andrew Pimlott about this library's goals:

Wednesday, February 08, 2006, 8:24:59 PM, you wrote:

>> AP> Bulat, it wouldn't hurt to include a motivation section at the top.  As
>> AP> I understand, it's ultimately all about speed, right?  Otherwise, we
>> AP> would all be happy with lists (and unsafeInterleave*).  So maybe a
>> AP> comparison between Stream and [] should be given.
>> 
>> you guessed wrong :)  this library is ultimately about replacing
>> System.IO library (i.e. Handles)

AP> Let me rephrase my question:  Why not just reimplement the Handles API
AP> (with some extensions like binary IO)?  Is there really a need to use a
AP> handle-like API for more than real IO?  If so, what is the need,
AP> expressivity or performance (or both)?  Maybe a use case showing what
AP> you can do with your library, and how you would have to do it otherwise?

now i understand you. actually, my presentation was meant only for the
few people who already know about the limitations of the current
library, who have already requested additional features but didn't get
them. here i need to give some history:

when the System.IO interface was first developed, it implemented far
fewer features than it does now, and its implementation was simple and
straightforward. as time went on, more and more features were added to
this library: a complex buffering scheme, several async i/o
implementations, locking, networking. And at the moment, GHC's
System.IO implementation consists of about 3000 lines with a rather
monolithic structure. you can't easily add a new feature or make the
use of some "heavyweight" feature optional, because that requires
changes throughout the entire library. As a result, GHC users can't get
the improvements to this library that they need for their work. Some of
them even develop their own libraries that implement just what they
need. for example, Einar Karttunen developed a networking library with
advanced async i/o and support for i/o of fast packed strings. But such
solutions are not universal - his library can't be used for file i/o,
for example, although the code is essentially the same.

what have i done? the main merit of the Streams library is not the
implementation of any particular feature, but the birth of a framework
for I/O sub-libraries. my library is essentially just a collection of
such sublibs. the first, for example, implements file i/o, the second
implements buffering, the third utf-8 encoding, and so on. the most
important property of all these sublibs is that none of them is larger
than 300 lines. that means that it is far easier to understand, modify
or even replace any of them. and that will have no impact on the other
parts of the library, because all these sublibs are bound together not
via data fields, but via well-defined interfaces

now, implementing any new I/O feature or new I/O source means only
implementing a Stream class-conforming interface - all other features,
including locking, buffering, encoding, compression, serialization,
binary and text i/o, async i/o, will become available automatically.
the same goes for transformers - once implemented, gzip compression or
UTF-16 encoding support will automatically become available for all
the I/O sources, present and future. isn't that great? :)  moreover, a
user application or third-party lib can easily implement new stream
types or transformers without touching the original library.
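
to make the idea concrete, here is a minimal sketch of a stream class,
one source and one transformer. it is not the actual Streams API: the
class CharStream and the names getCh, putCh, MemStream, Upper, putStrS
and getAllS are all hypothetical, invented only for this illustration.
the point is that the transformer and the generic helpers are written
once against the class and then work with any instance.

```haskell
import Data.Char (toUpper)
import Data.IORef

-- Hypothetical minimal interface (NOT the real Streams class).
class CharStream h where
  getCh :: h -> IO (Maybe Char)   -- Nothing signals end of input
  putCh :: h -> Char -> IO ()

-- An I/O source/sink: an in-memory buffer with separate
-- input and output sides.
data MemStream = MemStream
  { pending :: IORef String   -- characters still to be read
  , written :: IORef String   -- characters written so far (reversed)
  }

newMemStream :: String -> IO MemStream
newMemStream s = MemStream <$> newIORef s <*> newIORef ""

instance CharStream MemStream where
  getCh m = do
    s <- readIORef (pending m)
    case s of
      []       -> return Nothing
      (c : cs) -> writeIORef (pending m) cs >> return (Just c)
  putCh m c = modifyIORef (written m) (c :)

memOutput :: MemStream -> IO String
memOutput = fmap reverse . readIORef . written

-- A transformer: wraps ANY stream and is itself a stream.
newtype Upper h = Upper h

instance CharStream h => CharStream (Upper h) where
  getCh (Upper h)   = fmap (fmap toUpper) (getCh h)
  putCh (Upper h) c = putCh h (toUpper c)

-- Generic helpers, written once against the class.
putStrS :: CharStream h => h -> String -> IO ()
putStrS h = mapM_ (putCh h)

getAllS :: CharStream h => h -> IO String
getAllS h = do
  mc <- getCh h
  case mc of
    Nothing -> return ""
    Just c  -> (c :) <$> getAllS h
```

in this sketch a new source only has to provide getCh and putCh, and a
value such as Upper (Upper m) is again a stream, so transformers stack
without either side knowing about the other.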

so, this lib is in some respects a meta-meta-instrument, whose value
will automatically increase as new sublibs appear. right now its
advantages over System.IO are in the following areas:

faster i/o
support for optional utf-8 encoding
binary i/o and serialization
user-controlled locking

if you look inside the archive, you will find the directory Examples,
which demonstrates the use of all these features. as i said in the
docs, i also plan to implement other user requests made for the
System.IO library

another consequence of this library's emergence is that all these
features will become available on Hugs and other Haskell compilers
that never had enough manpower to develop a library as great as
GHC's System.IO.

and about using streams in monads other than IO. i really don't know
whether it will be used or not. at least for serialization it looks
promising. for example, there are functions "encode" and "decode" that
are like the show/read pair, but implement binary encoding according to
the instances of the Binary class. and of course, they are implemented
through the StringBuffer instance of the Stream class, working in the
ST monad.
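
as a rough illustration of what "like show/read, but binary" can mean,
here is a small self-contained sketch. it is an assumption, not the
library's code: the class Bin and its methods putB and getB are
hypothetical names, and this version works on plain lists of bytes
instead of going through a StringBuffer stream in the ST monad.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)

-- Hypothetical serialization class (NOT the library's Binary class).
class Bin a where
  putB :: a -> [Word8]                   -- value to bytes
  getB :: [Word8] -> Maybe (a, [Word8])  -- bytes to value plus leftover

instance Bin Word8 where
  putB b = [b]
  getB (b : bs) = Just (b, bs)
  getB []       = Nothing

-- Word32 is stored as four bytes, most significant byte first.
instance Bin Word32 where
  putB w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]
  getB (a : b : c : d : rest) =
    Just (foldl (\acc x -> (acc `shiftL` 8) .|. fromIntegral x) 0 [a, b, c, d], rest)
  getB _ = Nothing

-- A pair is encoded as its components, one after the other.
instance (Bin a, Bin b) => Bin (a, b) where
  putB (x, y) = putB x ++ putB y
  getB bs = do
    (x, r1) <- getB bs
    (y, r2) <- getB r1
    return ((x, y), r2)

encode :: Bin a => a -> [Word8]
encode = putB

-- decode succeeds only if the whole input is consumed.
decode :: Bin a => [Word8] -> Maybe a
decode bs = case getB bs of
  Just (x, []) -> Just x
  _            -> Nothing
```

as with show/read, the type of the result selects the instance, e.g.
decode (encode v) :: Maybe (Word32, Word8) for a pair v of that type.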

compared to Handles, the library provides essentially the same
interface. again, you can find information about switching from
Handles to Streams in the docs. in the future i plan to provide a
"legacy layer" which will emulate 100% of the System.IO interface, but
use streams internally. It will be essential for old apps, especially
if Streams becomes the official and sole GHC i/o library

about internal organization - Streams is somewhat as if Simon himself
had split the Handles library up into small independent parts and
then replaced some of them with simpler/faster implementations.
nothing more, except for the common Stream class interface, developed
by John Goerzen. my work was mainly to bring the best ideas together :)



-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com


