[Haskell-cafe] interesting IO oriented code
robert dockins
robdockins at fastmail.fm
Thu Mar 17 10:31:20 EST 2005
I while back following the most recent discussion about filepaths and IO
generally, I decided to pick up the torch and try my hand at a solution
that delt with all the issues. My conception of the problem can be
found here:
http://www.haskell.org//pipermail/haskell-cafe/2005-January/008955.html
However, I quickly got sidetracked by the character encoding issue, and
sequential IO operations generally. I have come up with some code that
I think is interesting, not least of which because it is somewhat
performant. In the spirit of 'release early, release often' I am now
making it available.
I ended up developing a sequential IO API and a couple of test cases to
drive it. I currently have a simple word count and a md5
implementation. There is a darcs repo at:
http://www.eecs.tufts.edu/~rdocki01/filepath/
The code relies on multi-parameter typeclasses, fundeps and unboxed
tuples, so its GHC only.
The good news:
1) The API is semi-manageable. It is based around Producer, Transformer
and Consumer functors (at least I think they can be called functors; I'm
not real knowledgeable here). I think that with a little more work (and
the input of people more experienced at this stuff than I) it could be
sugared into a pretty usable API.
2) It is performant (mostly). At least it outperforms other Haskell IO
methods I have tried. My 'wc' is about twice as fast as the current
shootout version in informal tests (the shootout code is included in the
repo). My md5 can sum somewhere between 2-4Mb/Sec on my hardware.
3) Very importantly, the code appears to have reliable constant-space
behavior.
4) We are not artificially forced into the IO or ST monads for
performance reasons. Explicit state passing is used where necessary,
which seems to have the added benefit of helping the compiler to find
good optimizations (pure speculation).
The bad news:
1) Mostly performant is still not great. My wc takes about 6 times as
long as the C version on my machine, and md5 takes about about 80 times
as long. (interestingly, the C md5sum takes about 1/5 of the time as C
wc!). However, it is within an order of magnitude of a java
implementation of md5 (using the standard digest classes).
2) The performance is pretty fragile. Small changes can cause large
performance hits for no easily discernible reason. This probably
relates to the way particular optimizations do or don't get applied.
The situation is hugely complicated by the typeclass-heavy API, I am sure.
3) The stream-observer paradigm is a somewhat difficult programming
environment.
My next step is to try some actual character encoding implementations
(the original purpose after all) and see how that goes. I'd also like
to try gzip and gunzip transformer layers.
Any ideas for improvements (including patches!) are welcome.
Robert Dockins
PS the code currently includes a number of vestigial remnants of false
starts, and is generally kind of ugly; you are warned.
More information about the Haskell-Cafe
mailing list