[Haskell-cafe] Re: iteratee: Do I need to roll my own?

David Leimbach leimy2k at gmail.com
Wed Mar 31 14:42:40 EDT 2010

First, thanks for the reply.

On Wed, Mar 31, 2010 at 11:15 AM, Valery V. Vorotyntsev <valery.vv at gmail.com> wrote:

> > I'm looking at iteratee as a way to replace my erroneous and really
> > inefficient lazy-IO-based backend for an Expect-like monadic DSL that
> > I've been working on, on and off, for about 6 months now.
> >
> > The problem is I want something like:
> >
> > expect "some String"
> > send "some response"
> >
> > to block, or perhaps time out depending on the environment, while looking
> > for "some String" on an input Handle, and it appears that iteratee works
> > with a very fixed block size.
> Actually, it doesn't. It works with whatever the enumerator gives it.
> In the case of `enum_fd'[1] this is a fixed block, but in general it is
> a ``value'' of some ``collection''[2], and it is up to the programmer to
> decide what should become a value.
>  [1] http://okmij.org/ftp/Haskell/Iteratee/IterateeM.hs
>  [2] http://okmij.org/ftp/papers/LL3-collections-enumerators.txt
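To make that point concrete, here is a minimal sketch in a hypothetical toy model of iteratees (the types `Stream` and `Iteratee` and the functions `enumBlocks`, `run` and `chunksSeen` are illustrative names, not the `iteratee` package's API). Only the enumerator picks the block size; the iteratee simply receives whatever chunks arrive:

```haskell
-- Hypothetical toy model of iteratees; not the iteratee package's API.
data Stream el = Chunk [el] | EOF

data Iteratee el a
  = Done a (Stream el)                 -- finished: result plus unconsumed input
  | Cont (Stream el -> Iteratee el a)  -- needs more input

-- Enumerator: feeds the input in blocks of at most n elements (assumes n > 0).
-- The block size is the enumerator's choice; the iteratee just sees chunks.
enumBlocks :: Int -> [el] -> Iteratee el a -> Iteratee el a
enumBlocks _ _ it@(Done _ _) = it
enumBlocks _ [] it = it
enumBlocks n xs (Cont k) = enumBlocks n rest (k (Chunk block))
  where
    (block, rest) = splitAt n xs

-- Send EOF and extract the result, if there is one.
run :: Iteratee el a -> Maybe a
run (Done a _) = Just a
run (Cont k) = case k EOF of
  Done a _ -> Just a
  Cont _ -> Nothing

-- An iteratee that simply records every chunk it is handed.
chunksSeen :: Iteratee el [[el]]
chunksSeen = go []
  where
    go acc = Cont step
      where
        step (Chunk c) = go (c : acc)
        step EOF = Done (reverse acc) EOF
```

In this model `run (enumBlocks 4 "abcd efg" chunksSeen)` gives `Just ["abcd"," efg"]`, while a block size of 1 delivers each character as its own chunk.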

> > A fixed block size is OK if I can somehow put unused bytes back into
> > the enumerator (I may need to put a LOT back in some cases, but in the
> > common case I will not need to put any back, as most expect-like scripts
> > typically catch the last few bytes of data sent before the peer blocks
> > waiting for a response...).
> I don't quite get this ``last few bytes'' thing. Could you explain?

What I mean is, let's say the stream has

"abcd efg abcd efg"

and then I run some kind of iteratee computation looking for "abcd".

Suppose the block size is fixed so that each read asks for 1024 bytes but
returns as much as is available, handing that to the iteratee to deal with.
The iteratee, with which I want to implement Expect-like behavior, would
really only want to read up to "abcd", consuming just that from the input
stream.  Does the iteratee get the whole chunk that was read by the
enumerator, or is it supplied a single atomic unit at a time, such as a
character, so that I can halt the consumption of the streamed data?

What I don't want is to consume bytes from the input Handle only to have
them ignored, as the second instance of "abcd" could be

I'm actually not sure that was very clear :-).  I don't want to throw away
bytes by accident, if that's even possible.
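A sketch of the Expect-style behaviour described above, again in a hypothetical toy model (none of these names come from the `iteratee` package): `expect` consumes input only up to and including the first "abcd", even when the match straddles a chunk boundary, and hands everything after it back as leftover. This version of the enumerator also returns the part of the source it never read, so nothing is silently dropped:

```haskell
import Data.List (isPrefixOf, tails)

-- Hypothetical toy model of iteratees; not the iteratee package's API.
data Stream el = Chunk [el] | EOF
data Iteratee el a = Done a (Stream el) | Cont (Stream el -> Iteratee el a)

-- Feed xs in blocks of at most n elements (assumes n > 0), stopping as soon
-- as the iteratee is Done. Also returns the part of xs that was never read.
enumBlocks :: Int -> [el] -> Iteratee el a -> (Iteratee el a, [el])
enumBlocks _ xs it@(Done _ _) = (it, xs)
enumBlocks _ [] it = (it, [])
enumBlocks n xs (Cont k) = enumBlocks n rest (k (Chunk block))
  where
    (block, rest) = splitAt n xs

-- expect pat: consume input up to and including the first occurrence of
-- pat, even if it straddles a chunk boundary. Returns the consumed text;
-- everything after the match is handed back as the leftover chunk.
expect :: String -> Iteratee Char (Maybe String)
expect pat = go ""
  where
    go seen = Cont step
      where
        step EOF = Done Nothing EOF -- pattern never appeared
        step (Chunk c) =
          let buf = seen ++ c
           in case matchEnd buf of
                Just n -> Done (Just (take n buf)) (Chunk (drop n buf))
                Nothing -> go buf
    -- Index just past the first occurrence of pat in buf, if any.
    matchEnd buf =
      case [i | (i, t) <- zip [0 ..] (tails buf), pat `isPrefixOf` t] of
        (i : _) -> Just (i + length pat)
        [] -> Nothing
```

Feeding "abcd efg abcd efg" in blocks of 3 leaves `Chunk " e"` as the leftover and "fg abcd efg" unread; together they are exactly the text after the first "abcd". The buffer is re-scanned on every chunk, which is quadratic but acceptable for a sketch.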

My discomfort with Iteratee is that most Haskell texts really want you to go
the way of lazy IO, which has led me into a good bit of trouble, and I've
never seen a really comprehensive tutorial on Iteratee anywhere.  I am
reading the examples that come with the Hackage package, though.

> I was about to write that there is no problem with putting data back into
> the Stream, and to refer to the head/peek functions...  But then I thought
> that the ``not consuming bytes from the stream'' approach may not work
> well when the number of bytes needed (by your function to accept/reject
> some rule) exceeds the size of the underlying memory buffer (4K in the
> current version of the `iteratee' library[3]).
>  [3] http://hackage.haskell.org/packages/archive/iteratee/0.3.4/doc/html/src/Data-Iteratee-IO-Fd.html
> Do you think that abstracting to the level of _tokens_ - instead of
> bytes - could help here? (Think of flex and bison.)  You know, these
> enumerator/iteratee things can be layered into
> _enumeratees_[1][4]... It's just an idea.
>  [4] http://ianen.org/articles/understanding-iteratees/

Now that's an interesting idea, and it's sort of where my previous, confusing
answer seemed to be heading.  I wasn't sure whether the iteratee was provided
a byte, a char, or a token.  If I can tell the enumerator to send only tokens
to the iteratee (which I'd have to define), then perhaps I can ignore the
amount consumed per read and let the enumerator deal with that buffering
issue directly.  Perhaps that's how iteratee really works anyway!

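The token idea can be sketched as an enumeratee-like adapter, assuming the same kind of hypothetical toy model (`tokenize`, `splitWords`, `feed` and `collect` are illustrative names, not library functions). It accepts Chars in whatever blocks the enumerator chooses and hands the inner iteratee only complete words, holding a partial word over to the next chunk:

```haskell
import Data.Char (isSpace)

-- Hypothetical toy model of iteratees; not the iteratee package's API.
data Stream el = Chunk [el] | EOF
data Iteratee el a = Done a (Stream el) | Cont (Stream el -> Iteratee el a)

-- Byte-level enumerator: blocks of at most n elements (assumes n > 0).
enumBlocks :: Int -> [el] -> Iteratee el a -> Iteratee el a
enumBlocks _ _ it@(Done _ _) = it
enumBlocks _ [] it = it
enumBlocks n xs (Cont k) = enumBlocks n rest (k (Chunk block))
  where
    (block, rest) = splitAt n xs

-- Send EOF and extract the result, if there is one.
run :: Iteratee el a -> Maybe a
run (Done a _) = Just a
run (Cont k) = case k EOF of
  Done a _ -> Just a
  Cont _ -> Nothing

-- Hand a (possibly empty) list of tokens to an iteratee as one chunk.
feed :: [t] -> Iteratee t a -> Iteratee t a
feed [] it = it
feed ts (Cont k) = k (Chunk ts)
feed _ it = it

-- Split text into complete words plus a trailing partial word that may
-- continue in the next chunk.
splitWords :: String -> ([String], String)
splitWords s
  | null s || isSpace (last s) = (ws, "")
  | otherwise = (init ws, last ws)
  where
    ws = words s

-- Enumeratee-style adapter: turns a Char stream into a stream of word
-- tokens for the inner iteratee. Leftover Chars are not tracked here.
tokenize :: Iteratee String a -> Iteratee Char (Maybe a)
tokenize = go ""
  where
    go _ (Done a _) = Done (Just a) (Chunk "")
    go partial inner = Cont step
      where
        step (Chunk cs) =
          let (toks, rest) = splitWords (partial ++ cs)
           in go rest (feed toks inner)
        step EOF = Done (run (feed [partial | not (null partial)] inner)) EOF

-- Inner iteratee: collects every token it receives.
collect :: Iteratee t [t]
collect = go []
  where
    go acc = Cont step
      where
        step (Chunk ts) = go (acc ++ ts)
        step EOF = Done acc EOF
```

With blocks of 3 Chars, `run (enumBlocks 3 "abcd efg abcd efg" (tokenize collect))` yields `Just (Just ["abcd","efg","abcd","efg"])`, the same as with any other block size, so the inner iteratee never sees the read size.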
> > Otherwise, I'm going to want to roll my own iteratee-style library
> > where I have to say "NotDone howMuchMoreIThinkINeed" so I don't
> > over-consume the input stream.
> What's the problem with over-consuming a stream? In your case?

Well, my concern is that if data is read from the input stream and then not
used, I'm not certain what has happened to that buffer the next time I
access the stream.  However, I suppose it's really a two-level situation:
the enumerator pulls some fixed-size chunk out of a Handle or FD or what
have you, and then folds the iteratee over that buffer in chunks of some size.
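The two-level structure can be sketched against a real Handle using `hGetSome` and `breakSubstring` from the bytestring package's `Data.ByteString`; the `Iteratee` type, `enumHandle` and `expectB` are hypothetical names for illustration only. The enumerator stops reading as soon as the iteratee is done, so any bytes not consumed are either in the iteratee's leftover chunk or still unread in the Handle:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import System.IO

-- Hypothetical toy model, specialised to ByteString chunks.
data Stream = Chunk B.ByteString | EOF
data Iteratee a = Done a Stream | Cont (Stream -> Iteratee a)

-- Level 1: pull up to bufSize bytes (assumed > 0) from the Handle per read;
-- hGetSome blocks only when no data is available yet and returns "" at EOF.
-- Level 2: hand each buffer to the iteratee. Reading stops once it is Done.
enumHandle :: Int -> Handle -> Iteratee a -> IO (Iteratee a)
enumHandle _ _ it@(Done _ _) = return it
enumHandle bufSize h (Cont k) = do
  buf <- B.hGetSome h bufSize
  if B.null buf
    then return (k EOF)
    else enumHandle bufSize h (k (Chunk buf))

-- Consume input up to and including the first occurrence of the (non-empty)
-- pattern; True if found, with the bytes after the match kept as leftover.
expectB :: String -> Iteratee Bool
expectB s = go B.empty
  where
    pat = C.pack s
    go seen = Cont step
      where
        step EOF = Done False EOF
        step (Chunk c) =
          let buf = B.append seen c
              (_, match) = B.breakSubstring pat buf
           in if B.null match
                then go buf
                else Done True (Chunk (B.drop (B.length pat) match))
```

Whatever buffer size is used, the leftover chunk followed by the rest of the Handle should be exactly " efg abcd efg" for the example stream; a real library would pass that leftover on to the next iteratee instead of losing it.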

In C++ I've used ideas like the following example, which a professor I had
in college showed me from a newsgroup he helped to moderate.

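#include <iostream>
#include <iterator>
#include <string>
int main () {  // std::distance walks the iterator range, counting whitespace-separated words
std::cout << "Word count on stdin: " <<
std::distance(std::istream_iterator<std::string>(std::cin), std::istream_iterator<std::string>()) << std::endl; }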

If the code were changed to be:

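#include <iostream>
#include <iterator>
int main () {  // istreambuf_iterator<char> yields every raw character, whitespace included
std::cout << "Character count on stdin: " <<
std::distance(std::istreambuf_iterator<char>(std::cin), std::istreambuf_iterator<char>()) << std::endl; }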

We get different behavior out of the higher-level std::distance algorithm
purely because of the kind of iterator.  distance performs a form of fold over
the iterators, but the accumulation of the count actually happens at the
enumerator level rather than being evaluated by the iterator.  Iteratee seems
to be like this, but with the control inverted.

Note that in the C++ example, changing the properties of each chunk being
iterated over changes the result of the fold from "word counter" to
"character counter".
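For comparison, a Haskell analogue of the `std::distance` example in the same hypothetical toy model (`count`, `runOn`, `wordCount` and `charCount` are illustrative names, not library functions): one counting iteratee, with the granularity decided entirely by what the enumerator pushes into it:

```haskell
-- Hypothetical toy model of iteratees; not the iteratee package's API.
data Stream el = Chunk [el] | EOF
data Iteratee el a = Done a (Stream el) | Cont (Stream el -> Iteratee el a)

-- The "algorithm", playing the role of std::distance: it counts whatever
-- elements it is handed and knows nothing about what they are.
count :: Iteratee el Int
count = go 0
  where
    go n = Cont step
      where
        step (Chunk xs) = go $! n + length xs
        step EOF = Done n EOF

-- The enumerator decides the granularity: push the elements, then EOF.
runOn :: [el] -> Iteratee el a -> Maybe a
runOn xs it = finish (push it)
  where
    push (Cont k) = k (Chunk xs)
    push done = done
    finish (Done a _) = Just a
    finish (Cont k) = case k EOF of
      Done a _ -> Just a
      Cont _ -> Nothing

wordCount, charCount :: String -> Maybe Int
wordCount s = runOn (words s) count -- elements are words
charCount s = runOn s count         -- elements are Chars
```

Here `wordCount "abcd efg abcd efg"` is `Just 4` while `charCount` on the same string is `Just 17`; as in the C++ version, only the kind of element changed, not the counting code.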

I guess I need to just get familiar with Iteratee to understand what knobs I
have available to turn.

> BTW, this `NotDone' is just a ``control message'' to the chunk
> producer (an enumerator):
>    IE_cont k (Just (GimmeThatManyBytes n))

Yes, I was thinking of something like that.
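A sketch of that control message, in a hypothetical variant of the toy model where `Cont` carries an optional request size. It mirrors the spirit of `IE_cont k (Just ...)` above but is not the `iteratee` package's definition; `enumReq` and `takeN` are illustrative names. An iteratee that always asks for exactly what it still needs cannot be over-fed by an enumerator that honours the request:

```haskell
import Data.Maybe (fromMaybe)

data Stream el = Chunk [el] | EOF

-- Hypothetical variant of the toy model: Cont also carries an optional
-- control message, "give me at most this many more elements".
data Iteratee el a
  = Done a (Stream el)
  | Cont (Maybe Int) (Stream el -> Iteratee el a)

-- Enumerator over an in-memory source. It honours a size request when one
-- is present and otherwise uses defaultBlock; it also returns whatever
-- part of the source it never read.
enumReq :: Int -> [el] -> Iteratee el a -> (Iteratee el a, [el])
enumReq _ xs it@(Done _ _) = (it, xs)
enumReq _ [] it = (it, [])
enumReq defaultBlock xs (Cont req k) = enumReq defaultBlock rest (k (Chunk block))
  where
    (block, rest) = splitAt (max 1 (fromMaybe defaultBlock req)) xs

-- takeN n: asks for exactly as many elements as it still needs, so a
-- cooperating enumerator never hands it more than n in total.
takeN :: Int -> Iteratee el [el]
takeN = go []
  where
    go acc need
      | need <= 0 = Done acc (Chunk [])
      | otherwise = Cont (Just need) step
      where
        step EOF = Done acc EOF
        step (Chunk xs)
          | length xs >= need = Done (acc ++ take need xs) (Chunk (drop need xs))
          | otherwise = go (acc ++ xs) (need - length xs)
```

`enumReq 1024 "abcd efg abcd efg" (takeN 4)` reads only "abcd" even though the default block is 1024, leaving " efg abcd efg" unread; an iteratee that sent `Cont Nothing` would receive the full default block instead.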

> > Does that even make any sense?  I'm kind of brainstorming in this
> > email unfortunately :-)
> What's the problem with brainstorming? :)

> Cheers.
> --
> vvv