[Haskell-cafe] ANNOUNCE: iterIO-0.1 - iteratee-based IO with pipe operators

Wed May 11 08:07:12 CEST 2011

At Mon, 9 May 2011 17:55:17 +0100,
John Lato wrote:
> 
>     Felipe Almeida Lessa wrote:
>    
>     > So, in the enumerator vs. iterIO challenge, the only big differences I
>     see are:
>     >
>     >  a) iterIO has a different exception handling mechanism.
>     >  b) iterIO can have pure iteratees that don't touch the monad.
>     >  c) iterIO's iteratees can send control messages to ther enumerators.
>     >  d) iterIO's enumerators are enumeratees, but enumerator's enumerators
>     > are simpler.
>     >  e) enumerator has fewer dependencies.
>     >  f) enumerator uses conventional nomenclature.
>     >  g) enumerator is Haskell 98, while iterIO needs many extensions (e.g.
>     > MPTC and functional dependencies).
>     >
>    
> 'a' is important, but I think a lot of people underestimate the
> value of 'c', which is why a control system was implemented in
> 'iteratee'. ...  it's relatively simple for an enumerator to return
> data to an iteratee using an IORef for example.

Would you just embed IORefs for the result into an Exception type?
That's actually a pretty simple solution when you can do it.  It's a
bit harder for my setting, because I'm using this stuff in support of
a research project that doesn't make the IO Monad available to most
code.  I'd like to write Inums/Enumeratees that work with both the IO
Monad and my own weird monads.  This is admittedly a fringe problem,
so IORef is probably fine for most settings.  But if there's any
possible way you could do it with STRefs, that would be really cool...

After further thought, though, I'm still not 100% satisfied with
iterIO's control mechanism.  Someone earlier in this thread pointed
out that my SSL module doesn't support STARTTLS particularly
conveniently.  I read that and decided to go add a function to make
STARTTLS really convenient.  What I came up with ended up using MVars
to communicate the switch from the enumerator to the iteratee and was
ugly enough that I did not commit it.

What you really want is the ability to send both upstream and
downstream control messages.  Right now, I'd say iterIO has better
support for upstream control messages, while iteratee has better
support for downstream messages, since iteratee can just embed an
Exception in a Stream.  (I'm assuming you could have something like a
'Flush' exception to cause output to be flushed by an Iteratee that
was for some reason buffering some.)

I'm curious how this works in practice, though.  What is the
convention for Enumeratees receiving Exceptions they don't know about
in the Stream?  Are they supposed to throw the exceptions upwards
(which wouldn't help), or propagate them downwards.  And how do they
synchronize exceptions?  Suppose you have a pipeline with an
Enumeratee transcoding utf8 bytes to Chars, and another implementing
text compression or something that requires buffering:

   ByteString  +--------------+   [Char]    +----------+  [Char]
   ----------> | UTF8-DECODER | ----------> |  BUFFER  | -------->
               +--------------+             +----------+

Now say a Stream with EOF (Just Flush) arrives at the UTF8-DECODER in
the middle of a multi-byte character.  Do you defer the Flush until
the character is complete, or let it skip ahead to the end of the
previous character and immediately send it to the next state?  Or,
worse, propagate it back up as an uncaught exception?

>  And adding support to keep track of the stream position would be a
> pretty simple (and possibly desirable) change.

Can you explain how iteratee could keep track of the stream position?
I'm not saying it's impossible, just that it's a challenging puzzle to
make the types come out and I'd love to see the solution.  Somehow you
would need to pass the onCont continuation to itself to preserve it,
and then type a gets in the way because it's possibly no longer the
right type.  In other words, you could try something like:

{-# LANGUAGE Rank2Types #-}

data Iteratee s m a =
    Iteratee (forall r. (a -> Stream s -> m r) -> OnCont s m a r -> m r)

data OnCont s m a r =
    OnCont (OnCont s m a r -> (Stream s -> Iteratee s m a)
                           -> Maybe SomeException -> m r)

But now I have no way of using or unpacking the OnCont in an Iteratee
that doesn't return type a, and in general a control handler has no
idea what type the iteratee that threw the exception has--it's in fact
likely a different type from whatever enclosing function is wrapped by
a catch call.  Even if you do solve the type problem, another problem
is that you don't know how many times you need to call the
continuation function before you stop getting buffered data and start
actually causing IO to happen.

Part of the reason iterIO doesn't have this problem is that iterIO's
Chunk structure (which is vaguely equivalent to iteratee's Stream) is
a Monoid, so it's really easy to save up multiple chunks of residual
and "ungotten" data.  Every Iter is passed all buffered data of its
input type in its entirety (and the inner pipeline stages can actually
un-transcode data to make this true across data types).  But that's
also what makes downstream control messages are harder, because
there's no way to represent exceptions at particular points in the
input stream, just an EOF marker at the very end.

> I like the MonadTrans implementation a lot...

Thanks,
David