[Haskell-cafe] ANNOUNCE: iterIO-0.1 - iteratee-based IO with pipe operators

Fri May 6 17:56:40 CEST 2011

At Fri, 6 May 2011 10:15:50 +0200,
Gregory Collins wrote:
> 
> Hi David,
> 
> Re: this comment from catchI:
> 
> > It is not possible to catch asynchronous exceptions, such as
> > lazily evaluated divide-by-zero errors, the throw function, or
> > exceptions raised by other threads using throwTo if those
> > exceptions might arrive anywhere outside of a liftIO call.
> 
> It might be worth investigating providing a version which can catch
> asynchronous exceptions if the underlying monad supports it (via
> MonadCatchIO or something similar). One of the most interesting
> advantages I can see for IterIO over the other iteratee
> implementations is that you actually have some control over resource
> usage -- not being able to catch asynchronous exceptions nullifies
> much of that advantage. A clear use case for this is timeouts on
> server threads, where you typically throw a TimeoutException exception
> to the handling thread using "throwTo" if the timeout is exceeded.

Excellent point.  There's actually a chance that iterIO already
catches those kinds of exceptions, but I wasn't sure enough about how
the Haskell runtime works to make that claim.  I've noticed in
practice that asynchronous exceptions tend to come exactly when I
execute the IO >>= operation.  If that's true, then since each IO >>=
is wrapped in a try block, the exceptions will all be caught (well,
not divide by zero, but things like throwTo, which I think are more
important).

One way I was thinking of implementing this was wrapping the whole
execution in block, and then calling unblock (unless iterIO's own
hypothetical block function is called) for every invocation of liftIO.
Unfortunately, the block and unblock functions now seem to be
deprecated, and the replacement mask/unmask ones would not be as
amenable to this technique.

However, if there's some simpler way to guarantee that >>= is the
point where exceptions are thrown (and might be the case for GHC in
practice), then I basically only need to update the docs.  If someone
with more GHC understanding could explain how asynchronous exceptions
work, I'd love to hear it...

> Another question re: resource cleanup: in the docs I see:
> 
> > Now suppose inumHttpBody fails (most likely because it receives an
> > EOF before reading the number of bytes specified in the
> > Content-Length header). Because inumHttpBody is fused to handler,
> > the failure will cause handler to receive an EOF, which will cause
> > foldForm to fail, which will cause handleI to receive an EOF and
> > return, which will ensure hClose runs and the file handle h is not
> > leaked.
> 
> > Once the EOFs have been processed, the exception will propagate
> > upwards making inumHttpServer fail, which in turn will send an EOF
> > to iter. Then the exception will cause enum to fail, after which
> > sock will be closed. In summary, despite the complex structure of
> > the web server, because all the components are fused together with
> > pipe operators, corner cases like this just work with no need to
> > worry about leaked file descriptors.
> 
> Could you go into a little bit of detail about the mechanism behind this?

Yes, absolutely.  This relies on the fact that an Inum must always
return its target Iter, even when the Inum fails.  This invariant is
ensured by the two Inum construction functions, mkInumC and mkInumM,
which catch exceptions thrown by the "codec" iteratee and add in the
state of the target iteratee.

Now when you execute code like "inum .| iter", the immediate result of
running inum is "IterR tIn m (IterR tOut m a)"--i.e., the result of an
iteratee returning the result an iteratee (because Inums are
iteratees, too).  If the Inum failed, then the outer IterR will use
the Fail constructor:

        Fail !IterFail !(Maybe a) !(Maybe (Chunk t))

Where the "Maybe a" will be a "Maybe (IterR tOut m b)", and, because
of the Inum invariant, will be Just an actual result.  .| then must
translate the inner iteratee result to the appropriate return type for
the Inum (since the Inum's type (IterR tIn m ...) is different from
the Iter's (Iter tOut m ...)).  This happens through the internal
function joinR, which says:

joinR (Fail e (Just i) c) = flip onDoneR (runR i) $ \r ->
                            case r of
                              Done a _    -> Fail e (Just a) c
                              Fail e' a _ -> Fail e' a c
                              _ -> error "joinR"

Where the 'runR' function basically keeps feeding EOF to an Iter (and
executing it's monadic actions and rejecting its control requests)
until it returns a result, at which point the result's residual input
can be discarded and replaced with the residual input of the Inum.

David