[Haskell-cafe] ANNOUNCE: iterIO-0.1 - iteratee-based IO with pipe operators

Thu May 12 10:57:13 CEST 2011

On 11/05/2011 23:57, dm-list-haskell-cafe at scs.stanford.edu wrote:
> At Wed, 11 May 2011 13:02:21 +0100,
> Simon Marlow wrote:
>>
>>> However, if there's some simpler way to guarantee that >>= is the
>>> point where exceptions are thrown (and might be the case for GHC in
>>> practice), then I basically only need to update the docs.  If someone
>>> with more GHC understanding could explain how asynchronous exceptions
>>> work, I'd love to hear it...
>>
>> There's no guarantee of the form that you mention - asynchronous
>> exceptions can occur anywhere.  However, there might be a way to do what
>> you want (disclaimer: I haven't looked at the implementation of iterIO).
>>
>> Control.Exception will have a new operation in 7.2.1:
>>
>>     allowInterrupt :: IO ()
>>     allowInterrupt = unsafeUnmask $ return ()
>>
>> which allows an asynchronous exception to be thrown inside mask (until
>> 7.2.1 you can define it yourself, unsafeUnmask comes from GHC.IO).
>
> So to answer my own question from earlier, I did a bit of
> benchmarking, and it seems that on my machine (a 2.4 GHz Intel Xeon
> 3060, running Linux 2.6.38), I get the following costs:
>
>       9 ns - return () :: IO ()       -- baseline (meaningless in itself)
>      13 ns - unsafeUnmask $ return () -- with interrupts enabled
>      18 ns - unsafeUnmask $ return () -- inside a mask_
>
>      13 ns - ffi                      -- a null FFI call (getpid cached by libc)
>      18 ns - unsafeUnmask ffi         -- with interrupts enabled
>      22 ns - unsafeUnmask ffi         -- inside a mask_

Those are lower than I was expecting, but look plausible.  There's room 
for improvement too (by inlining some or all of unsafeUnmask#).

However, the general case of unsafeUnmask E, where E is something more 
complex than return (), will be more expensive because a new closure for 
E has to be created.  For example, try "return x" instead of "return ()", and
try to make sure that the closure has to be created once per 
unsafeUnmask, not lifted out and shared.
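
As a rough sketch of the kind of benchmark body meant here (the names
unmaskReturn and sumLoop are made up for illustration, and whether GHC
really allocates a closure per call depends on optimization flags, so
check the generated Core before trusting any numbers):

```haskell
import Control.Exception (mask_)
import Control.Monad (foldM)
import GHC.IO (unsafeUnmask)

-- Hypothetical helper: the argument to unsafeUnmask mentions x, so it
-- cannot be floated out and shared the way (return ()) can.
-- NOINLINE is an attempt to keep GHC from specializing it away.
unmaskReturn :: Int -> IO Int
unmaskReturn x = unsafeUnmask (return x)
{-# NOINLINE unmaskReturn #-}

-- Call unmaskReturn n times inside mask_, consuming each result so
-- that every iteration depends on a different value.
sumLoop :: Int -> IO Int
sumLoop n = mask_ (foldM step 0 [1 .. n])
  where
    step acc i = do
      v <- unmaskReturn i
      return $! acc + v
```

Timing sumLoop against the same loop without unsafeUnmask (with
criterion, say) should give a per-call figure that includes the
closure, assuming the optimizer has not removed it.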

>     131 ns - syscall                  -- getppid through FFI
>     135 ns - unsafeUnmask syscall     -- with interrupts enabled
>     140 ns - unsafeUnmask syscall     -- inside a mask_

> So it seems that the cost of calling unsafeUnmask inside every liftIO
> would be about 22 cycles per liftIO invocation, which seems eminently
> reasonable.  You could then safely run your whole program inside a big
> mask_ and not worry about exceptions happening between>>=
> invocations.  Though truly compute-intensive workloads could have
> issues, the kind of applications targeted by iterIO will spend most of
> their time doing I/O, so this shouldn't be an issue.
>
> Better yet, for programs that don't use asynchronous exceptions, if
> you don't put your whole program inside a mask_, the cost drops
> roughly in half.  It's hard to imagine any real application whose
> performance would take a significant hit because of an extra 11 cycles
> per liftIO.
>
> Is there anything I'm missing?  For instance, my machine only has one
> CPU, and the tests all ran with one thread.  Does
> unmaskAsyncExceptions# acquire a spinlock that could lock the memory
> bus?  Or is there some other reason unsafeUnmask could become
> expensive on NUMA machines, or in the presence of concurrency?

There are no locks here, thanks to the message-passing implementation we 
use for throwTo between processors.  unmaskAsyncExceptions# basically 
pushes a small stack frame, twiddles a couple of bits in the thread 
state, and checks a word in the thread state to see whether any 
exceptions are pending.  The stack frame untwiddles the bits again and 
returns.
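
To make the "checks whether any exceptions are pending" part concrete,
here is a minimal sketch, not taken from iterIO.  allowInterrupt' is
the definition quoted above under a primed name; interruptibleIO is a
hypothetical stand-in for what a liftIO wrapper might do; demo is an
illustrative check that a thread running under mask_ still receives a
killThread at the polling point:

```haskell
import Control.Concurrent (forkIO, killThread, yield)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (AsyncException (ThreadKilled), mask_, try)
import Control.Monad (forever)
import GHC.IO (unsafeUnmask)

-- Same definition as the allowInterrupt quoted earlier in the thread.
allowInterrupt' :: IO ()
allowInterrupt' = unsafeUnmask (return ())

-- Hypothetical wrapper: open a brief window for pending asynchronous
-- exceptions, then run the action in the caller's masking state.
interruptibleIO :: IO a -> IO a
interruptibleIO io = allowInterrupt' >> io

-- The worker runs entirely inside mask_ and never blocks, so the only
-- place the exception is expected to arrive is allowInterrupt'.
demo :: IO Bool
demo = do
  ready <- newEmptyMVar
  done <- newEmptyMVar
  tid <- forkIO $ mask_ $ do
    r <- try (putMVar ready () >> forever (interruptibleIO yield))
    putMVar done (r :: Either AsyncException ())
  takeMVar ready
  killThread tid -- returns once the exception has been delivered
  r <- takeMVar done
  return $ case r of
    Left ThreadKilled -> True
    _ -> False
```

Running the action masked after the polling point, rather than passing
it to unsafeUnmask directly, is a choice made for this sketch; either
form matches the proposal being discussed.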

Cheers,
	Simon