[Haskell-cafe] Re: Optimizing locking with MVars

Wed May 3 07:07:19 EDT 2006

This is interesting, thanks.

I propose to add INLINE pragmas to withMVar and friends.

Having an interface for simple locks sounds like a good idea to me. 
Would you like to send a patch?

This won't affect Handle I/O unfortunately, because we need block to 
protect against asynchronous exceptions.  I'm still not certain you 
won't need that in the stream library, too: check any stateful code (eg. 
buffering) and imagine what happens if an exception is raised at an 
arbitrary point.

Cheers,
	Simon

Bulat Ziganshin wrote:

> Main reason of slowness of existing Handle-based I/O in GHC is locking
> around each operation. it is especially bad for simple char-at-a-time
> I/O where 99% of time spent on locking and unlocking.
> 
> To be exact, on my CPU, hPutChar for 100mb file requires 150 seconds,
> while hGetChar for the same file is "only" 100 seconds long. it seems
> that former use 3 locking operations and later 2 ones, because my own
> vGetChar/vPutChar implementations both requires 52 seconds, of those
> only about one second is real work and rest is just `withMVar`
> expenses.
> 
> Until now, i thought that this 0.5 ms (about 1000 primitive CPU
> operations) on each withMVar is pure time required to perform
> takeMVar+putMVar operations. But yesterday i investigated this problem
> deeper and results was amazing!
> 
> First, i just made local copy of `withMVar` and added INLINE to it:
> 
> import Control.Exception as Exception
> {-# INLINE inlinedWithMVar #-}
> inlinedWithMVar :: MVar a -> (a -> IO b) -> IO b
> inlinedWithMVar m io =
>   block $ do
>     a <- takeMVar m
>     b <- Exception.catch (unblock (io a))
>             (\e -> do putMVar m a; throw e)
>     putMVar m a
>     return b
> 
> 
> Second, i've developed my own simplified version of this procedure.
> Here i should say that my library uses "MVar ()" field to hold lock
> and separate immutable data field with actual data locked:
> 
> data WithLocking h = WithLocking h !(MVar ())
> 
> This allowed me to omit block/unblock operation and develop the
> following faster analog of withMVar:
> 
> lock (WithLocking h mvar) action = do
>       Exception.catch (do takeMVar mvar
>                            result <- action h                   
>                            putMVar mvar ()                      
>                            return res                           
>                       )                                         
>                       (\e -> do tryPutMVar mvar (); throw e)    
> 
> And as third variant i tried exception-unsafe variant of `withMVar`:
> 
> unsafeWithMVar :: MVar a -> (a -> IO b) -> IO b
> unsafeWithMVar m io = do
>     a <- takeMVar m
>     b <- io a
>     putMVar m a
>     return b
> 
> 
> And now are results:
> 
> withMVar        52 seconds
> inlinedWithMVar 38 seconds
> lock            20 seconds
> unsafeWithMVar  10 seconds
> 
> 
> So,
> 
> 1) `withMVar` can be made significantly faster just by attaching
> INLINE pragma to it. until GHC includes this patch, you can just make
> local copy of this procedure (it's implementation is
> compiler-independent) and use INLINE pragma for this local copy
> 
> 2) if MVar is used only to protect some immutable data from
> simultaneous access, it's use can be made significantly faster by
> using above-mentioned WithLocking type constructor together with
> 'lock' function. I hope that this mechanism will go into future
> Haskell implementations and in particular it will be used in my own
> Streams library and in new DiffArray implementation (that is a part
> of ArrayRef library)
> 
> 3) For simple programs that don't catch exceptions anyway, this
> excessive protection is just meaningless. they can use
> 'unsafeWithMVar' to work as fast as possible. i mean in particular
> shootout-like benchmarks. it is also possible to develop fast & safe
> routines by using explicit unlocking (with 'tryPutMVar') in
> higher-level exception handlers
> 
> 
> and a more general conclusion. this case is a good demonstration of
> significant performance loss due to using of higher-order functions. i
> think that more aggressive inlining of high-order and polymorphic
> functions should significantly speed up GHC-compiled programs.
> 
>