[Haskell-cafe] STM unperformance

Thu Feb 18 15:00:59 UTC 2016

On 02/18/2016 03:25 PM, Wojtek Narczyński wrote:
> Dear list,
> 
> I create four haskell threads, each performs disjoint STM transactions.
> The more system threads I add, the slower the program runs. With four
> system threads the performance is 14% of the nonparallel run.
> 
> What is going on?
> 
> $ ghc -v
> Glasgow Haskell Compiler, Version 7.10.3, stage 2 booted by GHC version
> 7.8.4
> 
> $ ghc -O2 -threaded test-stm.hs
> 
> $ time ./test-stm +RTS -N1
> real    1.843s
> 
> $ time ./test-stm +RTS -N2
> real    7.469s
> 
> $ time ./test-stm +RTS -N3
> real    9.640s
> 
> $ time ./test-stm +RTS -N4
> real    13.144s
> 
> $ cat test-stm.hs
> {-# LANGUAGE ScopedTypeVariables #-}
> 
> import Control.Monad
> import Control.Concurrent
> import Control.Concurrent.STM
> import Control.Concurrent.STM.Stats
> import System.IO.Unsafe
> 
> -- Copied from GHC docs
> 
> children :: MVar [MVar ()]
> children = unsafePerformIO (newMVar [])
> 
> waitForChildren :: IO ()
> waitForChildren = do
>    cs <- takeMVar children
>    case cs of
>       []   -> return ()
>       m:ms -> do
>          putMVar children ms
>          takeMVar m
>          waitForChildren
> 
> forkChild :: IO () -> IO ThreadId
> forkChild io = do
>     mvar <- newEmptyMVar
>     childs <- takeMVar children
>     putMVar children (mvar:childs)
>     forkFinally io (\_ -> putMVar mvar ())
> 
> -- Test case
> 
> main = do
>    forkChild $ newTVarIO 0 >>= incrManyTimes "thread1"
>    forkChild $ newTVarIO 0 >>= incrManyTimes "thread2"
>    forkChild $ newTVarIO 0 >>= incrManyTimes "thread3"
>    forkChild $ newTVarIO 0 >>= incrManyTimes "thread4"
>    waitForChildren
>    dumpSTMStats -- Confirms no conflicts
> 
> incrManyTimes :: String -> TVar Int -> IO ()
> incrManyTimes l = incrRec (1000000 :: Int) where
>    incrRec n v | n == 0     = pure ()
>                | otherwise = trackNamedSTM l (modifyTVar v (+1)) >>
> incrRec (n-1) v
> 

While your code does not contain any conflicts, the stm-stats library
seems to contain some. Replacing trackNamedSTM with atomically speeds
things up quite a bit, so most of the time is lost in the library. A
short look at the code of the library shows that it uses
atomicModifyIORef on some global IORef holding a map of the statistics.
And your stm transactions are really short, thus that IORef is under
high pressure. Now atomicModifyIORef works by reading the old value,
computing the new one and then doing an atomic swap, if the IORef still
contains the old value. So if the value changed, the new value has to be
read, computed, and so on... Now you have four threads doing tiny
transactions and then updating this IORef, interfering with each other.

If I make the transaction longer by doing 1000 increments in one
transaction and then doing only 1000 transactions, the code scales like
one would expect.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: OpenPGP digital signature
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20160218/9518bc35/attachment.sig>