[Haskell-cafe] Odd parallel haskell observations
Alexander Kotelnikov
sacha at myxomop.com
Sat Aug 7 16:47:45 EDT 2010
Hello.
I am exploring haskell features for parallel and cocurrent programming
and see something difficult to explain.
In brief - asking RTS to use more threads results in awfull drop of
performance. And according to 'top' test programm consumes up to N CPUs
power.
Am I doing something wrong? I attached the code, but I am just issuing
thousands of HTTP GET requests in 1-4 forkIO threads. And since it looks
like local apache is faster than haskell program (which is a pity) I
expected that using more OS threads should improve performance.
Just in case:
ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.12.1
-------------- next part --------------
import Data.List
import System.IO
import qualified System.IO.UTF8
import System.Environment (getArgs)
import Network.HTTP
import Network.URI
import System.Time
import System.IO.Unsafe
import Control.Monad
import Control.Exception
import Control.Concurrent
import Control.Concurrent.MVar
secDiff :: ClockTime -> ClockTime -> Float
secDiff (TOD secs1 psecs1) (TOD secs2 psecs2) =
fromInteger (psecs2 - psecs1) / 1e12 + fromInteger (secs2 - secs1)
-- single get
get :: Int -> IO(String)
get id = do
res <- simpleHTTP $ getRequest "http://127.0.0.1"
case res of
Left err -> return(show err)
Right rsp -> return(show $ rspCode rsp)
-- perform GET per each list element using c threads
doList :: [Int] -> Int -> IO()
doList ids 0 =
return()
doList [] c =
return()
doList ids c = do
forkChild $ forM_ todo get
doList later (c-1)
where (todo, later) = splitAt (length ids `div` c) ids
{-
Copied from
http://haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Control-Concurrent.html#11
Terminating the program
-}
children :: MVar [MVar ()]
children = unsafePerformIO (newMVar [])
waitForChildren :: IO ()
waitForChildren = do
cs <- takeMVar children
case cs of
[] -> return ()
m:ms -> do
putMVar children ms
takeMVar m
waitForChildren
forkChild :: IO () -> IO ThreadId
forkChild io = do
mvar <- newEmptyMVar
childs <- takeMVar children
putMVar children (mvar:childs)
forkIO (io `finally` putMVar mvar ())
-- end of copied code
main = do
[c', n'] <- getArgs
let
c = read c' :: Int
n = read n' :: Int
start <- getClockTime
doList [1..n] c
waitForChildren
end <- getClockTime
putStrLn $ show(c) ++ " " ++ show(secDiff start end) ++ "s"
-------------- next part --------------
20:31 sacha at loft4633:/tmp 21> ghc --make -threaded get.hs
[1 of 1] Compiling Main ( get.hs, get.o )
Linking get ...
20:31 sacha at loft4633:/tmp 22> ./get 1 10000
1 3.242352s
20:31 sacha at loft4633:/tmp 23> ./get 2 10000
2 3.08306s
20:31 sacha at loft4633:/tmp 24> ./get 2 10000 +RTS -N2
2 6.898871s
20:32 sacha at loft4633:/tmp 25> ./get 3 10000
3 2.950677s
20:32 sacha at loft4633:/tmp 26> ./get 3 10000 +RTS -N2
3 7.381678s
20:32 sacha at loft4633:/tmp 27> ./get 3 10000 +RTS -N3
3 14.683548s
20:32 sacha at loft4633:/tmp 28> ./get 4 10000
4 3.332165s
20:33 sacha at loft4633:/tmp 29> ./get 4 10000 +RTS -N4 -s
./get 4 10000 +RTS -N4 -s
4 57.17923s
2,147,969,912 bytes allocated in the heap
49,059,288 bytes copied during GC
736,656 bytes maximum residency (98 sample(s))
486,744 bytes maximum slop
5 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 949 collections, 948 parallel, 76.73s, 25.67s elapsed
Generation 1: 98 collections, 98 parallel, 7.70s, 2.56s elapsed
Parallel GC work balance: 2.17 (6115428 / 2822692, ideal 4)
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 1.43s ( 27.76s) 6.31s ( 2.12s)
Task 1 (worker) : 0.00s ( 28.13s) 10.62s ( 3.56s)
Task 2 (worker) : 0.37s ( 28.63s) 11.06s ( 3.69s)
Task 3 (worker) : 0.00s ( 28.95s) 6.29s ( 2.10s)
Task 4 (worker) : 20.73s ( 28.95s) 9.68s ( 3.24s)
Task 5 (worker) : 0.00s ( 28.95s) 0.60s ( 0.20s)
Task 6 (worker) : 21.81s ( 28.95s) 11.91s ( 3.97s)
Task 7 (worker) : 18.59s ( 28.95s) 13.04s ( 4.36s)
Task 8 (worker) : 17.24s ( 28.96s) 14.92s ( 4.99s)
SPARKS: 0 (0 converted, 0 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 79.23s ( 28.95s elapsed)
GC time 84.43s ( 28.23s elapsed)
EXIT time 0.00s ( 0.01s elapsed)
Total time 162.49s ( 57.19s elapsed)
%GC time 52.0% (49.4% elapsed)
Alloc rate 27,513,782 bytes per MUT second
Productivity 48.0% of total user, 136.5% of total elapsed
gc_alloc_block_sync: 15006
whitehole_spin: 0
gen[0].steps[0].sync_large_objects: 7617
gen[0].steps[1].sync_large_objects: 35
gen[1].steps[0].sync_large_objects: 1400
More information about the Haskell-Cafe
mailing list