[Haskell-cafe] multithreading speedup

Fri Apr 13 19:31:58 EDT 2007

On Apr 14, 2007, at 12:33 AM, Stefan O'Rear wrote:

> On Sat, Apr 14, 2007 at 12:27:10AM +0200, Fawzi Mohamed wrote:
>> I was trying to speed up a program that I wrote and so I thought
>> about using multiple threads.
>> I have a fairly simple parallel program, and I did the following:
>>
>> do
>>       subRes <- MVar.newMVar []
>>       putStrLn "starting threads"
>>       subV <- flip mapM [0 .. (nThreads - 1)] $
>>               ( \i -> do
>>                   subR <- MVar.newEmptyMVar
>>                   let writeRes r = do { MVar.putMVar subR r }
>>                   forkOS (do
>>                            let r=eval (startData !! i)
>>                            writeRes $! r
>>                            putStr $ "writtenRes")
>>                   return subR
>>               )
>>       putStrLn "started threads"
>>       toFold <- mapM MVar.takeMVar subV
>>       putStrLn "about to fold"
>>       return $ foldl1 mergeRes toFold
>>
>> I know that the threads really calculate what I want, and as soon as
>> they are finished I get the result.
>> The problem is that I get no speedup: the time is basically the sum
>> of the times for the two threads.
>> I thought that GHC would now take advantage of the two CPUs if I
>> compiled with -threaded.
>> Am I wrong? Do I need some special flag, a newer version of the
>> compiler (I have 6.6.20070129), or is this just normal?
>
> ./MyProgram +RTS -N2
>
> where the number after -N is your CPU count.

Thanks, that was it.
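For the archive, here is a minimal self-contained sketch of the same fork-per-element pattern. The function `work` is a hypothetical stand-in for `eval` and `startData`. It uses `forkIO` rather than `forkOS`: `forkOS` creates a bound OS thread, which is only needed when foreign code relies on thread-local state, while ordinary `forkIO` threads are also spread over the capabilities given with `-N`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Hypothetical stand-in for the expensive 'eval' in the quoted code.
work :: Int -> Integer
work n = sum [fromIntegral k * fromIntegral k | k <- [1 .. n]]

-- Fork one lightweight thread per input. Each thread forces its result
-- (to weak head normal form, which for an Integer is the whole value)
-- before storing it, so the computation runs on the forked thread.
forkAll :: [Int] -> IO [MVar Integer]
forkAll = mapM spawn
  where
    spawn x = do
      v <- newEmptyMVar
      _ <- forkIO (evaluate (work x) >>= putMVar v)
      return v

-- Block until every thread has delivered its result, in input order.
parallelResults :: [Int] -> IO [Integer]
parallelResults inputs = forkAll inputs >>= mapM takeMVar

main :: IO ()
main = do
  rs <- parallelResults [3000000, 3000000]
  print (foldl1 (+) rs)
```

Compile with `ghc -threaded` and run with `./MyProgram +RTS -N2`. One caveat: `evaluate`, like `$!`, only forces a result to weak head normal form. If `eval` returns a lazy structure such as a list, most of the work may still happen in the main thread during the fold, and there is no speedup.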

> (that said, DO NOT USE THREADS IF AT ALL POSSIBLE; they are ugly and
> cause heisenbugs. If you want parallelism, `par` from Control.Parallel
> is to be preferred if at all possible, since it is deterministic)

In theory, yes, but I am quite used to programming with threads and even
MPI.
I have looked at Control.Parallel (and the nice article on it), but
(at least as far as I saw) there is no easy way to tell it to leave the
whole calculation to a sub-thread.
Actually, my code should be equivalent to parMap rwhnf, but I get the
following results:

parMap
3.63user 0.02system 0:01.97elapsed 185%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1039minor)pagefaults 0swaps

threads
3.14user 0.02system 0:01.68elapsed 187%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1041minor)pagefaults 0swaps

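So the threaded version takes about 15% less wall-clock time (1.68 s vs
1.97 s elapsed) and less user time (3.14 s vs 3.63 s).
I suppose that this is because in one case I have a thread for each
element in the list plus a main thread, versus just one thread per element
in the other, but I am not sure; if someone has some ideas...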
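For comparison, here is a sketch of what `parMap rwhnf` does, written directly with `par` and `pseq`. These are the same two functions that `Control.Parallel` offers, imported here from `GHC.Conc` in `base` so the example needs no extra package; in recent versions of the parallel package the `rwhnf` strategy is called `rseq`. `work` is again a hypothetical placeholder. Unlike the threaded version, each element becomes a spark rather than a thread: the runtime evaluates sparks on idle capabilities, and a spark that has not been picked up yet is simply evaluated by the main thread when its value is demanded.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical per-element computation.
work :: Int -> Integer
work n = sum [fromIntegral k * fromIntegral k | k <- [1 .. n]]

-- A hand-written parallel map in the spirit of 'parMap rwhnf':
-- spark the evaluation of each element to weak head normal form,
-- force the spine of the rest of the list (which sparks the remaining
-- elements), and return the list.
parMapWhnf :: (a -> b) -> [a] -> [b]
parMapWhnf _ [] = []
parMapWhnf f (x : xs) =
  let y  = f x
      ys = parMapWhnf f xs
  in y `par` (ys `pseq` (y : ys))

main :: IO ()
main = print (foldl1 (+) (parMapWhnf work [3000000, 3000000]))
```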

With threads (now that I have managed to get some speedup) I can use a
worker/queue approach and get better load balancing.
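The worker/queue approach could look roughly like the following sketch. A fixed number of `forkIO` workers pull jobs from one shared `Chan`, so a worker that finishes early simply takes the next job. `work` and `workerPool` are hypothetical names, and the pool assumes `nWorkers` is at least 1 (with zero workers, it would wait forever).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Exception (evaluate)
import Control.Monad (forM_, replicateM, replicateM_)
import Data.List (sortOn)

-- Hypothetical job: stands in for the per-item computation.
work :: Int -> Integer
work n = sum [fromIntegral k * fromIntegral k | k <- [1 .. n]]

-- Run 'work' on every job with a fixed number of worker threads that pull
-- from one shared queue. Results are tagged with the job index and
-- returned in input order.
workerPool :: Int -> [Int] -> IO [Integer]
workerPool nWorkers jobs = do
  jobQueue <- newChan
  results  <- newChan
  forM_ (zip [0 :: Int ..] jobs) (writeChan jobQueue . Just)
  -- One Nothing per worker tells that worker the queue is exhausted.
  replicateM_ nWorkers (writeChan jobQueue Nothing)
  let worker = do
        job <- readChan jobQueue
        case job of
          Nothing     -> return ()
          Just (i, x) -> do
            r <- evaluate (work x)
            writeChan results (i, r)
            worker
  replicateM_ nWorkers (forkIO worker)
  tagged <- replicateM (length jobs) (readChan results)
  return (map snd (sortOn fst tagged))

main :: IO ()
main = workerPool 2 [3000000, 1000000, 2000000, 500000] >>= print
```

Because jobs are handed out one at a time, uneven job sizes balance themselves across workers, which a fixed split of the input list does not do.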

I had looked at strategies, but I first need a breadth-first traversal
of a graph generated on the fly (to get parallelism) and then a
depth-first traversal (to avoid space leaks), and I found no easy way to
do that with strategies, so I did it by hand.
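A sketch of such a two-phase traversal, written by hand with `par`/`pseq` rather than with strategies. The graph (`children`) is a hypothetical tree generated on the fly; a real graph with shared nodes would also need a visited set. The breadth-first phase expands whole levels until there are at least `k` independent subproblems, and then each one is counted depth-first in its own spark. All names here are illustrative.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical graph generated on the fly: node n has children n-1 and n-2.
-- It is a tree (no sharing), so no visited set is needed.
children :: Int -> [Int]
children n
  | n <= 1    = []
  | otherwise = [n - 1, n - 2]

-- Depth-first: count the nodes in the subtree rooted at n. Memory stays
-- proportional to the depth of the tree, not the width of a level.
dfsCount :: Int -> Integer
dfsCount n = 1 + sum (map dfsCount (children n))

-- Breadth-first: expand whole levels until the frontier has at least k
-- nodes (or is empty). Returns how many nodes were already expanded and
-- the frontier that is left to process.
bfsFrontier :: Int -> [Int] -> (Integer, [Int])
bfsFrontier k = go 0
  where
    go done level
      | null level || length level >= k = (done, level)
      | otherwise =
          go (done + fromIntegral (length level)) (concatMap children level)

-- Spark every element, then sum; idle capabilities pick up the sparks.
parSum :: [Integer] -> Integer
parSum xs = sparkAll xs `pseq` sum xs
  where
    sparkAll []       = ()
    sparkAll (y : ys) = y `par` sparkAll ys

-- Breadth-first to create about k subproblems, then a parallel
-- depth-first count of each subtree.
countNodes :: Int -> Int -> Integer
countNodes k root =
  let (done, front) = bfsFrontier k [root]
  in done + parSum (map dfsCount front)

main :: IO ()
main = print (countNodes 8 30)
```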

By the way, is there a way to know how many processors are available
to the program (to make the strategy or thread control depend on it)?
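For reference, current GHC releases expose this in `base`: `getNumCapabilities` (from `Control.Concurrent`) returns the number of capabilities the runtime is using, that is the `-N` value, and `getNumProcessors` (from `GHC.Conc`) returns the processor count reported by the operating system. These may not exist in a compiler as old as 6.6. Newer GHCs also accept `+RTS -N` with no number, which lets the runtime choose the processor count itself. `cpuInfo` below is a hypothetical helper name.

```haskell
import Control.Concurrent (getNumCapabilities)
import GHC.Conc (getNumProcessors)

-- Returns (capabilities the RTS is running with, i.e. the +RTS -N value,
--          processors reported by the operating system).
cpuInfo :: IO (Int, Int)
cpuInfo = do
  caps  <- getNumCapabilities
  procs <- getNumProcessors
  return (caps, procs)

main :: IO ()
main = do
  (caps, procs) <- cpuInfo
  putStrLn ("RTS capabilities (-N): " ++ show caps)
  putStrLn ("processors reported by the OS: " ++ show procs)
```

A thread pool could then be sized from the first value, for example one worker per capability.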

Fawzi