[Haskell-cafe] multithreading speedup
Fawzi Mohamed
fmohamed at mac.com
Fri Apr 13 19:31:58 EDT 2007
Il giorno Apr 14, 2007, alle ore 12:33 AM, Stefan O'Rear ha scritto:
> On Sat, Apr 14, 2007 at 12:27:10AM +0200, Fawzi Mohamed wrote:
>> I was trying to speed up a program that I wrote and so I thought
>> about using multiple threads.
>> I have a quite easy parallel program and I did the following
>>
>> do
>> subRes <- MVar.newMVar []
>> putStrLn "starting threads"
>> subV <- flip mapM [0 .. (nThreads - 1)] $
>> ( \i -> do
>> subR <- MVar.newEmptyMVar
>> let writeRes r = do { MVar.putMVar subR r }
>> forkOS (do
>> let r=eval (startData !! i)
>> writeRes $! r
>> putStr $ "writtenRes")
>> return subR
>> )
>> putStrLn "started threads"
>> toFold <- mapM MVar.takeMVar subV
>> putStrLn "about to fold"
>> return $ foldl1 mergeRes toFold
>>
>> I know that the threads really calculate what I want, and as soon as
>> they are finished I get the result.
>> The problem is that I have no speed up, the time is basically the sum
>> of the time for the two threads.
>> I thought that ghc now would take advantage of the two cpus if I
>> compiled with -threaded.
>> Am I wrong, do I need some special flag, a newer version of the
>> compiler (I have 6.6.20070129), or it is just normal?
>
> ./MyProgram +RTS -N2
>
> where N is your CPU count.
thanks, that was it.
> (that said, DO NOT USE THREADS IF AT ALL POSSIBLE, they are ugly and
> cause heisenbugs, if you want paralelism `par` from Control.Parallel
> is to be preferred if at all possible since it is deterministic)
in theory yes, but I am quite used to programm with threads and even
mpi.
I have looked at Control.Parallel (and the nice article on it), but
there is no easy way tell (at least that I saw) to leave the whole
calculation to a sub thread.
Actually my code should be equivalent to parMap rwhnf, but I get the
following results:
parMap
3.63user 0.02system 0:01.97elapsed 185%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+1039minor)pagefaults 0swaps
threads
3.14user 0.02system 0:01.68elapsed 187%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+1041minor)pagefaults 0swaps
I suppose that it is because I have a thread for each element in the
list plus a main thread vs just one thread per element in the list,
but I am not sure, if someone has some ideas...
With threads (now the I managed to have some speedup) I can use a
workers/queue approach and have a better load balancing.
I had looked at strategies, but I need first a breath first traversal
of a graph generated on the fly (to be parallel) and then a depth
first traversal (to avoid space leaks), and I found no easy way to do
it with strategies, so I did it by hand.
by the way is there a way to know how many processors are available
to the program (to make the strategy or thread control depend on it)?
Fawzi
More information about the Haskell-Cafe
mailing list