[Haskell-cafe] multithreading speedup

Fri Apr 13 19:41:46 EDT 2007

On Sat, Apr 14, 2007 at 01:31:58AM +0200, Fawzi Mohamed wrote:
> 
> Il giorno Apr 14, 2007, alle ore 12:33 AM, Stefan O'Rear ha scritto:
> 
> >On Sat, Apr 14, 2007 at 12:27:10AM +0200, Fawzi Mohamed wrote:
> >>I was trying to speed up a program that I wrote and so I thought
> >>about using multiple threads.
> >>I  have a quite easy parallel program and I did the following
> >>
> >>do
> >>      subRes <- MVar.newMVar []
> >>      putStrLn "starting threads"
> >>      subV <- flip mapM [0 .. (nThreads - 1)] $
> >>              ( \i -> do
> >>                  subR <- MVar.newEmptyMVar
> >>                  let writeRes r = do { MVar.putMVar subR r }
> >>                  forkOS (do

Getting rid of that forkOS will help a LOT - forkOS is very expensive
and gives no benefit.  It exists only for the benefit of FFI libs like
OpenGL that use thread local state. 

Plain forkIO uses a thread pool in the runtime; it is even more
parallel than forkOS, and extremely cheap. 

The fact that you are using forkOS is a dead giveaway of a
documentation bug somewhere.  Please report it, so nobody else makes
this mistake! 

> >>                           let r=eval (startData !! i)
> >>                           writeRes $! r
> >>                           putStr $ "writtenRes")
> >>                  return subR
> >>              )
> >>      putStrLn "started threads"
> >>      toFold <- mapM MVar.takeMVar subV
> >>      putStrLn "about to fold"
> >>      return $ foldl1 mergeRes toFold
> >>
> >>I know that the threads really calculate what I want, and as soon as
> >>they are finished I get the result.
> >>The problem is that I have no speed up, the time is basically the sum
> >>of the time for the two threads.
> >>I thought that ghc now would take advantage of the two cpus if I
> >>compiled with -threaded.
> >>Am I wrong, do I need some special flag, a newer version of the
> >>compiler (I have 6.6.20070129), or it is just normal?
> >
> >./MyProgram +RTS -N2
> >
> >where N is your CPU count.
> 
> thanks, that was it.
> 
> >(that said, DO NOT USE THREADS IF AT ALL POSSIBLE, they are ugly and
> >cause heisenbugs, if you want paralelism `par` from Control.Parallel
> >is to be preferred if at all possible since it is deterministic)
> 
> in theory yes, but I am quite used to programm with threads and even  
> mpi.
> I have looked at Control.Parallel (and the nice article on it), but  
> there is no easy way tell (at least that I saw) to leave the whole  
> calculation to a sub thread.
> Actually my code should be equivalent to parMap rwhnf, but I get the  
> following results:
> 
> parMap
> 3.63user 0.02system 0:01.97elapsed 185%CPU (0avgtext+0avgdata  
> 0maxresident)k
> 0inputs+0outputs (0major+1039minor)pagefaults 0swaps
> 
> threads
> 3.14user 0.02system 0:01.68elapsed 187%CPU (0avgtext+0avgdata  
> 0maxresident)k
> 0inputs+0outputs (0major+1041minor)pagefaults 0swaps

Those look equally parallel to me :)

> I suppose that it is because I have a thread for each element in the  
> list plus a main thread vs just one thread per element in the list,  
> but I am not sure, if someone has some ideas...
> 
> With threads (now the I managed to have some speedup) I can use a  
> workers/queue approach and have a better load balancing.

Threads are supposed to be cheap.  If implementing a thread pool by
hand ever gives a speedup, it's a bug. 

> I had looked at strategies, but I need first a breath first traversal  
> of a graph generated on the fly (to be parallel) and then a depth  
> first traversal (to avoid space leaks), and I found no easy way to do  
> it with strategies, so I did it by hand.
> 
> by the way is there a way to know how many processors are available  
> to the program (to make the strategy or thread control depend on it)?

Stefan