[Haskell-cafe] 1st attempt at parallelizing

Felipe Lessa felipe.lessa at gmail.com
Mon Jul 26 18:25:09 EDT 2010


2010/7/26 Günther Schmidt <gue.schmidt at web.de>:
> Hi all,

Hello!

> I'm spidering web pages, the implementation currently is synchronous. I'd
> like to parallelize this for speed-up, ie. get up to 6 pages in parallel and
> recycle those threads.

This is usually called concurrent programming, not parallel.

> Now I have come across good examples for this on the web before, but I doubt
> I'd find it again right away.
>
> I'd appreciate some good pointers.

There's a simple way of doing this with Chans, for example:

import Control.Applicative
import Control.Concurrent.STM
import Control.Monad
import qualified Data.Map as M

data Page = ...
data Info = ...

download :: Page -> IO Info
download = ...

getOneByOne :: [Page] -> IO (M.Map Page Info)
getOneByOne = M.fromList <$> mapM (\p -> (,) p <$> download p)

downloader :: TChan (Maybe Page) -> TChan (Page, Info) -> IO ()
downloader in out = do
  mp <- atomically (readTChan in)
  case mp of
    Nothing -> return ()
    Just p -> download p >>= atomically . writeTChan out

getConcurrent :: Int -> [Page] -> IO [M.Map Page Info]
getConcurrent n xs = do
  in <- newTChanIO
  out <- newTChanIO
  replicateM_ n (forkIO $ downloader in out) -- create n threads
  mapM (writeTChan in . Just) xs
  replicateM_ n (writeTChan in Nothing) -- kill n threads
  M.fromList <$> mapM (\_ -> readTChan out) xs


This code doesn't take exceptions into account, which you should, but
this works.  Well, I guess, didn't try, if it compiles then it should
;).

HTH,

-- 
Felipe.


More information about the Haskell-Cafe mailing list