[Haskell-cafe] Control.Concurrent.forkIO versus Control.Parallel.par
Sterling Clover
s.clover at gmail.com
Mon Jul 28 01:20:50 EDT 2008
I think a better way to look at it is that Haskell has two separate
mechanisms for two different *notions* of concurrency -- forkIO for
actual concurrent computation, which needs explicit threads and
communication (and, within that, either semaphore-based communication
with MVars or transactional control with TVars and STM), and par for
parallelism, which expresses computations that are innately parallel.
See, e.g., the GHC users manual, which defines them as follows:
* Parallelism means running a Haskell program on multiple processors,
with the goal of improving performance. Ideally, this should be done
invisibly, and with no semantic changes.
* Concurrency means implementing a program by using multiple
I/O-performing threads. While a concurrent Haskell program can run on
a parallel machine, the primary goal of using concurrency is not to
gain performance, but rather because that is the simplest and most
direct way to write the program. Since the threads perform I/O, the
semantics of the program is necessarily non-deterministic.
(http://www.haskell.org/ghc/docs/latest/html/users_guide/lang-parallel.html)
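To make the distinction concrete, here is a rough sketch (the
function names below are made up for illustration, not library
functions): par sparks evaluation of a pure value alongside another
without changing the meaning of the expression, while forkIO spawns
IO threads that communicate explicitly, e.g. through MVars.

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Parallel (par, pseq)

-- Pure parallelism: spark one summand while evaluating the other;
-- the result is the same either way, only (hopefully) faster.
parSum :: [Int] -> [Int] -> Int
parSum xs ys = let a = sum xs
                   b = sum ys
               in a `par` (b `pseq` a + b)

-- Explicit concurrency: two IO threads communicating through MVars;
-- the interleaving of their effects is non-deterministic.
concurrently2 :: IO a -> IO b -> IO (a, b)
concurrently2 ioA ioB = do
  va <- newEmptyMVar
  vb <- newEmptyMVar
  _  <- forkIO (ioA >>= putMVar va)
  _  <- forkIO (ioB >>= putMVar vb)
  a  <- takeMVar va
  b  <- takeMVar vb
  return (a, b)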
I suspect your second parallelize function doesn't work right because
\x -> x >>= return is effectively a no-op (by the monad
right-identity law), modulo the strictness characteristics of >>=.
Moreover, the action can't be evaluated until it is run in a
particular monadic "environment", which is provided, sequencing and
all, via liftM2. And one can't parallelize in an arbitrary monad
anyway, at least not without making a number of decisions -- e.g.,
what's the resultant state after two parallel computations are run in
a state monad?
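To make that last question concrete, here's one arbitrary policy for
a plain State monad (a sketch only, assuming today's mtl; the
function name is made up): both computations see the same input
state, and only the second one's output state is kept. Other
policies -- threading the state through, or merging the two final
states -- are just as defensible, which is exactly the problem.

import Control.Monad.State (State, runState, state)
import Control.Parallel (par, pseq)

-- Discarding ma's state changes is a real semantic decision here,
-- not merely an optimisation.
parallelizeState :: State s a -> State s b -> State s (a, b)
parallelizeState ma mb = state $ \s ->
  let (a, _)  = runState ma s
      (b, s') = runState mb s
  in a `par` (b `pseq` ((a, b), s'))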
So if you're using concurrency with a monad transformer stack, you
probably want to start by stripping the layers back to the minimum
possible in the concurrent part of your algorithm, and then
explicitly manage passing state into the various forked computations,
which can then be wrapped in as many runReaderT or similar calls as
necessary.
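For the common case of ReaderT over IO, that might look roughly like
this (just a sketch; forkBoth is a made-up name):

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad.Reader (ReaderT, ask, runReaderT)
import Control.Monad.Trans (liftIO)

-- Grab the environment once with ask, fork plain IO actions, and
-- re-wrap each forked computation with runReaderT so both see the
-- same environment.
forkBoth :: ReaderT r IO a -> ReaderT r IO b -> ReaderT r IO (a, b)
forkBoth ma mb = do
  r <- ask
  liftIO $ do
    va <- newEmptyMVar
    vb <- newEmptyMVar
    _  <- forkIO (runReaderT ma r >>= putMVar va)
    _  <- forkIO (runReaderT mb r >>= putMVar vb)
    a  <- takeMVar va
    b  <- takeMVar vb
    return (a, b)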
On another, more general, note: unless you're very careful, mixing IO
into your algorithm will probably result in poorly performing
parallel code, since it will be IO-bound rather than processor-bound.
Again, the point from the GHC manual that "the primary goal of using
concurrency is not to gain performance, but rather because that is
the simplest and most direct way to write the program" seems
appropriate.
Additionally, many have found it easier at this stage to get good
parallel performance out of the concurrency mechanisms rather than
out of `par`, because careless use of `par` tends to add as much
overhead in spark creation as is saved by running on multiple
processors, whereas an explicit work queue can be easier to reason
about.
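A bare-bones version of such a work queue might look like this (again
only a sketch, with made-up names; Control.Concurrent.Chan does the
queueing):

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Exception (evaluate)
import Control.Monad (forM_, forever, replicateM, replicateM_)

-- nWorkers threads pull pure jobs from a Chan, force each result to
-- WHNF, and push it onto a results Chan.  Results come back in
-- completion order, and the idle workers simply stay blocked on
-- readChan once the jobs run out.
runJobs :: Int -> (a -> b) -> [a] -> IO [b]
runJobs nWorkers f xs = do
  jobs    <- newChan
  results <- newChan
  forM_ xs (writeChan jobs)
  replicateM_ nWorkers $ forkIO $ forever $ do
    x <- readChan jobs
    y <- evaluate (f x)
    writeChan results y
  replicateM (length xs) (readChan results)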
Regards,
S.
On Jul 27, 2008, at 10:49 PM, Mario Blažević wrote:
>
> Hello. I have a question about parallel computation in Haskell.
> After browsing the GHC library documentation, I was left with the
> impression that there are two separate mechanisms for expressing
> concurrency: Control.Parallel.par for pure computations and
> Control.Concurrent.forkIO for computations in the IO monad.
>
> This dichotomy becomes a problem when one tries to use
> concurrency from a monad transformer, though I'm sure that's not
> the only such situation. One cannot assume that the base monad is
> IO, so forkIO cannot be used, while Control.Parallel.par won't run
> monadic computations. My first solution was to replace the base
> monad class for the monad transformer by the following
> ParallelizableMonad class:
>
> ----------------------------------------------------------------------------
> class Monad m => ParallelizableMonad m where
>    parallelize :: m a -> m b -> m (a, b)
>    parallelize ma mb = do a <- ma
>                           b <- mb
>                           return (a, b)
>
> instance ParallelizableMonad Identity where
>    parallelize (Identity a) (Identity b) =
>       Identity (a `par` (b `pseq` (a, b)))
>
> instance ParallelizableMonad IO where
>    parallelize ma mb = do va <- newEmptyMVar
>                           vb <- newEmptyMVar
>                           forkIO (ma >>= putMVar va)
>                           forkIO (mb >>= putMVar vb)
>                           a <- takeMVar va
>                           b <- takeMVar vb
>                           return (a, b)
> ----------------------------------------------------------------------------
>
> I tested this solution, and it worked for IO computations in the
> sense that they used both CPUs. The test also ran slower on two
> CPUs than on one, but that's beside the point.
>
> Then I realized that par can, in fact, be used on any monad; it
> just needs a little nudge:
>
> ----------------------------------------------------------------------------
> parallelize :: Monad m => m a -> m b -> m (a, b)
> parallelize ma mb = let a = ma >>= return
>                         b = mb >>= return
>                     in a `par` (b `pseq` liftM2 (,) a b)
> ----------------------------------------------------------------------------
>
> However, in this version the IO monadic computations still appear
> to use only one CPU. I cannot get par to parallelize monadic
> computations. I've used the same command-line options in both
> examples: -O -threaded and +RTS -N2. What am I missing?
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe