Having trouble with parallel Haskell
marlowsd at gmail.com
Thu Jun 5 05:44:15 EDT 2008
Bryan O'Sullivan wrote:
> I intended to write a half chapter or so about parallel programming in
> Haskell for the book I'm working on, but I've been coming to the
> conclusion that the time is not yet ripe for this. I hope it will be
> helpful if I share my experiences here.
> Specifically, I was planning to write about parallel programming in pure
> code: use of "par" and programming with strategies.
> The most substantial problem is that the threaded RTS in GHC 6.8.2 is
> very crashy in the face of "par": about 90% of my runs fail with a
> segfault or an assertion failure. Simon Marlow mentioned that this bug
> is fixed, but I've been unsuccessful in building a GHC 6.8.3 release
> candidate snapshot so far, so I can't verify this.
> When a run does go through, I have found it surprisingly difficult to
> actually get both cores of a dual-core system to show activity. As
> there are no tools I can use to see what is happening, I am stumped. It
> is of course quite likely that I am doing something wrong that results
> in unexpected dependencies, but I cannot find a means to gaining insight
> into the problem.
> As an example, I have a simple parallel quicksort that is very
> conservative in its sparking, and I get no joy out of it on a dual-core
> machine: it mostly sticks to a single core. This wouldn't be a problem
> if I had some way to spot the presumed data dependency that is
> serialising the code, but no joy.
I don't have a great deal of time to look at this right now, but I had a
bit more luck with this version:
parSort :: (NFData a, Ord a) => Int -> [a] -> [a]
parSort d list@(x:xs)
| d <= 0 = sort list
| otherwise = rnf below `pseq` (
rnf above `pseq` (
rnf lesser `par` (
rnf greater `pseq` (
(lesser ++ x:greater)))))
where below = [y | y <- xs, y < x]
above = [y | y <- xs, y >= x]
lesser = parSort d' below
greater = parSort d' above
d' = d - 1
parSort _ _ = 
I imagine the problem is mainly to do with nested pars: in the original
code, the left-hand par evaluates immediately into another par, and so on:
there's no actual *work* being done in each spark. Which is why I tried to
do some evaluation before each par above. It still doesn't achieve much
Incedentally, this kind of recursive parallelism is typically much harder
to get any benefit from than, for example, a simple top-level parMap. If
you're just trying to introduce parallelism, it might be worthwhile
sticking to the easy cases: simple strategies, and speeding up concurrent
Also you might notice that the GC is doing a lot of work in this example
(first port of call is always +RTS -sstderr). I tried the parallel GC, but
it didn't help at all, I guess because the heap is full of lists and the
parallel GC likes trees.
The biggest problem, as you note, is the lack of tools for investigating
parallel performance. This is high on our priority list, we have a couple
of projects planned over the summer to improve the situation.
More information about the Glasgow-haskell-users