[Haskell-cafe] A problem with par and modules boundaries...

Sat May 23 06:40:08 EDT 2009

On Fri, 2009-05-22 at 05:30 -0700, Don Stewart wrote:
> Answer recorded at:
> 
>     http://haskell.org/haskellwiki/Performance/Parallel

I have to complain: this answer doesn't explain anything. This isn't
like straight-line performance; as far as I can see there's no reason
that inlining should change the operational behaviour of parallel
evaluation, unless there's some mistake in the original, such as
accidentally relying on an unspecified evaluation order.

Now, I tried the example using two versions of ghc and I get different
behaviour from what other people are seeing. With the original code
(i.e. the parallelize function in the same module), with ghc-6.10.1 I
get no speedup at all from -N2, and with 6.11 I get a very good speedup
(roughly 1.7x in wall-clock time), though single-threaded performance
is slightly lower in 6.11.

Original code:

  ghc-6.10.1    -N1          -N2
  real          0m9.435s     0m9.328s
  user          0m9.369s     0m9.249s

  ghc-6.11      -N1          -N2
  real          0m10.262s    0m6.117s
  user          0m10.161s    0m11.093s

With the parallelize function moved into another module I get no change
whatsoever. Indeed, even when I force it *not* to be inlined with
{-# NOINLINE parallelize #-}, I still get no change in behaviour (as
indeed I expected).
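
For readers who want to repeat this kind of experiment, here is a
minimal sketch. The original program is not reproduced in this message,
so the body of parallelize and the fib workload below are hypothetical
stand-ins, not the code under discussion. It assumes the `parallel`
package (Control.Parallel) is installed.

```haskell
module Main where

import Control.Parallel (par, pseq)

-- Hypothetical stand-in for the thread's 'parallelize': spark the first
-- argument, evaluate the second on the current thread, return both.
-- NOINLINE keeps GHC from inlining it at the call site, which roughly
-- mimics calling it across a module boundary without an unfolding.
{-# NOINLINE parallelize #-}
parallelize :: a -> b -> (a, b)
parallelize a b = a `par` (b `pseq` (a, b))

-- Deliberately slow, purely illustrative workload.
fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  let (x, y) = parallelize (fib 30) (fib 31)
  print (x + y)
```

Build with -threaded and compare wall-clock time under +RTS -N1 and
+RTS -N2, then remove the pragma (or move parallelize into its own
module) and compare again. On recent GHC versions the program probably
also needs to be linked with -rtsopts for the -N flag to be accepted.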

So I view this advice to force inlining with great suspicion (at worst
it encourages people not to think and to treat it as magic). That said,
why the program gets no speedup at all with ghc-6.10 is also a mystery
to me (there's very little GC going on).

Don: can we change the advice on the wiki please? It currently makes it
look like a known and understood issue. If anything we should suggest
using a later ghc version.

Duncan