[Haskell-cafe] A problem with par and modules boundaries...

Sat May 23 08:35:47 EDT 2009

On Saturday 23 May 2009 13:06:04, Duncan Coutts wrote:
> On Fri, 2009-05-22 at 16:34 +0200, Daniel Fischer wrote:
> > > 	That's great, thank you. I am still baffled, though.
>
> I'm baffled too! I don't see the same behaviour at all (see the other
> email).
>
> > > Must every exported function that uses `par' be INLINEd? Does every
> > > exported caller of such a function need the same treatment?
>
> It really should not be necessary.
>
> > > Is `par' really a macro, rather than a function?
>
> It's a function.
>
> > As far as I understand, par doesn't guarantee that both arguments are
> > evaluated in parallel, it's just a suggestion to the compiler, and if
> > whatever heuristics the compiler uses say it may be favourable to do
> > it in parallel, it will produce code to calculate it in parallel
> > (given appropriate compile- and run-time flags), otherwise it produces
> > purely sequential code.
> >
> > With parallelize in a separate module, when compiling that, the
> > compiler has no way to see whether parallelizing the computation may
> > be beneficial, so doesn't produce (potentially) parallel code. At the
> > use site, in the other module, it doesn't see the 'par', so has no
> > reason to even consider producing parallel code.
>
> I don't think this is right. As I understand it, par always creates a
> spark. It has nothing to do with heuristics.

Quite possible.
I was only guessing from the fact that par sometimes evaluates things in parallel and
sometimes does not. When I thought about what might cause the described behaviour,
cross-module inlining came to mind; I tried adding an INLINE pragma and it worked, or so
it seemed. Then I threw together an explanation of the observed behaviour. That
explanation must be wrong, though; see below.
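
To illustrate the point that par is an ordinary function that creates a spark, here is a minimal sketch. The names fib and sparkAll are hypothetical and not from this thread, and par is imported from GHC.Conc in base (Control.Parallel in the parallel package exports the same function). Whether any spark actually runs on another CPU is up to the runtime system.

```haskell
import GHC.Conc (par)

-- Deliberately naive, so that each list element is real work.
fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

-- par has the ordinary type a -> b -> b, so it can be passed to foldr
-- like any other function. This sparks every element of xs and then
-- returns result unchanged.
sparkAll :: [a] -> b -> b
sparkAll xs result = foldr par result xs

main :: IO ()
main = do
  let xs = map fib [20 .. 24]
  -- Semantically this is just sum xs; the sparks only affect how the
  -- elements may get evaluated, never the value.
  print (sparkAll xs (sum xs))
```

Since par x y always has the value of y, the program prints the same number with or without the sparks.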

>
> Whether the spark actually gets evaluated in parallel depends on the
> runtime system and whether the spark "fizzles" before it gets a chance
> to run. Of course when using the single threaded rts then the sparks are
> never evaluated in parallel. With the threaded rts and given enough
> CPUs, the rts will try to schedule the sparks onto idle CPUs. This
> business of getting sparks running on other CPUs has improved
> significantly since ghc-6.10. The current development version uses a
> better concurrent queue data structure to manage the spark pool. That's
> probably the underlying reason for why the example works well in
> ghc-6.11 but works badly in 6.10. I'm afraid I'm not sure of what
> exactly is going wrong that means it doesn't work well in 6.10.
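
A minimal sketch of how a spark can fizzle, assuming the threaded rts. The names fib, likelyFizzles and sparkHasAChance are hypothetical, and the claim about which operand (+) demands first is an assumption about typical behaviour, not a guarantee.

```haskell
-- Assumed build and run commands:
--   ghc -O2 -threaded Sketch.hs
--   ./Sketch +RTS -N2
import GHC.Conc (numCapabilities, par, pseq)

fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

-- The spark for a is created, but the main thread will probably demand a
-- itself right away when it evaluates a + b, so the spark is likely to
-- fizzle (assumption: the left operand is evaluated first).
likelyFizzles :: Int -> Integer
likelyFizzles n = a `par` (a + b)
  where
    a = fib n
    b = fib (n + 1)

-- pseq makes the main thread evaluate b first, which gives an idle CPU
-- a chance to pick up the spark for a in the meantime.
sparkHasAChance :: Int -> Integer
sparkHasAChance n = a `par` (b `pseq` (a + b))
  where
    a = fib n
    b = fib (n + 1)

main :: IO ()
main = do
  putStrLn ("capabilities: " ++ show numCapabilities)
  print (likelyFizzles 25)
  print (sparkHasAChance 25)
```

Both functions return the same value; they differ only in whether the spark is likely to be useful. With the single threaded rts neither is evaluated in parallel.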

I have tried with 6.10.3 and 6.10.1, with parallelize in the same module and in a
separate module:
- with no pragma
- with an INLINE pragma
- with a NOINLINE pragma

6.10.1 did not parallelize in any of these settings.
6.10.3 parallelized in all of these settings except "separate module, no pragma".

Then I tried a few other settings with 6.10.3. I got parallel evaluation if there is an
INLINE or a NOINLINE pragma on parallelize, or if the module header of Main is
module Main (main) where
but not if Main exports all top-level definitions and parallelize is neither INLINEd nor
NOINLINEd.
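
For reference, a sketch of the single-module variants described above. The body of parallelize is an assumption (its real definition is not quoted in this message), fib is a hypothetical workload, and par and pseq come from GHC.Conc in base rather than Control.Parallel.

```haskell
-- Header variant with an explicit export list. The other variant exports
-- all top-level definitions, for example a bare "module Main where".
module Main (main) where

import GHC.Conc (par, pseq)

-- Assumed shape of parallelize: spark a, evaluate b, return both.
parallelize :: a -> b -> (a, b)
parallelize a b = a `par` (b `pseq` (a, b))
-- Pragma variants, at most one at a time (or none at all):
-- {-# INLINE parallelize #-}
{-# NOINLINE parallelize #-}

fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  let (x, y) = parallelize (fib 27) (fib 28)
  print (x + y)
```

The result is the same in every variant; only the timing and CPU usage under the threaded rts would show whether the two halves ran in parallel.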

Weird.

>
> Generally I'd expect the effect of par to be pretty insensitive to
> inlining. I'm cc'ing the ghc users list so perhaps we'll get some expert
> commentary.

That would be good.

>
> Duncan
>

Daniel