[Haskell-cafe] A problem with par and modules boundaries...

Fri May 22 10:34:18 EDT 2009

Am Freitag 22 Mai 2009 15:11:32 schrieb Mario Blazevic:
> Daniel Fischer wrote:
> > Am Freitag 22 Mai 2009 04:59:51 schrieb Mario Blažević:
> >> ...
> >> I'm confused. I know that `par` must be able work across modules
> >> boundaries, because Control.Parallel.Strategies is a module and
> >> presumably it works. What am I doing wrong?
> >
> > You forgot
> >
> > {-# INLINE parallelize #-}
> >
> > For me, that works.
>
> 	That's great, thank you. I am still baffled, though. Must every
> exported function that uses `par' be INLINEd? Does every exported caller
> of such a function need the same treatment? Is `par' really a macro,
> rather than a function?

I'm not an expert in either parallelism/concurrency or GHC, so my interpretation may be 
wrong.

The functions par and pseq are defined in GHC.Conc:
============================================================
--      Nota Bene: 'pseq' used to be 'seq'
--                 but 'seq' is now defined in PrelGHC
--
-- "pseq" is defined a bit weirdly (see below)
--
-- The reason for the strange "lazy" call is that
-- it fools the compiler into thinking that pseq  and par are non-strict in
-- their second argument (even if it inlines pseq at the call site).
-- If it thinks pseq is strict in "y", then it often evaluates
-- "y" before "x", which is totally wrong.

{-# INLINE pseq  #-}
pseq :: a -> b -> b
pseq  x y = x `seq` lazy y

{-# INLINE par  #-}
par :: a -> b -> b
par  x y = case (par# x) of { _ -> lazy y }
============================================================

As far as I understand, par doesn't guarantee that both arguments are evaluated in 
parallel, it's just a suggestion to the compiler, and if whatever heuristics the compiler 
uses say it may be favourable to do it in parallel, it will produce code to calculate it 
in parallel (given appropriate compile- and run-time flags), otherwise it produces purely 
sequential code.

With parallelize in a separate module, when compiling that, the compiler has no way to see 
whether parallelizing the computation may be beneficial, so doesn't produce (potentially) 
parallel code. At the use site, in the other module, it doesn't see the 'par', so has no 
reason to even consider producing parallel code.

If parallelize is defined in the module where it's used, it will be inlined anyway since 
it is small, so the compiler sees the 'par' (actually par#) when compiling the use site 
and can employ the heuristics to decide whether to produce parallel code.

If you place an INLINE pragma near the definition of parallelize, it will be inlined when 
compiling the importing module, so again the compiler sees the opportunity to parallelize.

So, if I got it right (or nearly right), yes, every exported function that uses par should 
be INLINEd [1], and have a simple enough body that it will indeed be inlined.
The same holds for callers of such functions, if the compiler can't see at the definition 
that parallelism is good, let the function be inlined so that it may be spotted at the 
call site.

[1] Well, I suppose for
function x = (expensive1 `par` expensive2) `seq` x
and such, if expensive1/2 are defined in the same module, it may not be necessary.