[Haskell-cafe] Problem parallelising simple algorithm

Thu Aug 20 14:51:16 UTC 2015

So, I tried to use the Par monad instead of the Eval monad (only splitting
the work into two parts, see below).
I now get (with 1 and 4 cores):

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    0.994s  (  0.998s elapsed)
  GC      time    0.048s  (  0.051s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    1.042s  (  1.050s elapsed)

------------

  TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    1.083s  (  0.922s elapsed)
  GC      time    0.116s  (  0.038s elapsed)
  EXIT    time    0.001s  (  0.000s elapsed)
  Total   time    1.200s  (  0.961s elapsed)

So, some slight gain, but not a lot. Admittedly, the call to fromRational
seems to take a good chunk of time.

Any comments?

Jens
---------------
import Control.Monad.Par hiding (parMap)
import Data.Ratio

pqCombine :: (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
pqCombine (pl, ql) (pr, qr) = (pl*qr+pr, ql*qr)

pq :: (Integer, Integer) -> (Integer, Integer)
pq (a, b)
    | d >   5 = let m = (a+b+1) `div` 2
                    pql = pq (a, m)
                    pqr = pq (m, b)
                in pqCombine pql pqr
    | otherwise = (sum $ scanl1 (*) [b,b-1..a+1], product [a+1..b])
  where d = b - a

main = print . flip seq () . (\(p,q) -> fromRational (p%q)) . runPar $ do
  i <- new
  j <- new
  fork (put i (pq (0,160000)))
  fork (put j (pq (160000,320000)))
  a <- get i
  b <- get j
  return (pqCombine a b)

On Fri, 14 Aug 2015 at 10:31 Jens Blanck <jens.blanck at gmail.com> wrote:

> The following code simply computes the Euler constant by binary
> splitting. I'm struggling to get any speed-up on multi-core.
>
> Why do I get a fair number of fizzled?
> Why don't I get any real speed-up even if it claims to run tasks in
> parallel?
>
> Could I have exhausted the integer arithmetic unit(s) on my chip
> (i7-4790)?
> How would I verify that?
>
> The following is the code and typical runs on 1 and 4 cores respectively.
>
> ---------
>
> import Control.Parallel.Strategies
> import Control.DeepSeq
>
> import Data.Ratio
>
> divConq :: (NFData b) => (a -> b)
>         -> a
>         -> (a -> Bool)
>         -> (b -> b -> b)
>         -> (a -> Maybe (a,a))
>         -> b
> divConq f arg threshold conquer divide = go arg
>     where
>       go arg =
>           case divide arg of
>             Nothing -> f arg
>             Just (l0,r0) -> conquer l1 r1 `using` strat
>                 where
>                   l1 = go l0
>                   r1 = go r0
>                   strat x = do r l1; r r1; return x
>                       where r | threshold arg = rdeepseq
>                               | otherwise     = rpar
>
> pqCombine :: (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
> pqCombine (pl, ql) (pr, qr) = (pl*qr+pr, ql*qr)
>
> pq :: (Integer, Integer) -> (Integer, Integer)
> pq (a, b)  = (\t -> (sum t, last t)) $ scanl1 (*) [b,b-1..a+1]
>
> euler :: Integer -> Rational
> euler n =
>     let (p,q) = divConq pq
>                         (0,n)
>                         (\(a,b) -> b-a < 10000)
>                         pqCombine
>                         (\(a,b) -> if b-a > 5
>                                    then let m = (a+b+1) `div` 2 in Just
> ((a,m), (m, b))
>                                    else Nothing)
>     in p%q
>
> main = print $ euler 320000 `seq` ()
>
> ----------
>
> > ./BinSplit +RTS -s -N1
> ()
>      178,375,880 bytes allocated in the heap
>        2,452,040 bytes copied during GC
>        3,222,696 bytes maximum residency (7 sample(s))
>          883,040 bytes maximum slop
>               11 MB total memory in use (2 MB lost due to fragmentation)
>
>                                      Tot time (elapsed)  Avg pause  Max
> pause
>   Gen  0       333 colls,     0 par    0.004s   0.002s     0.0000s
>  0.0000s
>   Gen  1         7 colls,     0 par    0.001s   0.001s     0.0001s
>  0.0003s
>
>   TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
>
>   SPARKS: 126 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 126 fizzled)
>
>   INIT    time    0.001s  (  0.000s elapsed)
>   MUT     time    0.928s  (  0.936s elapsed)
>   GC      time    0.005s  (  0.003s elapsed)
>   EXIT    time    0.001s  (  0.000s elapsed)
>   Total   time    0.935s  (  0.939s elapsed)
>
>   Alloc rate    192,215,387 bytes per MUT second
>
>   Productivity  99.4% of total user, 98.9% of total elapsed
>
> gc_alloc_block_sync: 0
> whitehole_spin: 0
> gen[0].sync: 0
> gen[1].sync: 0
>
>
>
> > ./BinSplit +RTS -s -N4
> ()
>      178,727,480 bytes allocated in the heap
>        3,506,488 bytes copied during GC
>        3,650,032 bytes maximum residency (7 sample(s))
>          934,976 bytes maximum slop
>               12 MB total memory in use (1 MB lost due to fragmentation)
>
>                                      Tot time (elapsed)  Avg pause  Max
> pause
>   Gen  0       141 colls,   141 par    0.009s   0.002s     0.0000s
>  0.0001s
>   Gen  1         7 colls,     6 par    0.003s   0.001s     0.0001s
>  0.0001s
>
>   Parallel GC work balance: 38.80% (serial 0%, perfect 100%)
>
>   TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)
>
>   SPARKS: 126 (12 converted, 0 overflowed, 0 dud, 0 GC'd, 114 fizzled)
>
>   INIT    time    0.002s  (  0.002s elapsed)
>   MUT     time    2.104s  (  0.946s elapsed)
>   GC      time    0.012s  (  0.003s elapsed)
>   EXIT    time    0.001s  (  0.000s elapsed)
>   Total   time    2.119s  (  0.951s elapsed)
>
>   Alloc rate    84,946,520 bytes per MUT second
>
>   Productivity  99.3% of total user, 221.3% of total elapsed
>
> gc_alloc_block_sync: 600
> whitehole_spin: 0
> gen[0].sync: 9
> gen[1].sync: 1073
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20150820/9f3acbd2/attachment.html>