[Long, probably not-beginners anymore] Parallel folds and folds as arrows (was: Re: [Haskell-beginners] Re: When, if ever, does Haskell "calculate once"?)

Maciej Piechotka uzytkownik2 at gmail.com
Thu May 6 21:15:19 EDT 2010


On Thu, 2010-05-06 at 23:46 +0200, Daniel Fischer wrote: 
> Share.share :: GHC.Types.Int
> GblId
> [Str: DmdType]
> Share.share =
>   case GHC.List.$wlen @ GHC.Integer.Type.Integer Share.share_a 0
>   of ww_amc { __DEFAULT ->
>   GHC.Types.I# (GHC.Prim.+# ww_amc ww_amc)
>   }
> 

Hmm. What's the name of this form and how to get it?

> 
> No.
> 
> cFoldl' f g (b0,c0) xs0 = lgo b0 c0 xs0
>     where
>       lgo b c [] = (b,c)
>       lgo !b !c (x:xs) = lgo (f b x) (g c x) xs
> 

Ok. Fixed (I tried fast rewrite from foldr')

> 
> 64-bit system? I get
> 

64 bit, GHC 6.12.2.
% ghc -V
The Glorious Glasgow Haskell Compilation System, version 6.12.2
% file a.out 
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically
linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

> 
> And that is strange, because I get the same figures for that one as for the 
> first (times differ by a few hundredths of a second).

Fixed or non-fixed version?

> Is that a difference between 32-bit code generator and 64-bit or between 
> GHC versions (6.12.2 here, but 6.10.3 gives roughly the same results)?
> 

Hmm. Compiler and platform matches. Unless you use some other 64-bit
platform - not x86-64 ;)

> 
> > > main =
> > >   print $! uncurry (+) (cFoldl' lengthFold lengthFold
> > >                                 (0, 0) [1..size])
> 
> And that gives the same figures as the other two (plus/minus 0.05s).
> 
> >
> > All are compiled with optimizations.
> 
> All compiled with -O2.
> 

Hmm. Difference between -O1 and -O2

Fixed versions and with sum (and -O2):

> main = let a = [1..size]
>            l = length a + sum a
>        in print $! l

Lot's of memory(Over 3 GiB). I voluntarily killed process

> main = print $! length [1..size] + sum [1..size]

Lot's of memory(Over 3 GiB). I voluntarily killed process.

So far as being inplace.

> main = print $! uncurry (+) (cFoldl' lengthFold (+) (0, 0) [1..size])

5000000150000000
  16,889,753,976 bytes allocated in the heap
       3,356,480 bytes copied during GC
           1,976 bytes maximum residency (1 sample(s))
          28,200 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 32216 collections,     0 parallel,  0.29s,  0.38s
elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s
elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   14.89s  ( 15.12s elapsed)
  GC    time    0.29s  (  0.38s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   15.18s  ( 15.50s elapsed)

  %GC time       1.9%  (2.4% elapsed)

  Alloc rate    1,134,094,110 bytes per MUT second

  Productivity  98.1% of total user, 96.1% of total elapsed

./a.out +RTS -s  15.18s user 0.11s system 98% cpu 15.503 total

Lowered to size = 100000 (bigger causes stack overflow in first main):

> main = let a = [1..size]
>            l = length a + sum a
>        in print $! l

5000150000
      22,045,352 bytes allocated in the heap
      18,781,768 bytes copied during GC
       6,316,904 bytes maximum residency (4 sample(s))
       3,141,912 bytes maximum slop
              17 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    23 collections,     0 parallel,  0.05s,  0.05s
elapsed
  Generation 1:     4 collections,     0 parallel,  0.03s,  0.04s
elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.02s  (  0.03s elapsed)
  GC    time    0.08s  (  0.09s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.10s  (  0.12s elapsed)

  %GC time      78.4%  (75.8% elapsed)

  Alloc rate    1,002,334,818 bytes per MUT second

  Productivity  20.6% of total user, 17.2% of total elapsed

./a.out +RTS -s  0.10s user 0.02s system 96% cpu 0.129 total

> main = print $! length [1..size] + sum [1..size]

5000150000
      30,077,024 bytes allocated in the heap
      17,482,888 bytes copied during GC
       5,602,600 bytes maximum residency (4 sample(s))
       3,144,312 bytes maximum slop
              15 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    38 collections,     0 parallel,  0.04s,  0.05s
elapsed
  Generation 1:     4 collections,     0 parallel,  0.03s,  0.03s
elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.02s  (  0.03s elapsed)
  GC    time    0.07s  (  0.08s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.09s  (  0.11s elapsed)

  %GC time      72.0%  (70.0% elapsed)

  Alloc rate    1,156,986,613 bytes per MUT second

  Productivity  26.9% of total user, 22.9% of total elapsed

./a.out +RTS -s  0.09s user 0.02s system 97% cpu 0.116 total

> main = print $! uncurry (+) (cFoldl' lengthFold (+) (0, 0) [1..size])

5000150000
      17,128,128 bytes allocated in the heap
          10,608 bytes copied during GC
           2,072 bytes maximum residency (1 sample(s))
          28,024 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    32 collections,     0 parallel,  0.00s,  0.00s
elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s
elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.02s  (  0.02s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.02s  (  0.02s elapsed)

  %GC time       5.3%  (3.9% elapsed)

  Alloc rate    951,721,286 bytes per MUT second

  Productivity  94.7% of total user, 90.2% of total elapsed

./a.out +RTS -s  0.02s user 0.00s system 96% cpu 0.024 total

It seems that the best, at least for large inputs.


-----------------------------------------------------------------------

On the other hand it seems to form an arrow[1].
First the result of test:
5000000150000000
  12,800,063,440 bytes allocated in the heap
       2,545,048 bytes copied during GC
           1,968 bytes maximum residency (1 sample(s))
          28,216 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 24414 collections,     0 parallel,  0.24s,  0.29s
elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s
elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    7.18s  (  7.35s elapsed)
  GC    time    0.24s  (  0.29s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    7.42s  (  7.64s elapsed)

  %GC time       3.3%  (3.8% elapsed)

  Alloc rate    1,782,759,994 bytes per MUT second

  Productivity  96.7% of total user, 93.9% of total elapsed

./a.out +RTS -s  7.42s user 0.09s system 98% cpu 7.648 total

(Yes - lower then the code above - result reproducable).

Code:

> {-# LANGUAGE BangPatterns #-}
> import Control.Arrow
> import Control.Category
> import Data.List
> import Prelude hiding (id, (.))
> 
> newtype FoldlArrow a b c = FoldlArrow (a -> b -> c)
> 
> instance Category (FoldlArrow a) where
>     id = FoldlArrow $ \a !b -> b
>     (FoldlArrow f)  . (FoldlArrow g) =
>         FoldlArrow $ \a !b -> let !c = g a b
>                               in f a c
> 
> instance Arrow (FoldlArrow a) where
>     arr f = FoldlArrow $ const f
>     first (FoldlArrow f) = FoldlArrow $ \a (!b, !c) -> (f a b, c)
>     second (FoldlArrow f) = FoldlArrow $ \a (!b, !c) -> (b, f a c)
>     (FoldlArrow f) *** (FoldlArrow g) =
>         FoldlArrow $ \a (!b, !c) -> (f a b, g a c)
>     (FoldlArrow f) &&& (FoldlArrow g) =
>         FoldlArrow $ \a !b -> (f a b, g a b)
> 
> instance ArrowChoice (FoldlArrow a) where
>     left (FoldlArrow f) = FoldlArrow left'
>                           where left' a (Left !l) = (Left $! f a l)
>                                 left' a (Right !r) = (Right r)
>     right (FoldlArrow f) = FoldlArrow right'
>                            where right' a (Left !l) = (Left l)
>                                  right' a (Right !r) =
>                                      (Right $! f a r)
>     (FoldlArrow f) +++ (FoldlArrow g) =
>         FoldlArrow choice
>         where choice a (Left !l) = (Left $! f a l)
>               choice a (Right !r) = (Right $! g a r)
>     (FoldlArrow f) ||| (FoldlArrow g) =
>         FoldlArrow choice
>         where choice a (Left !l) = f a l
>               choice a (Right !r) = g a r
> 
> instance ArrowApply (FoldlArrow a) where
>     app = FoldlArrow $ \a (FoldlArrow !f, !b) -> f a b
> 
> doFoldl :: FoldlArrow a b b -> b -> [a] -> b
> doFoldl (FoldlArrow f) = foldl' (flip f) 
> 
> element :: FoldlArrow a b a
> element = FoldlArrow const
> 
> lengthA :: FoldlArrow a Int Int
> lengthA = arr (+1)
> 
> sumA :: Num a => FoldlArrow a a a
> sumA = arr (uncurry (+)) . (element &&& id)
> 
> size = 100000000
> 
> main =
>     print $! uncurry (+) $ doFoldl (lengthA *** sumA) (0, 0) [1..size]

On the other hand if I use arrow syntax:

> myArr :: Num a => FoldlArrow a (a, a) (a, a)
> myArr = proc (l, s) -> do e <- element -< ()
>                           returnA -< (l + 1, s + e)

It starts consuming memory as well. Somewhere is lazy passing of value
but I cannot find where

Regards

PS. As it is probably out of scope and topic of beginners mailing list
I'm CC'ing cafe (possibly beginners should be dropped).

[1] It can be extended to work on other arrows as well - not only (->).
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part
Url : http://www.haskell.org/pipermail/beginners/attachments/20100506/55251b93/attachment-0001.bin


More information about the Beginners mailing list