[Haskell-beginners] Re: When, if ever, does Haskell "calculate once"?

Thu May 6 17:46:50 EDT 2010

On Thursday 06 May 2010 22:49:40, Maciej Piechotka wrote:
> On Thu, 2010-05-06 at 15:37 -0400, Brent Yorgey wrote:
> > On Thu, May 06, 2010 at 11:35:15AM -0700, Travis Erdman wrote:
> > > Two questions here, I think they might be related, perhaps even the
> >
> > same, but I am not sure, so I will ask both:
> > > Q1:  Re the function f below, I like the first implementation as
> >
> > it's "cleaner", but is the second implementation necessary for
> > performance purposes?
> >
> > > -- g = some CPU intensive function
> > >
> > > -- alternate 1
> > > f a b = c + (g b)
> > >     where
> > >         c = dosomethingelse a (g b)
> > >
> > > -- alternate 2
> > > f a b = c + saveit
> > >     where
> > >         saveit = g b
> > >         c = dosomethingelse a saveit
> >
> > You need alternative 2.  In general, GHC (and, I imagine, other
> > Haskell compilers) do not do much common subexpression elimination,
> > because in some cases it can be a *pessimization*.  The classic
> > example is
> >
> >   length [1..1000000] + length [1..1000000]
> >
> > vs
> >
> >   let a = [1..1000000] in length a + length a

Not a particularly fortunate example, with optimisations turned on, GHC 
shares the common subexpression (length a), so that's calculated only once:

share :: Int
share = let a = [1 .. 100000000] in length a + length a

Core:

Share.share :: GHC.Types.Int
GblId
[Str: DmdType]
Share.share =
  case GHC.List.$wlen @ GHC.Integer.Type.Integer Share.share_a 0
  of ww_amc { __DEFAULT ->
  GHC.Types.I# (GHC.Prim.+# ww_amc ww_amc)
  }

Consider 

let a = [1 .. 100000000] in length a + sum a

vs

length [1 .. 100000000] + sum [1 .. 100000000]

> >
> > The first will execute in constant space, since each list will be
> > lazily generated as needed by the length function and then the garbage
> > collector will come along behind length and get rid of the nodes that
> > have already been processed.  However, in the second expression, the
> > garbage collector cannot get rid of the nodes that are already
> > processed by the first call to length, since the second call to length
> > still needs the list.  So the entire list [1..1000000] will end up
> > being in memory at once.
>
> Hmm:
> > cFoldr :: (a -> b -> b) -> (a -> c -> c) -> (b, c) -> [a] -> (b, c)
> > cFoldr f g ~(b, c) []     = (b, c)
> > cFoldr f g ~(b, c) (x:xs) = let (b', c') = cFoldr f g (b, c) xs
> >                             in (f x b', g x c')
> >
> > cFoldl' :: (b -> a -> b) -> (c -> a -> c) -> (b, c) -> [a] -> (b, c)
> > cFoldl' f g bc xs0 = lgo bc xs0
> >                      where lgo (b, c) [] = (b, c)
> >                            lgo (b, c) (x:xs) =
> >                                let b' = f b x
> >                                    c' = g c x
> >                                    bc' = b' `seq` c' `seq` (b', c')
> >                                in bc' `seq` lgo bc' xs

No.

cFoldl' f g (b0,c0) xs0 = lgo b0 c0 xs0
    where
      lgo b c [] = (b,c)
      lgo !b !c (x:xs) = lgo (f b x) (g c x) xs

> >
> > lengthFold :: Int -> a -> Int
> > lengthFold !n _ = n + 1
> >
> > size = 100000000
> >
> >
> >
> > main = let a = [1..size]
> >            l = length a + length a
> >       in print $! l
>
> 200000000
>    8,031,190,480 bytes allocated in the heap

64-bit system? I get

200000000                                                          
   4,015,621,900 bytes allocated in the heap                       
         187,840 bytes copied during GC                            
           1,472 bytes maximum residency (1 sample(s))             
          29,892 bytes maximum slop                                
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  7659 collections,     0 parallel,  0.10s,  0.09s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    3.38s  (  3.41s elapsed)
  GC    time    0.10s  (  0.09s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    3.48s  (  3.51s elapsed)

  %GC time       2.9%  (2.7% elapsed)

  Alloc rate    1,187,979,304 bytes per MUT second

  Productivity  97.1% of total user, 96.4% of total elapsed

>          741,264 bytes copied during GC
>            1,920 bytes maximum residency (1 sample(s))
>           28,216 bytes maximum slop
>                1 MB total memory in use (0 MB lost due to fragmentation)
>
>   Generation 0: 15318 collections,     0 parallel,  0.14s,  0.18s
> elapsed
>   Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s
> elapsed
>
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time    7.15s  (  7.26s elapsed)
>   GC    time    0.14s  (  0.18s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time    7.29s  (  7.44s elapsed)
>
>   %GC time       1.9%  (2.5% elapsed)
>
>   Alloc rate    1,123,257,562 bytes per MUT second
>
>   Productivity  98.1% of total user, 96.1% of total elapsed
>
> ./a.out +RTS -s  7.29s user 0.05s system 98% cpu 7.450 total
>
> > main = print $! length [1..size] + length [1..size]
>
> 200000000
>   16,062,318,576 bytes allocated in the heap
>        1,476,904 bytes copied during GC
>            2,000 bytes maximum residency (1 sample(s))
>           27,992 bytes maximum slop
>                1 MB total memory in use (0 MB lost due to fragmentation)
>
>   Generation 0: 30637 collections,     0 parallel,  0.29s,  0.37s
> elapsed
>   Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s
> elapsed
>
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time   15.57s  ( 15.90s elapsed)
>   GC    time    0.29s  (  0.37s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time   15.87s  ( 16.27s elapsed)
>
>   %GC time       1.8%  (2.3% elapsed)
>
>   Alloc rate    1,031,313,475 bytes per MUT second
>
>   Productivity  98.2% of total user, 95.7% of total elapsed
>
> ./a.out +RTS -s  15.87s user 0.11s system 98% cpu 16.272 total
>

And that is strange, because I get the same figures for that one as for the 
first (times differ by a few hundredths of a second).
Is that a difference between 32-bit code generator and 64-bit or between 
GHC versions (6.12.2 here, but 6.10.3 gives roughly the same results)?

> > main =
> >   print $! uncurry (+) (cFoldl' lengthFold lengthFold
> >                                 (0, 0) [1..size])

And that gives the same figures as the other two (plus/minus 0.05s).

>
> All are compiled with optimizations.

All compiled with -O2.