[Haskell-beginners] Re: When, if ever,
does Haskell "calculate once"?
Daniel Fischer
daniel.is.fischer at web.de
Thu May 6 17:46:50 EDT 2010
On Thursday 06 May 2010 22:49:40, Maciej Piechotka wrote:
> On Thu, 2010-05-06 at 15:37 -0400, Brent Yorgey wrote:
> > On Thu, May 06, 2010 at 11:35:15AM -0700, Travis Erdman wrote:
> > > Two questions here, I think they might be related, perhaps even the
> >
> > same, but I am not sure, so I will ask both:
> > > Q1: Re the function f below, I like the first implementation as
> >
> > it's "cleaner", but is the second implementation necessary for
> > performance purposes?
> >
> > > -- g = some CPU intensive function
> > >
> > > -- alternate 1
> > > f a b = c + (g b)
> > > where
> > > c = dosomethingelse a (g b)
> > >
> > > -- alternate 2
> > > f a b = c + saveit
> > > where
> > > saveit = g b
> > > c = dosomethingelse a saveit
> >
> > You need alternative 2. In general, GHC (and, I imagine, other
> > Haskell compilers) do not do much common subexpression elimination,
> > because in some cases it can be a *pessimization*. The classic
> > example is
> >
> > length [1..1000000] + length [1..1000000]
> >
> > vs
> >
> > let a = [1..1000000] in length a + length a
Not a particularly fortunate example, with optimisations turned on, GHC
shares the common subexpression (length a), so that's calculated only once:
share :: Int
share = let a = [1 .. 100000000] in length a + length a
Core:
Share.share :: GHC.Types.Int
GblId
[Str: DmdType]
Share.share =
case GHC.List.$wlen @ GHC.Integer.Type.Integer Share.share_a 0
of ww_amc { __DEFAULT ->
GHC.Types.I# (GHC.Prim.+# ww_amc ww_amc)
}
Consider
let a = [1 .. 100000000] in length a + sum a
vs
length [1 .. 100000000] + sum [1 .. 100000000]
> >
> > The first will execute in constant space, since each list will be
> > lazily generated as needed by the length function and then the garbage
> > collector will come along behind length and get rid of the nodes that
> > have already been processed. However, in the second expression, the
> > garbage collector cannot get rid of the nodes that are already
> > processed by the first call to length, since the second call to length
> > still needs the list. So the entire list [1..1000000] will end up
> > being in memory at once.
>
> Hmm:
> > cFoldr :: (a -> b -> b) -> (a -> c -> c) -> (b, c) -> [a] -> (b, c)
> > cFoldr f g ~(b, c) [] = (b, c)
> > cFoldr f g ~(b, c) (x:xs) = let (b', c') = cFoldr f g (b, c) xs
> > in (f x b', g x c')
> >
> > cFoldl' :: (b -> a -> b) -> (c -> a -> c) -> (b, c) -> [a] -> (b, c)
> > cFoldl' f g bc xs0 = lgo bc xs0
> > where lgo (b, c) [] = (b, c)
> > lgo (b, c) (x:xs) =
> > let b' = f b x
> > c' = g c x
> > bc' = b' `seq` c' `seq` (b', c')
> > in bc' `seq` lgo bc' xs
No.
cFoldl' f g (b0,c0) xs0 = lgo b0 c0 xs0
where
lgo b c [] = (b,c)
lgo !b !c (x:xs) = lgo (f b x) (g c x) xs
> >
> > lengthFold :: Int -> a -> Int
> > lengthFold !n _ = n + 1
> >
> > size = 100000000
> >
> >
> >
> > main = let a = [1..size]
> > l = length a + length a
> > in print $! l
>
> 200000000
> 8,031,190,480 bytes allocated in the heap
64-bit system? I get
200000000
4,015,621,900 bytes allocated in the heap
187,840 bytes copied during GC
1,472 bytes maximum residency (1 sample(s))
29,892 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 7659 collections, 0 parallel, 0.10s, 0.09s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 3.38s ( 3.41s elapsed)
GC time 0.10s ( 0.09s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 3.48s ( 3.51s elapsed)
%GC time 2.9% (2.7% elapsed)
Alloc rate 1,187,979,304 bytes per MUT second
Productivity 97.1% of total user, 96.4% of total elapsed
> 741,264 bytes copied during GC
> 1,920 bytes maximum residency (1 sample(s))
> 28,216 bytes maximum slop
> 1 MB total memory in use (0 MB lost due to fragmentation)
>
> Generation 0: 15318 collections, 0 parallel, 0.14s, 0.18s
> elapsed
> Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s
> elapsed
>
> INIT time 0.00s ( 0.00s elapsed)
> MUT time 7.15s ( 7.26s elapsed)
> GC time 0.14s ( 0.18s elapsed)
> EXIT time 0.00s ( 0.00s elapsed)
> Total time 7.29s ( 7.44s elapsed)
>
> %GC time 1.9% (2.5% elapsed)
>
> Alloc rate 1,123,257,562 bytes per MUT second
>
> Productivity 98.1% of total user, 96.1% of total elapsed
>
> ./a.out +RTS -s 7.29s user 0.05s system 98% cpu 7.450 total
>
> > main = print $! length [1..size] + length [1..size]
>
> 200000000
> 16,062,318,576 bytes allocated in the heap
> 1,476,904 bytes copied during GC
> 2,000 bytes maximum residency (1 sample(s))
> 27,992 bytes maximum slop
> 1 MB total memory in use (0 MB lost due to fragmentation)
>
> Generation 0: 30637 collections, 0 parallel, 0.29s, 0.37s
> elapsed
> Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s
> elapsed
>
> INIT time 0.00s ( 0.00s elapsed)
> MUT time 15.57s ( 15.90s elapsed)
> GC time 0.29s ( 0.37s elapsed)
> EXIT time 0.00s ( 0.00s elapsed)
> Total time 15.87s ( 16.27s elapsed)
>
> %GC time 1.8% (2.3% elapsed)
>
> Alloc rate 1,031,313,475 bytes per MUT second
>
> Productivity 98.2% of total user, 95.7% of total elapsed
>
> ./a.out +RTS -s 15.87s user 0.11s system 98% cpu 16.272 total
>
And that is strange, because I get the same figures for that one as for the
first (times differ by a few hundredths of a second).
Is that a difference between 32-bit code generator and 64-bit or between
GHC versions (6.12.2 here, but 6.10.3 gives roughly the same results)?
> > main =
> > print $! uncurry (+) (cFoldl' lengthFold lengthFold
> > (0, 0) [1..size])
And that gives the same figures as the other two (plus/minus 0.05s).
>
> All are compiled with optimizations.
All compiled with -O2.
More information about the Beginners
mailing list