# [Haskell-cafe] GHC magic optimization ?

Matt Morrow moonpatio at gmail.com
Fri Dec 4 19:51:17 EST 2009

```Fixing my errors:

> x = sum [1..10^6] + product [1..10^6]
> x' = let l = [1..10^6] in sum l + product l

-- Define:
bar m n = foo (enumFromTo m n)
foo xs  = sum xs + prod xs

-- We're given:
sum = foldl (+) 0
product = foldl (*) 1
foldl f z xs =
case xs of
[] -> []
x:xs -> foldl f (f z x) xs
enumFromTo m n =
case n < m of
True -> []
False -> m : enumFromTo (m+1) n

-- The fused loop becomes:
foo xs = go0 0 1 xs
where go0 a b xs =
case xs of
[] -> a+b
x:xs -> go0 (a+x) (b*x) xs

-- Now inline foo in bar:
bar m n = go2 0 1 m n
where go2 a b m n = go0 a b (go1 m n)
go0 a b xs =
case xs of
[] -> a+b
x:xs -> go0 (a+x) (b*x) xs
go1 m n =
case m < n of
True -> []
False -> m : go1 (m+1) n

-- considering go2
go2 a b m n = go0 a b (go1 m n)

==> case (go1 m n) of
[] -> a+b
x:xs -> go0 (a+x) (b*x) xs

==> case (case n < m of
True -> []
False -> m : go1 (m+1) n) of
[] -> a+b
x:xs -> go0 (a+x) (b*x) xs

==> case n < m of
True -> case [] of
[] -> a+b
x:xs -> go0 (a+x) (b*x) xs

False -> case (m : go1 (m+1) n) of
[] -> a+b
x:xs -> go0 (a+x) (b*x) xs

==> case n < m of
True -> a+b
False -> go0 (a+m) (b*m) (go1 (m+1) n)

-- So,
go2 a b m n =
case n < m of
True -> a+b
False -> go0 (a+m) (b*m) (go1 (m+1) n)

-- And by the original def of go2
go2 a b m n = go0 a b (go1 m n)

-- We get
go2 a b m n =
case m < n of
True -> a+b
False -> go2 (a+m) (b*m) (m+1) n

-- go0 and go1 and now dead in bar
bar m n = go2 0 1 m n
where go2 a b m n =
case n < m of
True -> a+b
False -> go2 (a+m) (b*m) (m+1) n

-- (furthermore, if (+) here is for Int/Double etc,
-- we can reduce go2 further to operate on machine
-- ints/doubles and be a register-only non-allocating loop)

-- So now finally returning to our original code:
> x = sum [1..10^6] + product [1..10^6]
> x' = let l = [1..10^6] in sum l + product l

-- We get:
x' = bar 1 (10^6)

Matt

On 12/4/09, Matt Morrow <moonpatio at gmail.com> wrote:
> Although, in Luke's example,
>
>> x = sum [1..10^6] + product [1..10^6]
>> x' = let l = [1..10^6] in sum l + product l
>
> We can do much much better, if we're sufficiently smart.
>
> -- Define:
> bar m n = foo (enumFromTo m n)
> foo xs  = sum xs + prod xs
>
> -- We're given:
> sum = foldl (+) 0
> product = foldl (*) 1
> foldl f z xs =
>   case xs of
>     [] -> []
>     x:xs -> foldl f (f z x) xs
> enumFromTo m n =
>   case m < n of
>     True -> []
>     False -> m : enumFromTo (m+1) n
>
> -- The fused loop becomes:
> foo xs = go0 0 1 xs
>   where go0 a b xs =
>           case xs of
>             [] -> a+b
>             x:xs -> go0 (a+x) (b*x) xs
>
> -- Now inline foo in bar:
> bar m n = go2 0 1 m n
>   where go2 = go0 a b (go1 m n)
>         go0 a b xs =
>           case xs of
>             [] -> a+b
>             x:xs -> go0 (a+x) (b*x) xs
>         go1 m n =
>           case m < n of
>             True -> []
>             False -> m : go1 (m+1) n
>
> -- considering go2
> go2 = go0 a b (go1 m n)
>
>     ==> case (go1 m n) of
>           [] -> a+b
>            x:xs -> go0 (a+x) (b*x) xs
>
>     ==> case (case m < n of
>                True -> []
>                False -> m : go1 (m+1) n) of
>           [] -> a+b
>           x:xs -> go0 (a+x) (b*x) xs
>
>     ==> case m < n of
>           True -> case [] of
>                     [] -> a+b
>                     x:xs -> go0 (a+x) (b*x) xs
>
>           False -> case (m : go1 (m+1) n) of
>                      [] -> a+b
>                      x:xs -> go0 (a+x) (b*x) xs
>
>     ==> case m < n of
>           True -> a+b
>           False -> go0 (a+m) (b*m) (go1 (m+1) n)
>
> -- So,
> go2 = case m < n of
>         True -> a+b
>         False -> go0 (a+m) (b*m) (go1 (m+1) n)
>
> -- And by the original def of go2
> go2 = go0 a b (go1 m n)
>
> -- We get
> go2 = case m < n of
>         True -> a+b
>         False -> go2 (a+m) (b*m) (m+1) n
>
> -- go0 and go1 and now dead in bar
> bar m n = go2 0 1 m n
>   where go2 = case m < n of
>                 True -> a+b
>                 False -> go2 (a+m) (b*m) (m+1) n
>
> -- (furthermore, if (+) here is for Int/Double etc,
> -- we can reduce go2 further to operate on machine
> -- ints/doubles and be a register-only non-allocating loop)
>
> -- So now finally returning to our original code:
>> x = sum [1..10^6] + product [1..10^6]
>> x' = let l = [1..10^6] in sum l + product l
>
> -- We get:
> x' = bar 1 (10^6)
>
> And the intermediate list never exists at all.
>
> Matt
>
>
>
>
> On 12/4/09, Luke Palmer <lrpalmer at gmail.com> wrote:
>> On Fri, Dec 4, 2009 at 3:36 AM, Neil Brown <nccb2 at kent.ac.uk> wrote:
>>> But let's say you have:
>>>
>>> g x y = f x y * f x y
>>>
>>> Now the compiler (i.e. at compile-time) can do some magic.  It can spot
>>> the
>>> common expression and know the result of f x y must be the same both
>>> times,
>>> so it can convert to:
>>>
>>> g x y = let z = f x y in z * z
>>
>> GHC does *not* do this by default, quite intentionally, even when
>> optimizations are enabled.  The reason is because it can cause major
>> changes in the space complexity of a program.  Eg.
>>
>> x = sum [1..10^6] + product [1..10^6]
>> x' = let l = [1..10^6] in sum l + product l
>>
>> x runs in constant space, but x' keeps the whole list in memory.  The
>> CSE here has actually wasted both time and space, since it is harder
>> to save [1..10^6] than to recompute it!  (Memory vs. arithmetic ops)
>>
>> So GHC leaves it to the user to specify sharing.  If you want an
>> expression shared, let bind it and reuse.
>>
>> Luke
>> _______________________________________________