# Re to `optimization for list'

Serge D. Mechveliani mechvel at botik.ru
Sun Aug 2 05:24:01 EDT 2009

```Joshua Haberman <joshua at reverberate.org>  writes on  2 Aug 2009

> Hello, I'm quite new to Haskell, but experienced in other languages (C,
> Python, Ruby, SQL, etc).  I am interested in Haskell because I've heard
> that the language is capable of lots of optimizations based on laziness,
>
> I dug in with Project Euler problem #1, and wrote:
>
> main =
>  print (show (sum [x | x <- [3..999], x `mod` 3 == 0 || x `mod` 5 == 0]))
>
> So far so good, but I want to have some way of observing what
> optimizations GHC has performed.  Most notably, in this example I want
> to know if the list was ever actually constructed in memory.  The "sum"
> function only needs the elements one at a time, in order, so if Haskell
> and GHC are everything I've heard about them, I would fully expect the
> list construction to be optimized out.  :)
> [..]

complex definition in the Haskell library.
Consider first a more pure example:

main = putStr (shows (or' ys) "\n")
where
n  = 10^8
ys = replicate n False

or' :: [Bool] -> Bool
or' []      = False
or' (x: xs) = x || (or' xs)

(the functions `||' (disjunction) and `replicate' are of the library).
Compile it with   ghc --make -O  (for any occasion),
and then, it will be evident that it takes a constant in  n  space
(and stack) when running  (run it like this:   `./T +RTS -M1m -RTS' ).

I think, this is  not due to optimization  but due to the very definition
Compute this by hand:

or' (replicate 9 False) =
or' (False: (replicate 8 False)) =
False || (or' (replicate 8 False)) =
or' (replicate 8 False) =               -- by definition of ||
...
or' [] =
False.

This is just the definition of evaluation in the Haskell language.
In this example, the list under  or'  never expands further than for 2
members.

Consider a more complex example (and more close to your example):

main = putStr (shows (sum2 ys) "\n")
-- sum'
where
n  = 10^7
ys = [1 .. n]

sum2 :: [Int] -> Int
sum2 xs = sm xs 0   where  sm []      s = s
sm (x: xs) s = sm xs (s+x)
sum' :: [Int] -> Int
sum' []      = 0
sum' (x: xs) = x + (sum' xs)

(the function [n .. m] is of the library).
Compile it with   ghc --make -O,
and then, it will be evident that  sum2  takes a constant in  n  space
(and stack) when running  (run it like this:   `./T +RTS -M1m -RTS' ).
Let us investigate this.
Furst, compute this by hand with treating (+) as not of the library:

sm [1 .. 4] 0 =
sm (1:[2 .. 4]) 0 =
sm [2 .. 4] (1+0) =
sm (2: [3 .. 4]) (1+0) =
sm [3 .. 4] (2+(1+0)) =
...

It must accumulate an intermediate stack of size  O(n).
Why ghc-6.10.4 computes it in a constant space (and stack) ?

The following is my guess.

In the  Haskell-98 library  it is defined that (+) is  strict  on the
type of  Int  (right?).
Therefore, the compiler optimizes (1+0) by replacing it immediately with
1. Taking in account that all other parts are computed lazily, the
evaluation of the  compiled program  is like this:

sm [1 .. 4] 0 =
sm (1:[2 .. 4]) 0 =
sm [2 .. 4] (1+0) =
sm [2 .. 4] 1 =
sm (2: [3 .. 4]) 1 =
sm [3 .. 4] (2+1) =
sm [3 .. 4] 3 =
...
So, it is the result of combining of the evaluation definition with
a certain compiler's optimization related to the strictness info for
Int.(+).
Right?

As to  sum',  it still takes O(n) space -- in  ghc-6.10.4.
I think, this is because it is difficult for a compiler to guess how
strictness of  Int.(+)  may correlate to the definition of  sum'.
Probably, it needs some kind of inductive reasoning about sum' to
optimize it (for example) into  sum2.
Right ?

Regards,

-----------------
Serge Mechveliani
mechvel at botik.ru
```