Re to `optimization for list'
Serge D. Mechveliani
mechvel at botik.ru
Sun Aug 2 05:24:01 EDT 2009
Joshua Haberman <joshua at reverberate.org> writes on 2 Aug 2009
> Hello, I'm quite new to Haskell, but experienced in other languages (C,
> Python, Ruby, SQL, etc). I am interested in Haskell because I've heard
> that the language is capable of lots of optimizations based on laziness,
> and I want to learn more about that.
>
> I dug in with Project Euler problem #1, and wrote:
>
> main =
> print (show (sum [x | x <- [3..999], x `mod` 3 == 0 || x `mod` 5 == 0]))
>
> So far so good, but I want to have some way of observing what
> optimizations GHC has performed. Most notably, in this example I want
> to know if the list was ever actually constructed in memory. The "sum"
> function only needs the elements one at a time, in order, so if Haskell
> and GHC are everything I've heard about them, I would fully expect the
> list construction to be optimized out. :)
> [..]
Your example is rather complex to start with, because `sum' has rather a
complex definition in the Haskell library.
Consider first a more pure example:
main = putStr (shows (or' ys) "\n")
where
n = 10^8
ys = replicate n False
or' :: [Bool] -> Bool
or' [] = False
or' (x: xs) = x || (or' xs)
(the functions `||' (disjunction) and `replicate' are of the library).
Compile it with ghc --make -O (for any occasion),
and then, it will be evident that it takes a constant in n space
(and stack) when running (run it like this: `./T +RTS -M1m -RTS' ).
I think, this is not due to optimization but due to the very definition
of evaluation in Haskell -- Head Normal Form.
Compute this by hand:
or' (replicate 9 False) =
or' (False: (replicate 8 False)) =
False || (or' (replicate 8 False)) =
or' (replicate 8 False) = -- by definition of ||
...
or' [] =
False.
This is just the definition of evaluation in the Haskell language.
In this example, the list under or' never expands further than for 2
members.
Consider a more complex example (and more close to your example):
main = putStr (shows (sum2 ys) "\n")
-- sum'
where
n = 10^7
ys = [1 .. n]
sum2 :: [Int] -> Int
sum2 xs = sm xs 0 where sm [] s = s
sm (x: xs) s = sm xs (s+x)
sum' :: [Int] -> Int
sum' [] = 0
sum' (x: xs) = x + (sum' xs)
(the function [n .. m] is of the library).
Compile it with ghc --make -O,
and then, it will be evident that sum2 takes a constant in n space
(and stack) when running (run it like this: `./T +RTS -M1m -RTS' ).
Let us investigate this.
Furst, compute this by hand with treating (+) as not of the library:
sm [1 .. 4] 0 =
sm (1:[2 .. 4]) 0 =
sm [2 .. 4] (1+0) =
sm (2: [3 .. 4]) (1+0) =
sm [3 .. 4] (2+(1+0)) =
...
It must accumulate an intermediate stack of size O(n).
Why ghc-6.10.4 computes it in a constant space (and stack) ?
The following is my guess.
And let the Haskell experts correct me, please.
In the Haskell-98 library it is defined that (+) is strict on the
type of Int (right?).
Therefore, the compiler optimizes (1+0) by replacing it immediately with
1. Taking in account that all other parts are computed lazily, the
evaluation of the compiled program is like this:
sm [1 .. 4] 0 =
sm (1:[2 .. 4]) 0 =
sm [2 .. 4] (1+0) =
sm [2 .. 4] 1 =
sm (2: [3 .. 4]) 1 =
sm [3 .. 4] (2+1) =
sm [3 .. 4] 3 =
...
So, it is the result of combining of the evaluation definition with
a certain compiler's optimization related to the strictness info for
Int.(+).
Right?
As to sum', it still takes O(n) space -- in ghc-6.10.4.
I think, this is because it is difficult for a compiler to guess how
strictness of Int.(+) may correlate to the definition of sum'.
Probably, it needs some kind of inductive reasoning about sum' to
optimize it (for example) into sum2.
Right ?
Regards,
-----------------
Serge Mechveliani
mechvel at botik.ru
More information about the Glasgow-haskell-users
mailing list