[Haskell-beginners] Re: [Haskell-begin] Maximum of a List?

Mon Jul 28 22:15:59 EDT 2008

Am Dienstag, 29. Juli 2008 01:53 schrieb Rafael Gustavo da Cunha Pereira 
Pinto:
> Since this is the "beginners" list, could someone explain me why using g
> made everything run like the wind, with almost no memory?

I'd rate "run like the wind" as an exaggeration since the completely 
unoptimised C version runs in 1.6 seconds vs. 54 seconds for the code below 
(49 seconds if using Int for the chain length) on my box. And the "g" isn't 
important for speed or memory, just for getting the correct result.

Two factors make it run in constant memory without stack overflow:
1. strictness
2. laziness.

The strictness of foldl' is the reason that every g k is evaluated without 
delay and can immediately be thrown away (unless it's the maximum so far, in 
which case the previous maximum can be discarded) and the laziness of map and 
enumFromTo (remember that [a .. b] is syntactic sugar for enumFromTo a b) 
allows the list to be consumed as it is generated.

Let us trace a few evaluation steps of answer:
answer
---> foldl' max (0,0) $ map g (enumFromTo 1 999999)
-- First, we must see whether the list is empty or not,
-- 1 <= 999999, so the list is not empty
---> foldl' max (0,0) $ map g (1 : enumFromTo (1+1) 999999)
---> foldl' max (0,0) $ g 1 : map g (enumFromTo (1+1) 999999)
---> foldl' max (max (0,0) (g 1)) $ map g (enumFromTo (1+1) 999999)
-- Now, due to the strictness of foldl', max (0,0) (g 1) has to be evaluated
-- far enough to know if it is _|_ or not. For that, we obviously have to 
-- compare (0,0) and g 1, hence we must evaluate g 1 far enough to see if
-- (0,0) <= g 1. Now g 1 = (f 1 1, 1) and we must check whether 0 <= f 1 1.
-- However, f 1 1 = 1, so yes, and max (0,0) (g 1) turns out to be (1,1)
---> foldl' max (1,1) $ map g (enumFromTo (1+1) 999999)
-- Now to decide if the list is empty, (1+1) must be compared to 999999,
-- for that it must be evaluated, so
---> foldl' max (1,1) $ map g (2 : enumFromTo (2+1) 999999)
---> foldl' max (1,1) $ g 2 : map g (enumFromTo (2+1) 999999)
---> foldl' max (max (1,1) (g 2)) $ map g (enumFromTo (2+1) 999999)
-- Is max (1,1) (g 2) _|_ or not? To find out evaluate g 2 = (2,2)
---> foldl' max (2,2) $ map g (enumFromTo (2+1) 999999)
---> foldl' max (2,2) $ map g (3 : enumFromTo (3+1) 999999)
---> foldl' max (2,2) $ g 3 : map g (enumFromTo (3+1) 999999)
---> foldl' max (max (2,2) (g 3)) $ map g (enumFromTo (3+1) 999999)
---> foldl' max (8,3) $ map g (enumFromTo (3+1) 999999)
...
---> foldl' max (max (8,3) (g 4)) $ map g (enumFromTo (4+1) 999999)
---> foldl' max (8,3) $ map g (enumFromTo (4+1) 999999)

and so on. Hardly any memory needed.
But you have more than 100,000,000 calls to f.
Now if you store previous results in a Map, you save many of these calls (to 
rank in your code), but you pay for it with memory usage (and a couple of 
million expensive insertions into the Map). For every entry in the Map, apart 
from the space for key and value, you also have an Int holding the size of 
the subtree and pointers to the two subtrees of the node, on a 32-bit box 
that's a theoretical minimal overhead of 12 bytes per entry, probably rather 
20 in reality, so a Map Integer Integer with one million entries would use 
some 30 MB memory. 

Hope that helps,
Daniel 

>
> I am puzzled! :-(
>
> On Mon, Jul 28, 2008 at 17:32, Steve Klabnik <steve.klabnik at gmail.com>wrote:
> > Finally, what I ended up doing:
> >
> > f :: Integer -> Integer -> Integer
> > f acc x
> >
> >     | x == 1 = acc
> >     |
> >         | even x = f (acc + 1) (x `div` 2)
> >         | otherwise = f (acc + 1) (3 * x + 1)
> >
> > g :: Integer -> (Integer, Integer)
> > g x = (f 1 x, x)
> >
> > answer = (foldl' max (0,0)) $ map g [1 .. 999999]
> >
> >
> > main = putStrLn( show answer)