Sat Nov 12 20:21:19 CET 2011

```On Saturday 12 November 2011, 20:36:04, Artyom Kazak wrote:
> Hello!
>
> The following program executes 1.5 seconds on my computer:
>
> -----------------------CODE BEGIN-------------------------
>      module Main where
>
>      import Data.Array.IArray
>
>      main = print (answers ! 1000000)
>
>      nextAns :: (Int, Int, Float) -> (Int, Int, Float)
>      nextAns (a, n, r) = if r2 > 1 then (a+1, n+2, r2) else (a+1, n+3,
> r3) where
>          a' = fromIntegral a
>          n' = fromIntegral n
>          r2 = r * (a'/(a'+1))**n' * (n'+1)*(n'+2)/(a'+1)^2
>          r3 = r2 * (n'+3) / (a'+1)
>
>      answers :: Array Int Int
>      answers = listArray (1, 1000000) (map snd3 \$ iterate nextAns (1, 2,
> 2)) where snd3 (a, b, c) = b
> ------------------------CODE END--------------------------

Can't reproduce. The IArray version needs more than 16M of stack here (16M
wasn't enough, 32M was), that gives a hint.
IArray took 0.20s MUT and 0.38s GC, UArray took 0.19s MUT.

But of course, I compiled with optimisations, which you apparently didn't.

However, compiling without optimisations for the sake of investigation, I
get numbers closer to yours, yet still distinct enough.

UArray took 1.28s MUT, 0.02s GC, that corresponds pretty well to your
result.
IArray took 1.32s MUT and 0.56s GC. [*]
So that conforms with my -O2 results, UArray is a wee bit faster in the
calculation, the big difference is GC, but not with your results.

[*] That was with 7.2.2, I tried also with 7.0.4, that made no difference
for UArray, but for the boxed array:

MUT   time    1.31s  (  1.31s elapsed)
GC    time   21.31s  ( 21.34s elapsed)

Ouch!

>
>  From these 1.5 seconds, 1 second is spent on doing GC. If I run it with
> "-A200M", it executes for only 0.5 seconds (total).
>
> Which is more interesting, when I use UArray instead of Array, it spends
> only 0.02 seconds in GC, but total running time is still 1.5 seconds.
>
> Why are... these things?

If you're using a boxed array, you
- are building a long list of thunks with iterate (no strictness, so
nothing is evaluated)
- are then writing the thunks to the boxed array (actually, this is
interleaved with the construction)
- finally evaluate the last thunk, which forces the previous thunks,
peeling layers off the thunk, pushing them on the stack until the start is
reached, then popping the layers and evaluating the next term.

You get a huge thunk that takes long to garbage-collect when it finally can
be collected.

Using an unboxed array, you have to write the *values* to the array as it
is constructed, that forces evaluation of the iterate-generated tuples
immediately, hence no big thunk is built and the small allocations can very
quickly be collected.

```