[Haskell-cafe] Strange GC timings

Sat Nov 12 20:21:19 CET 2011

On Saturday 12 November 2011, 20:36:04, Artyom Kazak wrote:
> Hello!
> 
> The following program executes 1.5 seconds on my computer:
> 
> -----------------------CODE BEGIN-------------------------
>      module Main where
> 
>      import Data.Array.IArray
> 
>      main = print (answers ! 1000000)
> 
>      nextAns :: (Int, Int, Float) -> (Int, Int, Float)
>      nextAns (a, n, r) = if r2 > 1 then (a+1, n+2, r2)
>                                    else (a+1, n+3, r3) where
>          a' = fromIntegral a
>          n' = fromIntegral n
>          r2 = r * (a'/(a'+1))**n' * (n'+1)*(n'+2)/(a'+1)^2
>          r3 = r2 * (n'+3) / (a'+1)
> 
>      answers :: Array Int Int
>      answers = listArray (1, 1000000)
>          (map snd3 $ iterate nextAns (1, 2, 2))
>          where snd3 (a, b, c) = b
> ------------------------CODE END--------------------------

I can't reproduce that. The boxed Array version needs more than 16M of 
stack here (16M wasn't enough, 32M was), which gives a hint.
The boxed Array took 0.20s MUT and 0.38s GC, the UArray took 0.19s MUT.

But of course, I compiled with optimisations, which you apparently didn't.

However, compiling without optimisations for the sake of investigation, I 
get numbers closer to yours, though still noticeably different.

UArray took 1.28s MUT, 0.02s GC, that corresponds pretty well to your 
result.
The boxed Array took 1.32s MUT and 0.56s GC. [*]
So that agrees with my -O2 results: UArray is a wee bit faster in the 
calculation, and the big difference is GC. It does not agree with your 
results, though.

[*] That was with GHC 7.2.2. I also tried 7.0.4; that made no difference 
for UArray, but for the boxed array I got:

  MUT   time    1.31s  (  1.31s elapsed)
  GC    time   21.31s  ( 21.34s elapsed)

Ouch!

> 
>  From these 1.5 seconds, 1 second is spent on doing GC. If I run it with
> "-A200M", it executes for only 0.5 seconds (total).
> 
> Which is more interesting, when I use UArray instead of Array, it spends
> only 0.02 seconds in GC, but total running time is still 1.5 seconds.
> 
> Why are... these things?

If you're using a boxed array, you
- are building a long list of thunks with iterate (no strictness, so 
nothing is evaluated)
- are then writing the thunks to the boxed array (actually, this is 
interleaved with the construction)
- finally evaluate the last thunk, which forces the previous thunks, 
peeling layers off the thunk, pushing them on the stack until the start is 
reached, then popping the layers and evaluating the next term.

You get a huge thunk that takes a long time to garbage-collect when it 
finally can be collected.
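
If you want to keep the boxed array, one way to avoid the deep chain is to 
force each state as the list is produced. Below is a minimal sketch of that 
idea, not a tuned solution. The names State, step, iterateStrict and 
answersUpTo are made up for this example, and I am assuming GHC's 
BangPatterns extension is available. The arithmetic is the same as in your 
nextAns.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Array.IArray (Array, listArray, (!))

-- Hypothetical strict state type: because the fields are strict, forcing a
-- State to weak head normal form also evaluates all three components.
data State = State !Int !Int !Float

-- Same arithmetic as nextAns in the quoted program.
step :: State -> State
step (State a n r)
  | r2 > 1    = State (a + 1) (n + 2) r2
  | otherwise = State (a + 1) (n + 3) r3
  where
    a' = fromIntegral a
    n' = fromIntegral n
    r2 = r * (a' / (a' + 1)) ** n' * (n' + 1) * (n' + 2) / (a' + 1) ^ 2
    r3 = r2 * (n' + 3) / (a' + 1)

-- Hypothetical helper: like iterate, but each element is forced before the
-- list cell holding it is produced, so no chain of suspended steps builds up.
iterateStrict :: (a -> a) -> a -> [a]
iterateStrict f !x = x : iterateStrict f (f x)

-- Boxed array of the first 'size' answers (size is a parameter only so the
-- sketch can be tried with small inputs).
answersUpTo :: Int -> Array Int Int
answersUpTo size =
    listArray (1, size) (map second (iterateStrict step (State 1 2 2)))
  where
    second (State _ n _) = n

main :: IO ()
main = print (answersUpTo 1000000 ! 1000000)
```

The array elements are still thunks here, but each one is only a field 
selection on an already evaluated State, so forcing the last element no 
longer has to unwind a million nested steps.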

With an unboxed array, the *values* themselves have to be written to the 
array as it is constructed. That forces evaluation of the 
iterate-generated tuples immediately, hence no big thunk is built and the 
small allocations can be collected very quickly.
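
For reference, the unboxed variant I am talking about would look roughly 
like this. It is only a sketch: nextAns is taken from your program, and the 
only real change is the import and the array type. The function name 
answersUpTo and its size parameter are my own additions, so that it can be 
tried with small sizes.

```haskell
module Main where

import Data.Array.Unboxed (UArray, listArray, (!))

-- Unchanged from the quoted program.
nextAns :: (Int, Int, Float) -> (Int, Int, Float)
nextAns (a, n, r) = if r2 > 1 then (a + 1, n + 2, r2)
                              else (a + 1, n + 3, r3)
  where
    a' = fromIntegral a
    n' = fromIntegral n
    r2 = r * (a' / (a' + 1)) ** n' * (n' + 1) * (n' + 2) / (a' + 1) ^ 2
    r3 = r2 * (n' + 3) / (a' + 1)

-- A UArray stores raw Int values, so every element is evaluated at the
-- moment it is written into the array.
answersUpTo :: Int -> UArray Int Int
answersUpTo size = listArray (1, size) (map snd3 (iterate nextAns (1, 2, 2)))
  where
    snd3 (_, b, _) = b

main :: IO ()
main = print (answersUpTo 1000000 ! 1000000)
```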