[Haskell-cafe] Question about memory usage

Wed Aug 18 09:52:00 EDT 2010

On Tuesday 17 August 2010 17:33:15, Sebastian Fischer wrote:
> > BTW, what sort of memory usage are we talking about here?

>      ./calcfib 100000000 +RTS -s
>           451,876,020 bytes allocated in the heap
>                10,376 bytes copied during GC
>            19,530,184 bytes maximum residency (9 sample(s))
>            12,193,760 bytes maximum slop
>                    97 MB total memory in use

That's compatible with the figures I got for a similar programme.
I just wanted to make sure you had no much higher usage.

>
> I'm not sure how to interpret "bytes allocated" and "maximum
> residency" especially because the program spends no time during GC.
> But the 97 MB "total memory" correspond to what my process monitor
> shows.
>

The bytes allocated figure says how many bytes the RTS allocated in total 
during the run of the programme. It's not closely related to memory 
requirements, for example if you have an almost tight loop, which just has 
too many variables to keep them all in registers, you'll get high 
allocation figures for long running loops even though it runs in small 
constant space (of course, allocating, say, one Int on the heap in each 
iteration costs performance versus an entirely-in-registers-loop).

The maximum residency, iirc, is the maximal size of live heap the garbage 
collector finds in any of the major GCs (there were 9, but since there were 
only few items, all 9 major and 40 minor GCs together took too little time 
to display a positive value). 19.5MB could be the one final Fibonacci 
number plus three of about half the size, that seems a reasonable value 
(and if the GC ran at slightly different times, you might have gotten 
different values).

The total memory in use also includes garbage waiting to be collected.
I don't remember any details fo GHC's GC algorithm, but having total memory 
about five times maximum residency doesn't seem unbelievable.

> > I have now tried your code and I didn't find the memory usage too
> > extreme.
> > Be aware that fib (10^8) requires about 70 million bits and you need
> > several times that for the computation.
>
> Then, roughly ten 70m bit numbers fit into the total memory used:
>
>      ghci> (97 * 1024^2 * 8) / (70 * 10^6)
>      11.624213942857143
>
> I expected to retain about three of such large numbers, not ten, and
> although the recursion depth is not deep I expected some garbage. Both
> expectations may be a mistake, of course.
>

Yes, the memory usage is higher than I'd expect too, but it's well-behaved 
(sublinear growth), so I'm not overly concerned.

> > If the relation is
> >
> > a_n = c_1*a_(n-1) + ... + c_k*a_(n-k)
> >
> > you have the k×k matrix [...]
> > to raise to the n-th power, multiply the resulting matrix with the
> > vector
> > of initial values (a_(k-1), a_(k-2), ..., a_0) and take the last
> > component
>
> Interesting.
>
> > (a propos of this, your Fibonacci numbers are off by one, fib 0 = 0,
> > fib 9
> > = 34, fib 10 = 55).
>
> Wikipedia also starts with zero but states that "some sources" omit
> the leading zero. I took the liberty to do the same (and documented it
> like this) as it seems to match the algorithm better. It also
> corresponds to the rabbit analogy.
>

It breaks however the nice properties that

fib n = (phi^n - (1-phi)^n)/sqrt 5
fib n `mod` 5 == 0 <=> n `mod` 5 == 0
p prime => fib (p - (p|5)) `mod` p == 0
fib n = (-1)^(n+1)*fib (-n)

[where, due to typographical limitations, I wrote the Legendre symbol as 
(p|q)]

For aesthetical reasons, I continue to regard setting fib 0 = 1 as wrong.

> > These matrices have a special structure
> > that allows doing a multiplication in O(k^2).
> > You might want to look into the Cayley-Hamilton theorem for the
> > latter.
>
> I don't see the link to the CH theorem yet -- probably because I
> didn't know it.

Cayley-Hamilton allows you to replace the matrix exponentiation by 
calculating (X^n) modulo the characteristic polynomial of M and thus matrix 
multiplication by multiplication of polynomials (of degree <= (k-1)).
The link is pretty indirect, however, and Cayley-Hamilton/multiplication of 
polynomials is the endpoint of a twisted path.

> I did observe that all matrices during the computation
> have the form
>
>      /a b\
>      \b c/
>
> which simplifies multiplications.

Yes, the powers of a symmetric matrix are always symmetric.
That follows from

trans(A·B) = trans(B)·trans(A), so

trans(A^n) = (trans(A))^n

which, if A is symmetric, i.e. trans(A) = A, reduces to A^n.

> Is this related to CH?

No, entirely unrelated. It's the fact that the coefficient of fib (n-2) is 
1 in the recursion formula for fib n.

> Or can I
> further improve the multiplications using insights from CH?

It's not worth it for Fibonacci numbers: for small k, an O(k^3) algorithm 
is better than an O(k^2) algorithm with larger constant factors.

If you had a linear recursion involving say a_(n-1000) it'd be a huge gain.