Memory usage

Sat, 25 Nov 2000 18:19:05 +0000

In message <3A1D3124.933312D8@dcse.fee.vutbr.cz>, Dusan Kolar
<kolar@dcse.fee.vutbr.cz> writes

 >Hello,
 >
 >  working on a small utility for de/compression of special files
 >using special algorithm (useless in other cases and not important
 >for the problem) I've encountered I cannot decompress even small
 >files because I always run out of the memory.
 >
 >  After some time I've found the problem. It can be simplified
 >to the following function (computing nothing reasonable):

Dusan's function, (reformatted - this message is a hugs script,
unless the lines get wrapped by your email) is:

> tst :: [a] -> [a] -> [a] -> [a]
> tst l []     = id
> tst l (x:xs) = if null l then ((++) [x]) . (tst (l++[x]) xs) 
>                 else if null (tail l) then ((++) [x,last l]) . 
>                         (tst (tail l++[last l]++[x]) xs)
>                 else ((++) [x,last l]) . 
>                         (tst (tail (tail l)++[last l]++[x]) xs)

 >Trying to run it in hugs (default settings) this way:
 >
 > tst [] [1..20000] []
 >
 >the program stops writing almost all the lines and running out of
 >memory.

From the example you give, I assume that the 'purpose' 
of tst is to compute p  efficiently, where

> p :: [a] -> [a]
> p xs = tst [] xs []

  p [1,2]     = [1,2,1]
  p [1,2,3]   = [1,2,1,3,2]
  p [1,2,3,4] = [1,2,1,3,2,4,3]

It looks as though this function can be computed in 
constant space, but p leaks space. I check this by 
starting hugs with a small heap:
  hugs +g -h30k
Then:

Main> last $ p [1..10000]
{{Gc:17584}}{{Gc:15496}}{{Gc:13652}}{{Gc:12036}}{{Gc:10608}}{{Gc:9352}}
{{Gc:8246 }}{{Gc:7258}}{{Gc:6396}}{{Gc:5644}}{{Gc:4964}}{{Gc:4379}}
{{Gc:3865}}{{Gc:3404}}{{Gc:3002}}{{Gc:2642}}{{Gc:2340}}{{Gc:2052}}
{{Gc:1816}}{{Gc:1592}}{{Gc:1404}}{{Gc:1248}}{{Gc:1092}}{{Gc:960}}
(89416 reductions, 159870 cells, 23 garbage collections)
{{Gc:960}}ERROR: Garbage collection fails to reclaim sufficient space

The tst function is 'equivalent' to tst1, which 
is derived by splitting out the cases and simplifying.
I prefer this style!

> tst1:: [a] -> [a] -> [a] -> [a]
> tst1 _       []     b = b
> tst1 []      (x:xs) b = [x] ++ tst1 [x] xs b
> tst1 [y]     (x:xs) b = [x,y] ++ tst1 [y,x] xs b
> tst1 [y,z]   (x:xs) b = [x,z] ++ tst1 [z,x] xs b
> tst1 (y:z:m) (x:xs) b = [x,n] ++ tst1 (m ++ [n,x]) xs b
>                           where n = last m

> p1 :: [a] -> [a]
> p1 xs = tst1 [] xs []

Also, p1 *doesn't leak space*. Why? I really haven't much
idea: it may be because there is only one 'last m', but a 
better guess could be that tst is a bit too complicated!

Main> last $ p1 [1..10000]
{{Gc:19934}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}
{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}
{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}
{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}{{Gc:19932}}9999
(240028 reductions, 440040 cells, 23 garbage collections)
{{Gc:19967}}Main>

The following 'specification' of p uses lots of space:
> p0 :: [a] -> [a] 
> p0 []  = []
> p0 [x] = [x]
> p0 xs  = p0 (init xs) ++ [last xs, last $ init xs]

Another 'implementation' using constant space is:
> p2 :: [a] -> [a]
> p2 []        = []
> p2 [x]       = [x]
> p2 xs@(x:ys) = (x:merge ys xs)
>   where
>     merge []     _      = []
>     merge (a:as) (b:bs) = (a:b:merge as bs)

There have been lots of messages on this list concerning
space leaks in Haskell. It seems that only compiler writers
can understand the causes of subtle space leaks.  This isn't 
really very satisfactory.

Hope this helps.
-- 
William Marsh