RTS/Garbage Collector idea

Isaac Dupree isaacdupree at charter.net
Mon Jun 18 06:45:47 EDT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

It's often bothered me that there is unneeded duplication in the heap,
if you do something the compiler doesn't manage to optimize like
f [a,b,c,d] = Foo bar [a,b,c,d]
or
(reverse . reverse)
or using () or Int(eger) or ... or Bool boxes in multiple places in your
program that have the same value.  A referentially transparent
difference in your program could be the difference between constant
memory usage and a memory leak (I think -- although this proposal/idea
will not "tie the knot" for you or deal well with cyclic data structures
at all, in fact).

I was thinking, since we have a copying garbage collector that reassigns
all pointers anyway, could we detect the "identical" heap objects and
for each set of identical ones, have all thunks that pointed to any of
them, now point to the single one that is created in the target space?

This might not have much effect. (and seems likely to be slow and
possibly have a complex interaction with parallel and/or incremental
GC...) Maybe it should not be done on _every_ garbage collection.  First
to implement/try might be just detecting those duplicates so we have
some idea how much effect such an "optimization" would even have. (any
idea how easy/possible that is (or just what the answer is;)?)

Examples:

evaluated (543::Int) somewhere ends up pointing to the same (543# ::
Int#) in the heap as another (543::Int) somewhere does.

8:8:2:3:[]
9:9:2:3:[]
The []s become shared, then the "3:[]"s, then the "2:3:[]"s, ending
8:8:x
9:9:x
where x is 2:3:[]

8:8:2:3:(*)
9:9:2:3:(*)
where (*) is the *same* object, maybe an unevaluated thunk, maybe some
list.  After (*)'s creation, two different parts of code decided to
construct their own lists with that as their tail.  Eventually becomes:
8:8:x
9:9:x
where x is 2:3:(*)
Two thunks that merely have the same code shouldn't be combined because
it could be the [1..] [1..] sharing memory-leak to the extreme!  It
would seem safe to combine thunks that only produce something small like
an Int, but they can actually produce a complex Exception - so, no
combining of thunks. (also... must be careful in this area of breaking
the IO monad-sequencing(maybe?), but I think it's quite safe as long as
only already-evaluated heap objects are combined)

I wonder how this interacts with polymorphic (list-nil of different
types), or with merely structurally equivalent, data (if they're
identical, merging shouldn't ever be a problem!  at least as long as
no-one relies on something like "if (unsafe pointer equality) then (same
static type)"...)


Isaac
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGdmJaHgcxvIWYTTURAjAhAJ9WHwzxuJPuqouol05K7+QHkT3IigCglOpw
VtHCSxcy8E+w7OtCafU8pFM=
=BqAi
-----END PGP SIGNATURE-----


More information about the Glasgow-haskell-users mailing list