FW: Observation re Hugs GC problem on sparcs w/gcc -O2
Julian Seward (Intl Vendor)
v-julsew@microsoft.com
Tue, 1 May 2001 04:55:08 -0700
Looking back through old mail. I mailed this to the Hugs maintainers
about 2 years ago, but I think it is still relevant. It contains a
suggestion about why GC doesn't always work right on Hugs on sparcs
when type.c/static.c/whatever are compiled -O/-O2, and an easy fix.
J
| -----Original Message-----
| From: Julian Seward (Intl Vendor) [mailto:v-julsew@microsoft.com]=20
| Sent: Thursday, April 22, 1999 10:13 AM
| To: 'hugs-bugs@cs.yale.edu'
| Cc: 'D.Wakeling@exeter.ac.uk'
| Subject: Observation re Hugs GC problem on sparcs w/gcc -O2
|=20
|=20
|=20
| Mark
|=20
| I seem to gather that (normal) Hugs has GC problems on Sparc=20
| with optimisation on. David W also reported this recently.
|=20
| I got the impression that you think this happens because the=20
| register window mechanism magically obscures some registers=20
| which could hold roots, at the point when GC occurs.
|=20
| I would like to offer a different explanation, plus an easy=20
| fix. When I was working for the UFO project at Manchester,=20
| we did a collector which took roots from the C stack. It too=20
| had problems on sparcs, and the following observation saved=20
| the day. I think it might equally apply to Hugs.
|=20
| In the code below, complexFnReturningAPair is obvious,
| and ... compute_and_alloc ... refers to some code which
| may cause a garbage collection.
|=20
|=20
| Pair complexFnReturningAPair ( ... ) { .... }
|=20
|=20
| Pair p;
|=20
| p =3D complexFnReturningAPair ( ... );
| for ( ... ) {
| ... fst(p) ... snd(p) ...
| ... compute_and_alloc ...
| }
|=20
| So: p becomes a pair. We then enter a loop which refers to
| fst(p) and/or snd(p), and in which GC could happen. After cppisation,=20
| the code looks like
|=20
| p =3D complexFnReturningAPair ( ... );
| for ( ... ) {
| ... heapTopFst[p] ... heapTopSnd[p] ...
| ... compute_and_alloc ...
| }
|=20
| Expanding the array address calculations gives
|=20
| p =3D complexFnReturningAPair ( ... );
| for ( ... ) {
| ... heapTopFst + 4*p ... heapTopSnd + 4*p ...
| ... compute_and_alloc ...
| }
|=20
| Now the critical step, done by gcc when optimising: lift
| 4*p out of the loop since it's loop-invariant, and
| (incidentally) CSE the two 4*p's together:
|=20
| tmp =3D 4 * complexFnReturningAPair ( ... );
| for ( ... ) {
| ... heapTopFst + tmp ... heapTopSnd + tmp ...
| ... compute_and_alloc ...
| }
|=20
| Blargh! Now, inside the loop, there is no mention of the=20
| original p returned by complexFnReturningAPair, neither in=20
| regs nor on the stack. So the pair won't be retained by GC=20
| even though it's still live in the loop.
|=20
| The moral of the story is this: when using an=20
| array-index-addressed heap, as Hugs does, we need to treat as=20
| a root not only valid array indexes into the heap, but also=20
| intermediates created by the array address calculations. That is:
|=20
| p sat -MAX_HEAP < p <=3D 0
| p*sizeof(int) sat -MAX_HEAP < p <=3D 0
| heapTopFst+ p*sizeof(int) sat -MAX_HEAP < p <=3D 0
| heapTopSnd+ p*sizeof(int) sat -MAX_HEAP < p <=3D 0
|=20
| Looking in machdep.c:gcCStack(), I only see the first of=20
| those four possibilities handled. Perhaps the following=20
| would be better:
|=20
| #define Blargh markWithoutMove(*ptr); \
| markWithoutMove((*ptr)/sizeof(Cell)); \
| markWithoutMove((=20
| (void*)(*ptr)-(void*)heapTopFst)/sizeof(Cell)); \
| markWithoutMove((
| (void*)(*ptr)-(void*)heapTopSnd)/sizeof(Cell))
|=20
| #define StackGrowsDown { while (ptr<=3DCStackBase) { Blargh;=20
| ptr++; }; }
| #define StackGrowsUp { while (ptr>=3DCStackBase) { Blargh;=20
| ptr--; }; }
| #define GuessDirection if (ptr>CStackBase) StackGrowsUp else=20
| StackGrowsDown
|=20
| I tried this on x86 and it made no difference at all, since=20
| there isn't a problem for x86 anyway. I'd be interested to=20
| hear if it makes any difference on Sparcs. (note -- the code=20
| might not be exactly right). Check if you try it ...
|=20
| Finally, I think I know why the problem doesn't strike x86s. =20
| That is because the x86 has addressing modes of the form
|=20
| (reg1 + k*reg2) for k as 1, 2, 4 or 8
|=20
| So a reference to heapTopFst[p] becomes=20
|=20
| ; reg1 holds p
| ; reg2 holds heapTopFst
| movl (reg1 + 4*reg2), reg3
|=20
| and there is never any point precomputing 4*p, since the=20
| architecture can do it for free.
|=20
| The entire diatribe applies not only to sparcs but other=20
| riscs which have lots of registers but only simple addressing=20
| modes. Have you had problem reports for (eg) MIPS too?
|=20
| J
|=20