[Hat] hat 2.00 make failure

Byron Hale hat@haskell.org
Sun, 16 Jun 2002 16:47:00 -0700


At 12:27 PM 6/16/2002 +0100, you wrote:
>Byron Hale <byron.hale@einfo.com> writes:
>
> > The problem seems to be a bug in gcc 2.96:
> > gcc: Internal Error: segmentation fault (program as)
>
>Curious, because I have almost exactly the same environment as you:
>     RedHat 7.2, i686, ghc-5.02.2, hmake-3.05, gcc-2.96
>and it compiles just fine for me.

I found an upgrade from gcc 2.96-98 to gcc 2.96-110 on Red Hat's
site and installed it, along with the appropriate companion packages
to ensure system stability. It upgraded without complaint. However,
the problem persists, and that is the only rpm upgrade available for
gcc 2.96-98.

>However, I have seen segmentation faults in gcc which are due to
>hardware errors.  When I have watched this happen before, usually gcc
>gives a seg fault, then if you restart the `make', it successfully
>completes the task that failed before, then continues a bit further
>and seg faults again.  You can do that a few times, but eventually
>the whole system has a hard crash.
>
>It often means either bad RAM, or your CPU is overheating.  Can you
>check whether the processor cooling fans are operating correctly?
>Older ball-bearing fans sometimes get sticky or clog up with dust
>and no longer turn.  Also, I mentioned ACPI earlier, because that is
>a form of hardware/BIOS control where often the CPU fans are only
>switched on in response to a thermal sensor.  The Linux 2.4 series
>kernels do not implement the thermal control, so it is very easy to
>overheat the processor just by max-ing it out for 15-20 minutes.

Well, I've opened the case. The fans run all the time and the CPUs are
just above room temperature. The RAM is lifetime warranty PC133 ECC
running on a 100MHz system bus, which should give some degree of
fault tolerance, but maybe doesn't.

However, the failure always happens at the same place. A former,
somewhat grizzled EE boss once pointed out that hardware errors tend to
be random, but that software errors tend to be repeatable. So, I'm leaning
toward the idea that a C/C++ pointer is being dereferenced incorrectly.
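
That rule of thumb can be checked mechanically by running the failing
step several times and comparing the outcomes. Below is a minimal sketch
in Python, not something taken from hat or hmake: the helper names
run_once and classify are hypothetical, and treating "same exit status
and same last line of stderr on every run" as "repeatable" is my own
simplifying assumption. It is a heuristic, not a diagnosis.

```python
import subprocess
import sys


def run_once(cmd):
    """Run cmd and return (exit status, last line written to stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    lines = proc.stderr.strip().splitlines()
    return proc.returncode, (lines[-1] if lines else "")


def classify(outcomes):
    """Classify a list of (exit status, last stderr line) pairs.

    Assumption: identical failures on every run point toward software,
    while failures that vary or come and go point toward hardware.
    """
    failures = [o for o in outcomes if o[0] != 0]
    if not failures:
        return "no failure"
    if len(failures) == len(outcomes) and len(set(failures)) == 1:
        return "repeatable"
    return "intermittent"
```

For example, classify([run_once(["make"]) for _ in range(5)]) would
return "repeatable" if make stops with the same error all five times,
which is what I am seeing by hand.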

Best Regards,

Byron