GHC 6.4.3 is stalled
gwright at comcast.net
Fri Jul 28 05:32:50 EDT 2006
On Jul 28, 2006, at 3:58 AM, Simon Marlow wrote:
> Hi Greg,
> Gregory Wright wrote:
>> Some data and a few questions:
>> 1. The failure on FreeBSD is not the same as on OS X. I built 6.4.2
>> from cvs on FreeBSD 6.1, and ran the ghc-regress tests. The tests
>> took a long time to run (about 14 hours on a dual Xeon 2.8 GHz
>> with 2 GB of memory). Towards the end of the tests, there were
>> about 30 "timeout" processes running, apparently doing nothing
>> but consuming cpu cycles.
> Ok, this is certainly a problem with forkOS in the threaded RTS in
> 6.4.2 on FreeBSD. I probably need to get access to a FreeBSD box
> to fix this myself, the code is pretty delicate (and sadly it has
> completely changed in 6.6, too).
> It might be worth trying with -lthr instead of -lpthread, according
> to Robert Watson. This switches to an alternative, 1:1, threading
> library.
I can try this. If you need access to a FreeBSD 6.1 box (dual 2.8 GHz
Xeon, 2 GB RAM), I can set up ssh access for you. Let me know.
>> 2. Notes on reproducing the FreeBSD 6.4.2 build: I used
>> fpconfig from the ghc-6-4 branch;
>> ghc, libraries, hslibs and testsuite from the ghc-6-4-2 branch;
>> GNU make 3.80;
>> autoconf 2.59.
>> GNU make 3.81 went into an infinite loop, much as GNU make 3.79
>> did when building GHC on OS X.
> That's odd, the fix for make 3.79 is in the 6.4.2 tree (rev.
> 220.127.116.11 of mk/suffix.mk). Something else must be happening with
> 3.81, sigh.
Yes, it seems to be one of those things. I'm not going to look into it;
3.80 seems to work well enough for now.
>> 3. Did the threaded RTS work on 6.4.1? Was it used by default?
> Presumably not. In 6.4.2 we switched to using the threaded RTS by
> default for GHC itself, which has forced the problem to the
> surface. Also there were some changes to the timeout program in
> the testsuite, which have apparently forced some other problems to
> the surface.
>> I can provide an RTS thread listing (+RTS -Ds) if that would help.
>> Someone would have to explain what it means to me, though.
>> 4. When running with debugging turned on, I have seen the
>> assertion failure
>> ghc-6.4.2: internal error: ASSERTION FAILED: file GC.c, line 4356
>> Please report this as a compiler bug. See:
>> This points toward the stack being corrupted. Maybe a thread
>> its stack? I'm not sure. The assertion that fails is
>> ASSERT(frame < bottom);
>> It looks as if something has messed up the stack before this.
> Ok, it would help to find a smaller program that crashes with -threaded:
> debugging GHC itself is quite hard because it's difficult
> to get a deterministic run and hence reproducibility. Look at
> your testsuite failures and find threaded failures that aren't due
> to the compiler crashing (or just build stage2 without -threaded
> and run the testsuite again). Tests in concurrent/ are a good bet.
> When we have a smallish program that crashes, we can start debugging.
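As a possible starting point for such a smaller program, something along the following lines might be worth trying. It is a hypothetical sketch, not a test from concurrent/: pairs of bound threads created with forkOS ping-pong a counter over MVars, which should force many OS-thread handoffs in the threaded RTS. The names stressForkOS, pingPongPair and fork, the file name and the default sizes are made up for illustration, and it is untested whether it triggers the FreeBSD or PPC failures.

```haskell
-- Hypothetical forkOS stress test (a sketch, not from the GHC testsuite).
-- Build with: ghc -threaded --make ForkOSStress.hs
module Main (main) where

import Control.Concurrent
  (MVar, ThreadId, forkIO, forkOS, newEmptyMVar, putMVar, rtsSupportsBoundThreads, takeMVar)
import Control.Monad (replicateM, replicateM_, when)
import System.Environment (getArgs)

-- | Use bound OS threads when the RTS supports them (program linked with
-- -threaded); otherwise fall back to forkIO so the program still runs,
-- although it then no longer exercises forkOS.
fork :: IO () -> IO ThreadId
fork = if rtsSupportsBoundThreads then forkOS else forkIO

-- | Start one pinger/ponger pair that exchanges 'rounds' messages over two
-- MVars. The returned MVar receives the pinger's final counter, which
-- equals 'rounds' when every round trip completed.
pingPongPair :: Int -> IO (MVar Int)
pingPongPair rounds = do
  ping   <- newEmptyMVar
  pong   <- newEmptyMVar
  result <- newEmptyMVar
  -- Ponger: reply to each ping with the received value plus one.
  _ <- fork $ replicateM_ rounds $ do
    n <- takeMVar ping
    putMVar pong (n + 1)
  -- Pinger: send the counter, wait for the reply, repeat.
  _ <- fork $ do
    let loop :: Int -> Int -> IO Int
        loop 0 acc = return acc
        loop k acc = do
          putMVar ping acc
          acc' <- takeMVar pong
          loop (k - 1) acc'
    total <- loop rounds 0
    putMVar result total
  return result

-- | Run 'nPairs' pairs concurrently and return the total number of
-- completed round trips.
stressForkOS :: Int -> Int -> IO Int
stressForkOS nPairs rounds = do
  results <- replicateM nPairs (pingPongPair rounds)
  counts  <- mapM takeMVar results
  return (sum counts)

main :: IO ()
main = do
  args <- getArgs
  let (nPairs, rounds) = case args of
        [p, r] -> (read p, read r)
        _      -> (20, 1000)
  total <- stressForkOS nPairs rounds
  -- Every pinger must have completed exactly 'rounds' round trips.
  when (total /= nPairs * rounds) $
    error ("expected " ++ show (nPairs * rounds) ++ " round trips, got " ++ show total)
  putStrLn ("ok: " ++ show total ++ " round trips")
```

Run it as, e.g., `./ForkOSStress 20 1000` (thread pairs, round trips per pair) and raise the numbers if it does not fail. Without -threaded it silently falls back to forkIO, so a clean run in that mode says nothing about forkOS.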
I will do a build and look at the failing tests to isolate a simple
Here's another data point: Joel Reymont said that his OS X/Intel builds
do not crash during the testsuite (nothing in the CrashReporter logs),
but that he did see the accumulation of "timeout" processes. Earlier this
acquired a MacBook and have just finished loading GHC onto it. I
to reproduce his result.
That information, if true, is a bit discouraging. It seems to say that
the problem on Intel may be different from the one on PPC. In particular,
the crash may only be happening on PPC. Yuck. I will verify whether this
is so.
>> I am willing to dig into this, but I need a bit more help with
>> where to start.
> Thanks for your help!
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org