GHC 6.4.3 is stalled

Simon Marlow simonmarhaskell at gmail.com
Fri Jul 28 03:58:47 EDT 2006


Hi Greg,

Gregory Wright wrote:

> Some data and a few questions:
> 
> 1. The failure on FreeBSD is not the same as on OS X.  I built 6.4.2
> from cvs on FreeBSD 6.1, and ran the ghc-regress tests. The tests
> took a long time to run (about 14 hours on a dual Xeon 2.8 GHz
> with 2 GB of memory). Towards the end of the tests, there were
> about 30 "timeout" processes running, apparently doing nothing
> but consuming cpu cycles.

Ok, this is certainly a problem with forkOS in the threaded RTS in 6.4.2 on 
FreeBSD.  I probably need to get access to a FreeBSD box to fix this myself: the 
code is pretty delicate (and sadly it has completely changed in 6.6, too).

According to Robert Watson, it might be worth trying to link with -lthr instead 
of -lpthread.  This switches to an alternative threading library that uses a 1:1 
threading model.


> 2. Notes on reproducing the FreeBSD 6.4.2 build:  I used
> 
>     fpconfig from the ghc-6-4 branch;
>     ghc, libraries, hslibs and testsuite from the ghc-6-4-2 branch;
>     gnu make 3.80;
>     autoconf 2.59.
> 
> Gnu make 3.81 went into an infinite loop, much as gnu make 3.79
> did when building ghc on OS X.

That's odd: the fix for make 3.79 is in the 6.4.2 tree (rev. 1.82.2.2 of 
mk/suffix.mk).  Something else must be happening with 3.81, sigh.

> 3. Did the threaded RTS work on 6.4.1?  Was it used by default?

Presumably not.  In 6.4.2 we switched to using the threaded RTS by default for 
GHC itself, which has forced the problem to the surface.  Also there were some 
changes to the timeout program in the testsuite, which have apparently forced 
some other problems to the surface.

> I can provide an RTS thread listing (+RTS -Ds) if that would be a starting
> point.  Someone would have to explain what it means to me, though.
> 
> 4. When running with debugging turned on, I have seen the assertion failure
> 
> ghc-6.4.2: internal error: ASSERTION FAILED: file GC.c, line 4356
>     Please report this as a compiler bug.  See:
>     http://www.haskell.org/ghc/reportabug
> 
> This points toward the stack being corrupted.  Maybe a thread overflowing
> its stack?  I'm not sure.  The assertion that fails is
> 
>     ASSERT(frame < bottom);
> 
> It looks as if something has messed up the stack before this.

Ok, it would help to find a smaller program that crashes with -threaded.  
Debugging GHC itself is quite hard, because it's difficult to get a deterministic 
run and hence a reproducible failure.  Look at your testsuite failures and find 
threaded failures that aren't due to the compiler crashing (or just build stage2 
without -threaded and run the testsuite again).  Tests in concurrent/ are a good 
bet.

When we have a smallish program that crashes, we can start debugging.
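
To give a rough idea of the kind of small program meant here, the following is 
only a sketch and is not taken from the testsuite.  The names spawn, worker, 
startWorker and runWorkers, and the thread count of 50, are made up for 
illustration, and there is no evidence that this particular program triggers the 
FreeBSD failure.  It starts a batch of threads with forkOS when the RTS supports 
bound threads (i.e. when linked with -threaded), falls back to forkIO otherwise, 
and waits for each thread through an MVar.

```haskell
import Control.Concurrent (ThreadId, forkIO, forkOS, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Hypothetical helper: use forkOS when the program was linked with -threaded,
-- and plain forkIO otherwise, so the same source runs under both RTS flavours.
spawn :: IO () -> IO ThreadId
spawn
  | rtsSupportsBoundThreads = forkOS
  | otherwise               = forkIO

-- Each worker does a little pure work and reports the result via its MVar.
worker :: MVar Int -> Int -> IO ()
worker done n = putMVar done $! sum [1 .. n]

-- Start one worker and hand back the MVar that it will fill.
startWorker :: Int -> IO (MVar Int)
startWorker n = do
  done <- newEmptyMVar
  _ <- spawn (worker done n)
  return done

-- Start k workers, then block until every one of them has reported.
runWorkers :: Int -> IO [Int]
runWorkers k = do
  vars <- mapM startWorker [1 .. k]
  mapM takeMVar vars

main :: IO ()
main = do
  putStrLn (if rtsSupportsBoundThreads then "using forkOS" else "using forkIO")
  results <- runWorkers 50   -- 50 is an arbitrary thread count
  print (sum results)
```

Building it once with -threaded and once without gives a pair of runs to compare.  
If the threaded build hangs or crashes and the other one does not, that would be 
a much easier starting point than GHC itself.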

> I am willing to dig into this, but I need a bit more help with where to 
> start.

Thanks for your help!

Cheers,
	Simon

