Strange GHC/STM behaviour
marlowsd at gmail.com
Mon Mar 15 07:58:59 EDT 2010
On 15/03/2010 08:59, Michael Lesniak wrote:
> In one of my example programs I have a strange behaviour: it is a very
> simple taskpool using STM; in pseudocode it's
> 1. generate data structures
> 2. initialize data structures
> 3. fork threads
> 4. wait (using STM) until the pool is empty and all threads are finished
> 5. print a final message
> In very few cases, which depend on the number of threads spawned, the
> program hangs *after* the final message of step 5 has been printed.
> "Few cases" means, for example, 50,000 good, terminating runs before
> it hangs. If you increment the number of spawned threads (to a few
> hundred or thousands), it hangs much faster. Since forked threads
> terminate after the main thread terminates (which it should after
> printing the message), this behaviour is quite unexpected.
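
The attached program is not reproduced in this message. As a rough
illustration only, the five steps above might look like the sketch
below. The names Pool, worker and runPool, the summing "work", and the
thread and task counts are all assumptions made for the example, not
taken from the original code.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (replicateM_, when)

-- Hypothetical pool: pending tasks plus a count of workers still running.
data Pool = Pool
  { pending :: TVar [Int]
  , active  :: TVar Int
  }

-- Each worker takes tasks until the pool is empty, then signs off.
worker :: Pool -> TVar Int -> IO ()
worker pool total = loop
  where
    loop = do
      next <- atomically $ do
        ts <- readTVar (pending pool)
        case ts of
          [] -> do
            -- Nothing left: decrement the live-worker count and stop.
            modifyTVar' (active pool) (subtract 1)
            return Nothing
          (t : rest) -> do
            writeTVar (pending pool) rest
            return (Just t)
      case next of
        Nothing -> return ()
        Just t -> do
          -- Stand-in for real work: add the task value to a shared total.
          atomically (modifyTVar' total (+ t))
          loop

runPool :: Int -> [Int] -> IO Int
runPool nThreads tasks = do
  -- Steps 1 and 2: generate and initialize the data structures.
  pool <- Pool <$> newTVarIO tasks <*> newTVarIO nThreads
  total <- newTVarIO 0
  -- Step 3: fork the threads.
  replicateM_ nThreads (forkIO (worker pool total))
  -- Step 4: block in STM until the pool is empty and all workers are done.
  atomically $ do
    ts <- readTVar (pending pool)
    n <- readTVar (active pool)
    when (not (null ts) || n > 0) retry
  readTVarIO total

main :: IO ()
main = do
  result <- runPool 100 [1 .. 1000]
  -- Step 5: print a final message; main returns right after this.
  putStrLn ("All tasks done, sum = " ++ show result)
```

Building with -threaded and running with several capabilities (for
example +RTS -N4) is presumably the configuration of interest here,
since the reported hang occurs at shutdown, after step 5.
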
I've fixed three deadlocks since 6.12.1 was released: two were IO
manager-related, and one was caused by an interaction between the
scheduler and GC. It's likely that one of these is your problem. All of
them are fixed in 6.12.2, so if you are able to grab a snapshot and test
it, that would be very helpful.

Tue Mar 9 09:58:31 GMT 2010 Simon Marlow <marlowsd at gmail.com>
  * Fix a rare deadlock when the IO manager thread is slow to start up

  This fixes occasional failures of ffi002(threaded1) on a loaded

    M ./rts/Capability.c -1 +9

Tue Jan 26 15:00:37 GMT 2010 Simon Marlow <marlowsd at gmail.com>
  * Fix a deadlock, and possibly other problems

  After a bound thread had completed, its TSO remains in the heap until
  it has been GC'd, although the associated Task is returned to the
  caller where it is freed and possibly re-used.

  The bug was that GC was following the pointer to the Task and updating
  the TSO field, meanwhile the Task had already been recycled (it was
  being used by exitScheduler()). Confusion ensued, leading to a very
  occasional deadlock at shutdown, but in principle it could result in
  other crashes too.

  The fix is to remove the link between the TSO and the Task when the
  TSO has completed and the call to schedule() has returned; see
  comments in Schedule.c.

    M ./rts/Schedule.c -3 +18

Thu Feb 25 12:02:55 GMT 2010 Simon Marlow <marlowsd at gmail.com>
  * Plug two race conditions that could lead to deadlocks in the IO manager

    M ./GHC/Conc.lhs -6 +16

> Since I've experienced strange behaviour in the past which was the
> fault of my system configuration, I am a bit cautious before
> reporting a bug on GHC's bugtracker, especially since its reproduction
> is so difficult and random.
I've been doing a lot of testing recently that involves running a
program repeatedly in a loop until it goes wrong; such is the nature of
non-deterministic concurrency :-)
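
A loop of that kind can be sketched as a small driver program. This is
only an illustration: the names waitWithin and stress, the 10 ms polling
interval, the run limit and the ./taskpool path are assumptions, and it
relies on the process package that ships with GHC. A run counts as
having gone wrong if it exits with a failure code or does not exit
within the time budget.

```haskell
import Control.Concurrent (threadDelay)
import System.Exit (ExitCode (..))
import System.Process
  (ProcessHandle, getProcessExitCode, spawnProcess, terminateProcess)

-- Poll the child every 10 ms until it exits or the budget (in ms) runs out.
waitWithin :: Int -> ProcessHandle -> IO (Maybe ExitCode)
waitWithin budgetMs ph
  | budgetMs <= 0 = return Nothing
  | otherwise = do
      status <- getProcessExitCode ph
      case status of
        Just code -> return (Just code)
        Nothing -> threadDelay 10000 >> waitWithin (budgetMs - 10) ph

-- Run the program up to maxRuns times; report the first run that fails
-- or appears to hang, as (run number, reason).
stress :: Int -> Int -> FilePath -> [String] -> IO (Maybe (Int, String))
stress maxRuns budgetMs exe args = go 1
  where
    go i
      | i > maxRuns = return Nothing
      | otherwise = do
          ph <- spawnProcess exe args
          result <- waitWithin budgetMs ph
          case result of
            Just ExitSuccess -> go (i + 1)
            Just (ExitFailure c) -> return (Just (i, "exit code " ++ show c))
            Nothing -> do
              -- Presumed hang: ask the child to terminate, then report.
              -- Drop this call to keep the process around for a debugger.
              terminateProcess ph
              return (Just (i, "no exit within " ++ show budgetMs ++ " ms"))

main :: IO ()
main = do
  -- Hypothetical binary name and limits; adjust to the program under test.
  outcome <- stress 100000 5000 "./taskpool" []
  case outcome of
    Nothing -> putStrLn "no failure observed"
    Just (i, why) -> putStrLn ("run " ++ show i ++ " went wrong: " ++ why)
```
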
> So my question is how much circumspection is expected/needed before
> one should enter a bug in the bug tracker? I've tested the attached
> code on three different systems (with different Linux systems, but
> always GHC 6.12.1 (since it's a bit costly to install the older
> versions)) and observed the mentioned behaviour. Is this enough to
> justify a bug report?
Sure, by all means submit a bug report. As mentioned earlier, you might
be able to avoid doing so if you find that the 6.12.2 snapshot fixes it,