[GHC] #7993: ghc 7.6 (not 7.4) sometimes hangs at child process exit on s390x

GHC ghc-devs at haskell.org
Mon Jun 17 23:32:42 CEST 2013


#7993: ghc 7.6 (not 7.4) sometimes hangs at child process exit on s390x
---------------------+------------------------------------------------------
Reporter:  cjwatson  |          Owner:                
    Type:  bug       |         Status:  new           
Priority:  normal    |      Component:  Runtime System
 Version:  7.6.3     |       Keywords:                
      Os:  Linux     |   Architecture:  Other         
 Failure:  Other     |      Blockedby:                
Blocking:            |        Related:                
---------------------+------------------------------------------------------
 On Debian's s390x architecture (64-bit S/390, Linux kernel), builds of
 several packages hang with GHC 7.6 where they did not hang with GHC 7.4.
 In particular, ghc itself hangs during its own build when bootstrapping
 with 7.6.  This is quite easy to reproduce on affected systems, although
 it doesn't hang in exactly the same place every time.  It appears that the
 runtime sometimes deadlocks when a subprocess exits; the strace looks like
 this:

 {{{
 7523  exit_group(0)                     = ?
 6680  <... futex resumed> )             = ? ERESTARTSYS (To be restarted)
 6680  --- SIGCHLD (Child exited) @ 0 (0) ---
 6680  futex(0x84fa86ac, FUTEX_WAIT_PRIVATE, 1143, NULL) = ? ERESTARTSYS (To be restarted)
 6680  --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
 6680  sigreturn()                       = ? (mask now [])
 6680  futex(0x84fa86ac, FUTEX_WAIT_PRIVATE, 1143, NULL) = ? ERESTARTSYS (To be restarted)
 6680  --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
 6680  sigreturn()                       = ? (mask now [])
 6680  futex(0x84fa86ac, FUTEX_WAIT_PRIVATE, 1143, NULL) = ? ERESTARTSYS (To be restarted)
 [repeats forever]
 }}}

 ghc spawns enough subprocesses (gcc etc.) that it's essentially bound to
 hit this sooner or later.  I suspect perhaps a lack of signal-safety
 somewhere - at an extremely wild guess, perhaps the type of an important
 variable written in a signal handler happens to exceed the size of
 sig_atomic_t on s390x and not elsewhere - but I haven't yet been able to
 track this down in the time available to me.
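
 One cheap data point for the sig_atomic_t guess is to print the size of
 that type on the affected machine and compare it with other C types.  The
 sketch below does this from Haskell through Foreign.C.Types; the name
 sigAtomicBytes is made up for illustration.  It only reports sizes and
 says nothing about which RTS variables are actually written from signal
 handlers, so it can narrow the guess down but not confirm it.

```haskell
module Main (main) where

import Foreign.C.Types (CInt, CLong, CSigAtomic)
import Foreign.Ptr (Ptr)
import Foreign.Storable (sizeOf)

-- | Size in bytes of C's sig_atomic_t as seen by this GHC build.
-- (Hypothetical helper name; not part of any library.)
sigAtomicBytes :: Int
sigAtomicBytes = sizeOf (undefined :: CSigAtomic)

main :: IO ()
main = do
  -- Print the sizes side by side so they can be compared across
  -- architectures (for example s390x against amd64).
  putStrLn ("sizeof(sig_atomic_t) = " ++ show sigAtomicBytes)
  putStrLn ("sizeof(int)          = " ++ show (sizeOf (undefined :: CInt)))
  putStrLn ("sizeof(long)         = " ++ show (sizeOf (undefined :: CLong)))
  putStrLn ("sizeof(void *)       = " ++ show (sizeOf (undefined :: Ptr ())))
```

 Running this on both s390x and an unaffected architecture would show
 whether sig_atomic_t really is narrower than a machine word on one and not
 the other.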

 If you don't immediately recognise this as something obvious, then perhaps
 somebody more fluent in Haskell than I would be good enough to suggest
 test code that exercises this and is somewhat simpler than "build ghc"?
 If my analysis is at all close to the mark, then something that sits in a
 loop forking and reaping a trivial child process on each iteration should
 be enough to reproduce this.  On the assumption that most people who are
 not Debian developers don't have convenient access to S/390 machines
 (Debian developers can use zelenka.debian.org), I'd be happy to try things
 out.
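
 A minimal sketch of the kind of test program described above: a loop that
 spawns a trivial child and waits for it on each iteration.  Everything in
 it is an assumption rather than a known reproducer: the helper name
 spawnAndReap, the iteration count, and the choice of "true" as the child
 (it must be on PATH).  It uses rawSystem from the process package that
 ships with GHC.  Since the strace shows futex waits, building it with
 -threaded as well as without may be worth trying, though that too is a
 guess.

```haskell
module Main (main) where

import Control.Monad (when)
import System.Exit (ExitCode (..))
import System.IO (hFlush, stdout)
import System.Process (rawSystem)

-- | Spawn a trivial child process and wait for it to exit, n times.
-- Returns how many children exited successfully.
-- (Hypothetical helper name, for illustration only.)
spawnAndReap :: Int -> IO Int
spawnAndReap n = go 0 1
  where
    go :: Int -> Int -> IO Int
    go ok i
      | i > n = return ok
      | otherwise = do
          -- rawSystem forks/execs the command and blocks until it exits,
          -- so each iteration covers one "child process exit" event.
          code <- rawSystem "true" []
          -- Periodic progress output makes a hang easy to spot.
          when (i `mod` 1000 == 0) $ do
            putStrLn ("iteration " ++ show i)
            hFlush stdout
          let ok' = if code == ExitSuccess then ok + 1 else ok
          ok' `seq` go ok' (i + 1)

main :: IO ()
main = do
  let total = 100000 -- arbitrary; raise it if nothing hangs
  ok <- spawnAndReap total
  putStrLn (show ok ++ " of " ++ show total ++ " children exited successfully")
```

 If the progress lines stop appearing and strace shows the same repeating
 futex/SIGVTALRM pattern as above, that would suggest the hang is
 reproducible without building ghc itself.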

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/7993>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


