[Haskell-cafe] How do I debug this RTS segfault?

Lana Black lanablack at amok.cc
Mon Jul 25 00:46:54 UTC 2016


On 21:25 Sun 24 Jul     , Anatoly Yakovenko wrote:
> It's probably out of file descriptors. It's possible that it tries to open
> another one during the error handling.
> On Sun, Jul 24, 2016 at 10:50 AM Lana Black <lanablack at amok.cc> wrote:
> 
> > Hello,
> >
> > I have run into this RTS bug recently. In short, when executing multiple
> > consecutive forks, after 500-600 or so the process is terminated by
> > SIGSEGV. I know this kind of thing is totally artificial, but still.
> >
> > The problem I have is that I can't get any meaningful backtrace in gdb.
> > For example, for threaded RTS I get this
> >
> > (gdb) bt
> > #0  0x0000000000560d63 in
> > base_GHCziEventziThread_ensureIOManagerIsRunning1_info ()
> > Backtrace stopped: Cannot access memory at address 0x7fffff7fcea0
> >
> > For non-threaded RTS I get this
> >
> > (gdb) bt
> > #0  0x00000000007138c9 in stg_makeStablePtrzh ()
> > Backtrace stopped: Cannot access memory at address 0x7fffff7fc720
> >
> > Build command: ghc --make -O2 -g -fforce-recomp fork.hs
> > Add -threaded if needed.
> >
> > I was able to reproduce this bug with both GHC 7.10.3 and today's HEAD
> > with the code below.
> >
> > >import System.Exit (exitSuccess)
> > >import System.Posix.Process (forkProcess)
> > >
> > >fork_ n | n > 0 = processPid =<< forkProcess (fork_ $! n - 1)
> > >        | otherwise = putStrLn "I'm done!"
> > >
> > >processPid pid | pid  > 0 = exitSuccess
> > >               | pid  < 0 = putStrLn "OOOPS, forkProcess failed!"
> > >               | otherwise = pure ()
> > >
> > >main = fork_ 1000
> > >
> >
> > With best regards.

That doesn't seem to be the case. I had overlooked GHC's -debug option;
with it I'm now able to get a stack trace. Furthermore, the number of
open file descriptors is well within the limit, and changing the limit
with `ulimit -n` does not affect the outcome.
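
For anyone who wants to repeat the descriptor check from inside the
program rather than from the shell, something along these lines should
work. It is only a sketch: it assumes Linux (it reads /proc/self/fd)
and the `unix` and `directory` packages, and the names `reportFds` and
`showLimit` are made up for the illustration, not part of the test
program above.

```haskell
import System.Directory (listDirectory)
import System.Posix.Resource
  ( Resource (ResourceOpenFiles)
  , ResourceLimit (..)
  , ResourceLimits (..)
  , getResourceLimit
  )

-- | Render a limit by hand, so the sketch does not depend on a Show
-- instance for ResourceLimit (hypothetical helper).
showLimit :: ResourceLimit -> String
showLimit ResourceLimitInfinity = "unlimited"
showLimit ResourceLimitUnknown  = "unknown"
showLimit (ResourceLimit n)     = show n

-- | Print how many descriptors this process has open, next to the
-- soft and hard RLIMIT_NOFILE values (hypothetical helper).
-- Assumption: /proc/self/fd exists, i.e. we are on Linux. The count
-- may include the descriptor used to read the directory itself.
reportFds :: String -> IO ()
reportFds label = do
  fds    <- listDirectory "/proc/self/fd"
  limits <- getResourceLimit ResourceOpenFiles
  putStrLn $ label ++ ": " ++ show (length fds) ++ " fds open"
          ++ ", soft limit " ++ showLimit (softLimit limits)
          ++ ", hard limit " ++ showLimit (hardLimit limits)

main :: IO ()
main = reportFds "parent"
```

Calling `reportFds` at the start of each forked child would show
whether the count grows with the nesting depth.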

Curiously, the stacks are rather different for threaded and non-threaded
RTS.

Non-threaded:
(gdb) bt
#0  INFO_PTR_TO_STRUCT (info=<error reading variable: Cannot access memory at address 0x7fffff7feff0>) at includes/rts/storage/ClosureMacros.h:60
#1  0x000000000070e956 in get_itbl (c=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:87
#2  0x000000000070ec3c in closure_sizeW (p=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:439
#3  0x000000000070ecf7 in overwritingClosure (p=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:555
#4  0x0000000000725dd7 in stg_upd_frame_info ()
#5  0x0000000000000000 in ?? ()

Threaded:
(gdb) bt
#0  0x00007ffff6ce49ce in _IO_vfprintf_internal (s=s@entry=0x7fffff7ff430, format=format@entry=0x7ffff75c3550 "/proc/self/task/%u/comm", ap=ap@entry=0x7fffff7ff558)
    at vfprintf.c:1266
#1  0x00007ffff6d0954b in __IO_vsprintf (string=0x7fffff7ff630 "`\366\177\377\377\177", format=0x7ffff75c3550 "/proc/self/task/%u/comm", args=args@entry=0x7fffff7ff558)
    at iovsprintf.c:42
#2  0x00007ffff6cecd47 in __sprintf (s=s@entry=0x7fffff7ff630 "`\366\177\377\377\177", format=format@entry=0x7ffff75c3550 "/proc/self/task/%u/comm") at sprintf.c:32
#3  0x00007ffff75c1f2b in pthread_setname_np (th=140737317025536, name=0x78ba04 "ghc_ticker") at ../sysdeps/unix/sysv/linux/pthread_setname.c:49
#4  0x000000000072ce4e in initTicker (interval=10000000, handle_tick=0x71a23d <handle_tick>) at rts/posix/itimer/Pthread.c:173
#5  0x000000000071a32f in initTimer () at rts/Timer.c:111
#6  0x0000000000703c26 in forkProcess (entry=0x207) at rts/Schedule.c:2072
#7  0x0000000000405bf7 in s7dF_info ()
#8  0x0000000000000000 in ?? ()


