[GHC] #14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later

GHC ghc-devs at haskell.org
Wed Jan 31 19:57:46 UTC 2018


#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs
on Ubuntu 16.04 or later
-------------------------------------+-------------------------------------
        Reporter:  RyanGlScott       |                Owner:  alpmestan
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.4.1
       Component:  GHC API           |              Version:  8.4.1-alpha1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  Runtime crash     |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #14603            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by alpmestan):

 Below is a summary of how things go in `interpretBCO`.

 Here's how things look when we enter that function:

 {{{#!c
 (gdb) x/8a $rbp
 0x7fffffffe0f0: 0x7fffffffe140  0x2cc3186 <schedule+1186>
 0x7fffffffe100: 0x33e7d40       0x33c7ac0 <MainCapability>
 0x7fffffffe110: 0x33c7ac0 <MainCapability>      0x2cc73fd
 <createThread+465>
 0x7fffffffe120: 0x33c7ad8 <MainCapability+24>   0x2033c7ac0
 (gdb) x/8a Sp
 0x42001fcb28:   0x2cf4728 <stg_apply_interp_info>       0x42003b2e70
 0x42001fcb38:   0x2cf34e8 <stg_upd_frame_info>  0x42003b2eb0
 0x42001fcb48:   0x10202e0 <s15YO_info+304>      0x2cf34e8
 <stg_upd_frame_info>
 0x42001fcb58:   0x420012ec20    0x2cf34e8 <stg_upd_frame_info>
 (gdb) ghc closure 0x42003b2e70
 BCO
 (gdb) ghc closure 0x42003b2eb0
 AP(0x42003b2e70)
 }}}

 This leads us to entering this block from `interpretBCO`:

 {{{#!c
     else if (SpW(0) == (W_)&stg_apply_interp_info) {
         obj = UNTAG_CLOSURE((StgClosure *)SpW(1));
         Sp_addW(2);
         goto run_BCO_fun;
     }
 }}}

 We successfully retrieve `obj`, which is a `BCO`:

 {{{#!c
 (gdb)
 0x0000000002cc8c62 in interpretBCO (cap=0x33c7ac0 <MainCapability>) at
 rts/Interpreter.c:354
 354             obj = UNTAG_CLOSURE((StgClosure *)SpW(1));
 (gdb)
 355             Sp_addW(2);
 (gdb) print obj
 $5 = (StgClosure *) 0x42003b2e70
 (gdb) ghc closure obj
 BCO
 }}}

 We then jump to `run_BCO_fun` (see
 [https://github.com/ghc/ghc/blob/cca2d6b78f97bfb79bef4dc3f75d6c4d15b94680/rts/Interpreter.c#L936
 here]). The stack & heap checks go well and we jump straight to `run_BCO`
 (right below `run_BCO_fun` in the source code), which starts like this

 {{{#!c
         register int       bciPtr = 0; /* instruction pointer */
         register StgWord16 bci;
         register StgBCO*   bco        = (StgBCO*)obj;
         register StgWord16* instrs    =
 (StgWord16*)(bco->instrs->payload);
         register StgWord*  literals   =
 (StgWord*)(&bco->literals->payload[0]);
         register StgPtr*   ptrs       = (StgPtr*)(&bco->ptrs->payload[0]);
 #if defined(DEBUG)
         int bcoSize;
         bcoSize = bco->instrs->bytes / sizeof(StgWord16);
 }}}

 {{{#!c
 (gdb) print bcoSize
 $14 = 8
 (gdb) print (*instrs)@8
 $16 = {11, 0, 11, 1, 32, 11, 2, 58}
 }}}

 The `ptrs` and `literals` field among others seem to have been optimized
 out.

 And we then start processing the instructions. For reference, the opcodes
 for instructions are defined
 [https://github.com/ghc/ghc/blob/cca2d6b78f97bfb79bef4dc3f75d6c4d15b94680/includes/rts/Bytecodes.h#L26
 here].

 {{{#!c
     bci = BCO_NEXT;
     /* ... */
     switch (bci & 0xFF) {
 }}}

 {{{#!c
 (gdb) print bci
 $15 = 11
 }}}

 `11 & 0xFF = 11`, which is `bci_PUSH_G`. Here's how it's handled:

 {{{#!c
         case bci_PUSH_G: {
             int o1 = BCO_GET_LARGE_ARG;
             SpW(-1) = BCO_PTR(o1);
             Sp_subW(1);
             goto nextInsn;
         }

 /* definitions for the macros: */
 #define BCO_GET_LARGE_ARG ((bci & bci_FLAG_LARGE_ARGS) ?
 BCO_READ_NEXT_WORD : BCO_NEXT)
 #define bci_FLAG_LARGE_ARGS     0x8000
 #define BCO_PTR(n)    (W_)ptrs[n]
 }}}

 In our case, `bci = 11`, `11 & 0x8000 = 0` so we get `BCO_NEXT` which just
 reads the next instruction. In our case, `o1 = 0`,  so we get the first
 `StgMutArrPtrs` that comes with the BCO (`gdb` won't let me look at them,
 it says the value has been optimized out), we push the address on the
 stack and move to the next instruction, `11` (`bci_PUSH_G` again). This
 time, `o1 = 1`. We proceed as before but with the second `StgMutArrPtrs`.
 We end up with:

 {{{#!c
 (gdb) x/8a Sp
 0x42001fcb28:   0x407a0540      0x401618a8
 0x42001fcb38:   0x2cf34e8 <stg_upd_frame_info>  0x42003b2eb0
 0x42001fcb48:   0x10202e0 <s15YO_info+304>      0x2cf34e8
 <stg_upd_frame_info>
 0x42001fcb58:   0x420012ec20    0x2cf34e8 <stg_upd_frame_info>
 }}}

 We read the next instruction, `bci = 32`, which is, as you can see
 [https://github.com/ghc/ghc/blob/cca2d6b78f97bfb79bef4dc3f75d6c4d15b94680/includes/rts/Bytecodes.h#L57
 here]:

 {{{#!c
 #define bci_PUSH_APPLY_PP               32
 }}}

 This leads us to this bit of code in `interpretBCO`:

 {{{#!c
         case bci_PUSH_APPLY_PP:
             Sp_subW(1); SpW(0) = (W_)&stg_ap_pp_info;
             goto nextInsn;
 }}}

 After executing this code, we have, as expected:

 {{{#!c
 (gdb) x/8a Sp
 0x42001fcb20:   0x2cfa6a0 <stg_ap_pp_info>      0x407a0540
 0x42001fcb30:   0x401618a8      0x2cf34e8 <stg_upd_frame_info>
 0x42001fcb40:   0x42003b2eb0    0x10202e0 <s15YO_info+304>
 0x42001fcb50:   0x2cf34e8 <stg_upd_frame_info>  0x420012ec20
 }}}

 The next instruction is `11`, so `bci_PUSH_G` again. This time, `o1 = 2`
 and:

 {{{#!c
 (gdb) x/8a Sp
 0x42001fcb18:   0x407c6c20      0x2cfa6a0 <stg_ap_pp_info>
 0x42001fcb28:   0x407a0540      0x401618a8
 0x42001fcb38:   0x2cf34e8 <stg_upd_frame_info>  0x42003b2eb0
 0x42001fcb48:   0x10202e0 <s15YO_info+304>      0x2cf34e8
 <stg_upd_frame_info>
 }}}

 Next instruction, `bci = 58`,
 [https://github.com/ghc/ghc/blob/cca2d6b78f97bfb79bef4dc3f75d6c4d15b94680/includes/rts/Bytecodes.h#L83
 which is]:

 {{{#!c
 #define bci_ENTER                       58
 }}}

 and brings us to this code:

 {{{#!c
         case bci_ENTER:
             // Context-switch check.  We put it here to ensure that
             // the interpreter has done at least *some* work before
             // context switching: sometimes the scheduler can invoke
             // the interpreter with context_switch == 1, particularly
             // if the -C0 flag has been given on the cmd line.
             if (cap->r.rHpLim == NULL) {
                 Sp_subW(1); SpW(0) = (W_)&stg_enter_info;
                 RETURN_TO_SCHEDULER(ThreadInterpret, ThreadYielding);
             }
             goto eval;
 }}}

 In our case, `cap->r.rHpLim` is not null and we hit the `goto`, bringing
 us
 [https://github.com/ghc/ghc/blob/cca2d6b78f97bfb79bef4dc3f75d6c4d15b94680/rts/Interpreter.c#L370
 here].

 {{{#!c
 eval:
     tagged_obj = (StgClosure*)SpW(0); Sp_addW(1);
 eval_obj:
     obj = UNTAG_CLOSURE(tagged_obj);
     // ... to be continued ...
 }}}

 `gdb` (well, Ben's gdb plugin) reports this closure as off-heap.

 We then get to:

 {{{#!c
     switch ( get_itbl(obj)->type ) {
 }}}

 and in our case:

 {{{#!c
 (gdb) print (StgInfoTable)(*(obj->header.info - 1))
 $50 = {layout = {payload = {ptrs = 0, nptrs = 0}, bitmap = 0,
 large_bitmap_offset = 0, __pad_large_bitmap_offset = 0,
     selector_offset = 0}, type = 14, srt_bitmap = 0, code = 0x405268e8
 "I\203\304\030M;\245X\003"}
 }}}

 `type = 14` corresponds to:

 {{{#!c
 #define FUN_STATIC                    14
 }}}
 .
 Therefore no need to evaluate anything (the switch just `break`s for this
 case). We then move
 [https://github.com/ghc/ghc/blob/cca2d6b78f97bfb79bef4dc3f75d6c4d15b94680/rts/Interpreter.c#L534
 here].

 {{{#!c
     //
 ------------------------------------------------------------------------
     // We now have an evaluated object (tagged_obj).  The next thing to
     // do is return it to the stack frame on top of the stack.
 do_return:
     obj = UNTAG_CLOSURE(tagged_obj);
     ASSERT(closure_HNF(obj));
     /* ... */
     switch (get_itbl((StgClosure *)Sp)->type) {
 }}}

 In our case:

 {{{#!c
 (gdb) print (StgInfoTable)(*(c->header.info - 1))
 $49 = {layout = {payload = {ptrs = 2, nptrs = 0}, bitmap = 2,
 large_bitmap_offset = 2, __pad_large_bitmap_offset = 2,
     selector_offset = 2}, type = 30, srt_bitmap = 0, code = 0x2cfa6a0
 <stg_ap_pp_info> "\200<%Rw<\003"}
 }}}

 So `type = 30`, which is:

 {{{#!c
 #define RET_SMALL                     30
 }}}

 which makes us hit the `RET_SMALL` case.

 {{{#!c
 case RET_SMALL: {
         const StgInfoTable *info;

         // NOTE: not using get_itbl().
         info = ((StgClosure *)Sp)->header.info;

         if (info == (StgInfoTable *)&stg_restore_cccs_info ||
             info == (StgInfoTable *)&stg_restore_cccs_eval_info) {
             cap->r.rCCCS = (CostCentreStack*)SpW(1);
             Sp_addW(2);
             goto do_return;
         }

         if (info == (StgInfoTable *)&stg_ap_v_info) {
             n = 1; m = 0; goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_f_info) {
             n = 1; m = 1; goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_d_info) {
             n = 1; m = sizeofW(StgDouble); goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_l_info) {
             n = 1; m = sizeofW(StgInt64); goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_n_info) {
             n = 1; m = 1; goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_p_info) {
             n = 1; m = 1; goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_pp_info) {
             n = 2; m = 2; goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_ppp_info) {
             n = 3; m = 3; goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_pppp_info) {
             n = 4; m = 4; goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_ppppp_info) {
             n = 5; m = 5; goto do_apply;
         }
         if (info == (StgInfoTable *)&stg_ap_pppppp_info) {
             n = 6; m = 6; goto do_apply;
         }
         goto do_return_unrecognised;
     }
 }}}

 We take the `info == (StgInfoTable *)&stg_ap_pp_info` branch and head to
 `do_apply`
 [https://github.com/ghc/ghc/blob/cca2d6b78f97bfb79bef4dc3f75d6c4d15b94680/rts/Interpreter.c#L730
 here].

 There we just hit the default branch:

 {{{#!c
         default:
         defer_apply_to_sched:
             IF_DEBUG(interpreter,
                      debugBelch("Cannot apply compiled function; yielding
 to scheduler\n"));
             Sp_subW(2);
             SpW(1) = (W_)tagged_obj;
             SpW(0) = (W_)&stg_enter_info;
             RETURN_TO_SCHEDULER_NO_PAUSE(ThreadRunGHC, ThreadYielding);

 /* where: */

 #define SAVE_STACK_POINTERS                     \
     cap->r.rCurrentTSO->stackobj->sp = Sp;

 #define SAVE_THREAD_STATE()                     \
     SAVE_STACK_POINTERS
 #endif

 #define RETURN_TO_SCHEDULER_NO_PAUSE(todo,retcode)      \
    SAVE_THREAD_STATE();                                 \
    cap->r.rCurrentTSO->what_next = (todo);              \
    cap->r.rRet = (retcode);                             \
    return cap;
 }}}

 FWIW, right before we return to the scheduler, the stack looks like:

 {{{#!c
 (gdb) x/8a Sp
 0x42001fcb10:   0x2cf5900 <stg_enter_info>      0x407c6c20
 0x42001fcb20:   0x2cfa6a0 <stg_ap_pp_info>      0x407a0540
 0x42001fcb30:   0x401618a8      0x2cf34e8 <stg_upd_frame_info>
 0x42001fcb40:   0x42003b2eb0    0x10202e0 <s15YO_info+304>
 }}}

 But then, when I continue executing the program instruction by
 instruction, gdb shows me

 {{{#!c
     default:
         barf("interpretBCO: unknown or unimplemented opcode %d",
              (int)(bci & 0xFF));
 }}}

 which is the default case for the switch on the opcode we read from a BCO.
 Not sure what's going on here, because there's just no way the control
 flow leads us there. So I quickly tried another run, but setting a
 breakpoint on `barf`, and we just never call it. This suggests that
 `interpretBCO` in fact goes well and that it does set things up properly
 on the stack and saves the stack in the capability's state. And `gdb` just
 points me to `default` for lack of a better line, when it is executing the
 function, I suppose?

 Next I'll look at what goes on when `convertAnnotationWrapper` picks this
 up. By quickly skimming through it in `gdb` I see some allocations, some
 `Data.Data` related symbol (comes from `AnnotationWrapper`'s constructor
 IIRC), some pinned byte array related ones as well. Hopefully this step
 will reveal the actual problem...

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14675#comment:16>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list