[GHC] #15544: Non-deterministic segmentation fault in cryptohash-sha256 testsuite

GHC ghc-devs at haskell.org
Tue Sep 11 09:06:01 UTC 2018


#15544: Non-deterministic segmentation fault in cryptohash-sha256 testsuite
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

 I managed to reproduce it and did some debugging.

 Here's the problem. We have this object:

 {{{
 >>> print *((StgClosure *) 0xe4c558)
 $21 = {
   header = {
     info = 0x409968 <reFi_info>
   },
   payload = 0xe4c560
 }
 }}}

 It's defined like this:

 {{{
 $wxs_reFi
   :: GHC.Prim.Int#
      -> (# Data.ByteString.Internal.ByteString,
            [Data.ByteString.Internal.ByteString] #)
 [GblId, Arity=1, Str=<S,1*U>, Unf=OtherCon []] =
     sat-only [] \r [ww_seSe]
         case ww_seSe of ds1_seSf [Occ=Once] {
           __DEFAULT ->
               let {
                 sat_seSk [Occ=Once] :: [Data.ByteString.Internal.ByteString]
                 [LclId] =
                     [ds1_seSf] \u []
                         case -# [ds1_seSf 1#] of sat_seSg [Occ=Once] {
                           __DEFAULT ->
                               case $wxs_reFi sat_seSg of {
                                 (#,#) ww2_seSi [Occ=Once] ww3_seSj [Occ=Once] ->
                                     : [ww2_seSi ww3_seSj];
                               };
                         };
               } in  (#,#) [x_reFh sat_seSk];
           1# -> (#,#) [x_reFh GHC.Types.[]];
         };
 }}}

 Notice that (1) it's a FUN_STATIC and (2) it references another static
 object, x_reFh:

 {{{
 x_reFh :: Data.ByteString.Internal.ByteString
 [GblId] =
     [] \u []
         case
             newMutVar# [GHC.ForeignPtr.NoFinalizers GHC.Prim.realWorld#]
         of
         { (#,#) ipv_seS6 [Occ=Once] ipv1_seS7 [Occ=Once] ->
               case __pkg_ccall bytestring-0.10.8.2 [addr#1_reFg ipv_seS6] of {
                 (#,#) _ [Occ=Dead] ds2_seSb [Occ=Once] ->
                     case word2Int# [ds2_seSb] of sat_seSd [Occ=Once] {
                       __DEFAULT ->
                           let {
                             sat_seSc [Occ=Once] :: GHC.ForeignPtr.ForeignPtrContents
                             [LclId] =
                                 CCCS GHC.ForeignPtr.PlainForeignPtr! [ipv1_seS7];
                           } in
                             Data.ByteString.Internal.PS [addr#1_reFg sat_seSc 0# sat_seSd];
                     };
               };
         };
 }}}

 The FUN_STATIC SRT optimization should apply to this object, so instead of
 a separate SRT we should find the SRT entries in its payload. However, the
 ptrs field in this object's info table is 0:

 {{{
 >>> set $itbl = itbl_to_fun_itbl(get_itbl((StgClosure *) 0xe4c558))
 >>> print *$itbl
 $21 = {
   f = {
     slow_apply_offset = 59278791,
     __pad_slow_apply_offset = 1572864,
     b = {
       bitmap = 10376465356425854976,
       bitmap_offset = -907476992,
       __pad_bitmap_offset = 3387490304
     },
     fun_type = 4,
     arity = 1
   },
   i = {
     layout = {
       payload = {
         ptrs = 0,
         nptrs = 0
       },
       bitmap = 0,
       large_bitmap_offset = 0,
       __pad_large_bitmap_offset = 0,
       selector_offset = 0
     },
     type = 14,
     srt = 10759120,
     code = 0x409968 <reFi_info> "I\203\304\030M;\245X\003"
   }
 }
 }}}

 So it seems that for some reason we don't actually do the FUN_STATIC SRT
 optimization for this object. Indeed, I can reach x_reFh through the srt
 field:

 {{{
 >>> print *((StgClosure*) (((StgWord) (($itbl)+1)) + ($itbl)->i.srt)) <--- GET_FUN_SRT
 $10 = {
   header = {
     info = 0x4097e8 <reFh_info>
   },
   payload = 0xe4c540
 }

 >>> print ((StgClosure*) (((StgWord) (($itbl)+1)) + ($itbl)->i.srt))
 $11 = (StgClosure *) 0xe4c538
 }}}
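
 The address arithmetic in the GET_FUN_SRT expression can be checked by
 hand. The following is a minimal C sketch, not RTS code: it assumes that
 ($itbl)+1 coincides with the reFi_info code address 0x409968 printed in
 the code field above (as it would with tables next to code), and the
 function name srt_target is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model, not the RTS definition: the srt field is treated as
 * an offset relative to the address just past the info table. */
uint64_t srt_target(uint64_t info_table_end, uint64_t srt_offset)
{
    return info_table_end + srt_offset;
}
```

 Calling srt_target(0x409968, 10759120) gives 0xe4c538, which matches the
 address gdb prints for the closure with the reFh_info header.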

 x_reFh is originally a THUNK and becomes an IND_STATIC after evaluation:

 {{{
 >>> call printClosure((StgClosure *) 0xe4c538)
 THUNK(0x4097e8)

 >>> c
 Hardware watchpoint 5: ((StgClosure *) 0xe4c538)->header.info

 Old value = (const StgInfoTable *) 0x4097e8 <reFh_info>
 New value = (const StgInfoTable *) 0xdce688 <stg_IND_STATIC_info>
 SET_INFO (c=0xe4c538, info=0xdce688 <stg_IND_STATIC_info>) at includes/rts/storage/ClosureMacros.h:50
 50      }
 >>> bt
 #0  SET_INFO (c=0xe4c538, info=0xdce688 <stg_IND_STATIC_info>) at includes/rts/storage/ClosureMacros.h:50
 #1  0x0000000000dbac9b in lockCAF (reg=0x1020818 <MainCapability+24>, caf=0xe4c538) at rts/sm/Storage.c:415
 #2  0x0000000000dbacc5 in newCAF (reg=0x1020818 <MainCapability+24>, caf=0xe4c538) at rts/sm/Storage.c:425
 #3  0x0000000000409809 in reFh_info ()
 #4  0x0000000000000000 in ?? ()

 >>> call printClosure((StgClosure *) 0xe4c538)
 IND_STATIC(0x42004d5878)
 }}}

 Now, as long as reFi is reachable, 0xe4c538 should be reachable too,
 because it's in the SRT of reFi. Let's continue:

 {{{
 >>> c
 ... assertion failure ...
 >>> bt
 #0  0x0000000000db8800 in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=12297829382473034410) at includes/rts/storage/ClosureMacros.h:260
 #1  0x0000000000db884f in LOOKS_LIKE_INFO_PTR (p=12297829382473034410) at includes/rts/storage/ClosureMacros.h:265
 #2  0x0000000000db8887 in LOOKS_LIKE_CLOSURE_PTR (p=0x4200122a10) at includes/rts/storage/ClosureMacros.h:270
 #3  0x0000000000db9240 in evacuate (p=0xe4c540) at rts/sm/Evac.c:516
 #4  0x0000000000ddf87e in scavenge_static () at rts/sm/Scav.c:1690
 #5  0x0000000000ddff0a in scavenge_loop () at rts/sm/Scav.c:2085
 #6  0x0000000000db4c49 in scavenge_until_all_done () at rts/sm/GC.c:1088
 #7  0x0000000000db38ba in GarbageCollect (collect_gen=1, do_heap_census=false, gc_type=0, cap=0x1020800 <MainCapability>, idle_cap=0x0) at rts/sm/GC.c:416
 #8  0x0000000000d995a7 in scheduleDoGC (pcap=0x7fff635d6780, task=0x2802f60, force_major=false) at rts/Schedule.c:1799
 #9  0x0000000000d98a7f in schedule (initialCapability=0x1020800 <MainCapability>, task=0x2802f60) at rts/Schedule.c:545
 #10 0x0000000000d99f79 in scheduleWaitThread (tso=0x4200105388, ret=0x0, pcap=0x7fff635d6880) at rts/Schedule.c:2533
 #11 0x0000000000da8b4c in rts_evalLazyIO (cap=0x7fff635d6880, p=0xe4d928, ret=0x0) at rts/RtsAPI.c:530
 #12 0x0000000000da9297 in hs_main (argc=7, argv=0x7fff635d6a78, main_closure=0xe4d928, rts_config=...) at rts/RtsMain.c:72
 #13 0x000000000041210c in main ()
 }}}
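
 The p value in frames #0 and #1 is easier to read in hexadecimal. Below
 is a small C sketch, independent of the RTS (the function name is made up
 for illustration), that builds a 64-bit word out of repeated 0xaa bytes so
 it can be compared against that value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the 64-bit word read from memory in which every byte has been
 * overwritten with 0xaa. */
uint64_t word_of_0xaa(void)
{
    uint64_t w;
    memset(&w, 0xaa, sizeof w);
    return w;
}
```

 The result equals 12297829382473034410, i.e. 0xaaaaaaaaaaaaaaaa, so the
 info pointer being checked looks like a fill pattern rather than the
 address of a real info table.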

 0xe4c540 is the address of the indirectee field of 0xe4c538:

 {{{
 >>> print &((StgInd*)0xe4c538)->indirectee
 $27 = (StgClosure **) 0xe4c540
 }}}
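
 The 8-byte gap between 0xe4c538 and 0xe4c540 is what one would expect if
 the closure header is a single word. A minimal C sketch, assuming a 64-bit
 non-profiled build; ModelInd is a hypothetical stand-in for StgInd, not
 the RTS definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an indirection closure: one header word (the
 * info pointer) followed by the indirectee pointer. */
typedef struct {
    uint64_t info;        /* header: info pointer */
    uint64_t indirectee;  /* pointer to the evaluated value */
} ModelInd;

/* Address of the indirectee field for a closure at closure_addr. */
uint64_t indirectee_field_addr(uint64_t closure_addr)
{
    return closure_addr + offsetof(ModelInd, indirectee);
}
```

 Under that assumption indirectee_field_addr(0xe4c538) is 0xe4c540, the
 same address that evacuate receives as p in frame #3.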

 But the object it points to was cleared (because this is in sanity mode)

 {{{
 >>> print *UNTAG_CLOSURE(((StgInd*)0xe4c538)->indirectee)
 $29 = {
   header = {
     info = 0xaaaaaaaaaaaaaaaa
   },
   payload = 0x4200122a18
 }
 }}}

 so it became unreachable. For this object to be unreachable, reFi should
 be unreachable too. Let's see if it was reachable in this GC:

 {{{
 >>> break GarbageCollect
 Breakpoint 6 at 0xdb3492: file rts/sm/GC.c, line 226.
 >>> break evacuate_static_object if q == 0xe4c558
 Breakpoint 7 at 0xdb8f85: file rts/sm/Evac.c, line 333.
 >>> reverse-continue
 }}}

 Breakpoint 7 is hit first, so it seems like reFi is actually reachable. We
 should be scavenging it too:

 {{{
 >>> break Scav.c:1675 if p == 0xe4c558
 Breakpoint 8 at 0xddf7cc: file rts/sm/Scav.c, line 1675.
 >>> c

 >>> bt
 #0  scavenge_static () at rts/sm/Scav.c:1675
 #1  0x0000000000ddff0a in scavenge_loop () at rts/sm/Scav.c:2085
 #2  0x0000000000db4c49 in scavenge_until_all_done () at rts/sm/GC.c:1088
 #3  0x0000000000db38ba in GarbageCollect (collect_gen=1, do_heap_census=false, gc_type=0, cap=0x1020800 <MainCapability>, idle_cap=0x0) at rts/sm/GC.c:416
 #4  0x0000000000d995a7 in scheduleDoGC (pcap=0x7fff635d6780, task=0x2802f60, force_major=false) at rts/Schedule.c:1799
 #5  0x0000000000d98a7f in schedule (initialCapability=0x1020800 <MainCapability>, task=0x2802f60) at rts/Schedule.c:545
 #6  0x0000000000d99f79 in scheduleWaitThread (tso=0x4200105388, ret=0x0, pcap=0x7fff635d6880) at rts/Schedule.c:2533
 #7  0x0000000000da8b4c in rts_evalLazyIO (cap=0x7fff635d6880, p=0xe4d928, ret=0x0) at rts/RtsAPI.c:530
 #8  0x0000000000da9297 in hs_main (argc=7, argv=0x7fff635d6a78, main_closure=0xe4d928, rts_config=...) at rts/RtsMain.c:72
 #9  0x000000000041210c in main ()
 }}}

 At this point, if I step a few more lines, I get the original assertion
 failure.

 So in summary: a FUN_STATIC is reachable, but somehow a static object in
 its SRT is collected.

 Alternatively, it could be that the FUN_STATIC becomes unreachable and
 somehow becomes reachable again later.

 Simon, I'm looking at the implementation of the SRT optimization for
 FUN_STATIC. I don't understand why we look at both the srt field and the
 ptrs field of FUN_STATICs in this code (in evacuate()):

 {{{
       case FUN_STATIC:
           if (info->srt != 0 || info->layout.payload.ptrs != 0) {
               evacuate_static_object(STATIC_LINK(info,(StgClosure *)q), q);
           }
           return;
 }}}

 As far as I understand, for FUN_STATICs we should only look at the payload,
 no? I think that's what the note in CmmBuildInfoTables.hs says.
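
 To make the question concrete, here is a minimal C sketch of just the
 condition quoted above, with simplified parameters (fun_static_is_linked
 is a hypothetical helper, not an RTS function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors only the quoted test: a FUN_STATIC is handed to
 * evacuate_static_object when it has an SRT or pointer fields. */
bool fun_static_is_linked(uint64_t srt, uint32_t ptrs)
{
    return srt != 0 || ptrs != 0;
}
```

 With the values observed for reFi (srt = 10759120, ptrs = 0) the condition
 is true only because of the srt term.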

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15544#comment:20>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

