[GHC] #15544: Non-deterministic segmentation fault in cryptohash-sha256 testsuite

GHC ghc-devs at haskell.org
Thu Sep 13 09:17:49 UTC 2018


#15544: Non-deterministic segmentation fault in cryptohash-sha256 testsuite
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                Owner:  (none)
            Type:  bug               |               Status:  patch
        Priority:  highest           |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  Phab:D5145
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonmar):

 For future archaeology, here are the details of what caused the crash.
 The fix has comments to explain the underlying problem and the fix, but
 here I want to put the full details of how it manifested in this
 particular instance, just in case we need to revisit.

 The program evaluates a CAF after it has been GC'd.

 The best way to diagnose it is to add `-debug` to the `ghc-options` in the
 `.cabal` file, and make sure that you have Phab:D4963 merged (this wasn't
 merged in 8.6 at the time, which meant the assertion for GC'd CAFs didn't
 fire as it should have).

 You can also comment out a bunch of the code in the test case to make it
 fail faster and with less code, see Phab:P183

 Now, the CAF in question is this:

 {{{
 x_rbHt :: Data.ByteString.Internal.ByteString
 [GblId] =
     [] \u []
         case
             newMutVar# [GHC.ForeignPtr.NoFinalizers GHC.Prim.realWorld#]
         of
         { (#,#) ipv_sbMX [Occ=Once] ipv1_sbMY [Occ=Once] ->
               case __pkg_ccall bytestring-0.10.8.2 [addr#1_rbHs ipv_sbMX]
 of {
                 (#,#) _ [Occ=Dead] ds2_sbN2 [Occ=Once] ->
                     case word2Int# [ds2_sbN2] of sat_sbN4 [Occ=Once] {
                       __DEFAULT ->
                           let {
                             sat_sbN3 [Occ=Once] ::
 GHC.ForeignPtr.ForeignPtrContents
                             [LclId] =
                                 CCCS GHC.ForeignPtr.PlainForeignPtr!
 [ipv1_sbMY];
                           } in
                             Data.ByteString.Internal.PS [addr#1_rbHs
 sat_sbN3 0# sat_sbN4];
                     };
               };
         };
 }}}

 which is referred to by this function:

 {{{
 $wxs_rbHu
   :: GHC.Prim.Int#
      -> (# Data.ByteString.Internal.ByteString,
            [Data.ByteString.Internal.ByteString] #)
 [GblId, Arity=1, Str=<S,1*U>, Unf=OtherCon []] =
     sat-only [] \r [ww_sbN5]
         case ww_sbN5 of ds1_sbN6 [Occ=Once] {
           __DEFAULT ->
               let {
                 sat_sbNb [Occ=Once] ::
 [Data.ByteString.Internal.ByteString]
                 [LclId] =
                     [ds1_sbN6] \u []
                         case -# [ds1_sbN6 1#] of sat_sbN7 [Occ=Once] {
                           __DEFAULT ->
                               case $wxs_rbHu sat_sbN7 of {
                                 (#,#) ww2_sbN9 [Occ=Once] ww3_sbNa
 [Occ=Once] ->
                                     : [ww2_sbN9 ww3_sbNa];
                               };
                         };
               } in  (#,#) [x_rbHt sat_sbNb];
           1# -> (#,#) [x_rbHt GHC.Types.[]];
         };
 }}}

 Note that
 * the function refers to the CAF
 * it is recursive, and
 * the recursive call is inside a thunk (`sat_sbNb`)

 We generated the following SRTs (use `-ddump-cmm` to see this):

 {{{
 [sat_sbNb_entry() //  [R1]
          { info_tbls: [(cc8F,
                         label: sat_sbNb_info
                         rep: HeapRep 1 nonptrs { Thunk }
                         srt: Just x_rbHt_closure),
                        (cc8H,
                         label: block_cc8H_info
                         rep: StackRep []
                         srt: Nothing)]

  $wxs_rbHu_entry() //  [R2]
          { info_tbls: [(cc8S,
                         label: $wxs_rbHu_info
                         rep: HeapRep static { Fun {arity: 1 fun_type:
 ArgSpec 4} }
                         srt: Just x_rbHt_closure)]
 }}}

 ie. both the function and the thunk have singleton SRTs, pointing directly
 to the CAF. This happens because these two declarations are in cycle, and
 the SRT pass assigns all declarations in a cycle the same SRT.  The SRT
 contains all the references from the RHSs of the declarations, which would
 be `{$wxs_rbHu_closure, x_rbHt_closure}` except that we remove the
 recursive reference to `$wxs_rbHu_closure` from the set (it's not
 necessary to have recursive references in the SRT, the SRT only needs to
 point to all the things that can be reached from this group).

 The crash occurred as follows. Let's call the thunk `sat_sbNb_entry` "A",
 and the function `$wxs_rbHu_entry` "B".

 * suppose we GC when A is alive, and B is not otherwise reachable.
 * B is "collected", meaning that it doesn't make it onto the static
 objects list during this GC, but nothing bad happens yet.
 * Next, suppose we enter A, and then call B. (remember that A refers to B)
 At the entry point to B, we GC. This puts B on the stack, as part of the
 RET_FUN stack frame that gets pushed when we GC at a function entry point.
 * This GC will now reach B
 * But because B was previous "collected", it breaks the assumption that
 static objects are never resurrected. See `Note [STATIC_LINK fields]` in
 rts/sm/Storage.h for why this is bad.
 * In practice, the GC thinks that B has already been visited, and so
 doesn't visit X, and catastrophe ensues.

 The breakage is caused by a combination of two things:
 1. the SRT for the thunk A doesn't point to the function B, even though it
 calls the function.
 2. the function's entry code causes a pointer to the function's closure to
 appear on the stack, when it wasn't previously visible to the GC.

 We opted to fix (1), because it's not clear whether (2) could happen in
 other ways.

 It turned out that (1) could happen in two ways:
 * a "shortcutting" optimisation in SRT generation
 * omitting recursive references from the SRT of a recursive group

 For completeness, here is what we want to generate instead:

 {{{
 [sat_sbNb_entry() //  [R1]
          { info_tbls: [(cc8F,
                         label: sat_sbNb_info
                         rep: HeapRep 1 nonptrs { Thunk }
                         srt: Just $wxs_rbHu_closure),   <--- SRT points to
 the function, not the CAF
                        (cc8H,
                         label: block_cc8H_info
                         rep: StackRep []
                         srt: Nothing)]
            stack_info: arg_space: 8 updfr_space: Just 8
          }
 }}}

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15544#comment:31>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list