[GHC] #8834: 64-bit windows cabal.exe segfaults in GC

GHC ghc-devs at haskell.org
Mon Mar 17 09:37:00 UTC 2014


#8834: 64-bit windows cabal.exe segfaults in GC
----------------------------------+----------------------------------
        Reporter:  awson          |            Owner:  jstolarek
            Type:  bug            |           Status:  patch
        Priority:  highest        |        Milestone:  7.8.1
       Component:  Compiler       |          Version:  7.8.1-rc2
      Resolution:                 |         Keywords:
Operating System:  Windows        |     Architecture:  x86_64 (amd64)
 Type of failure:  Runtime crash  |       Difficulty:  Unknown
       Test Case:                 |       Blocked By:
        Blocking:                 |  Related Tickets:
----------------------------------+----------------------------------

Comment (by jstolarek):

 Here's a quick summary of what I've learned:

   * As we noticed earlier the bug does not happen when only `-O1` is used.
 Enabling `-O1 -fspec-constr` triggers the bug, but with `-O1 -fspec-constr
 -fno-cmm-sink` everything works perfectly fine.

   * I believe that Simon Marlow correctly identified offending piece of
 Cmm code, although my knowledge of calling conventions doesn't allow me to
 say why that piece of code is incorrect. Here's how that piece of code
 looks only with `-O1`:

 {{{
 c2h9:
     _s2cz::I64 = _s2ct::I64 + _s2cv::I64;
     (_s2cE::I64) = call "ccall" arg hints:  [PtrHint,
                                              `signed',]  result hints:
 [PtrHint] memchr(_s2cz::I64, 10, _s2cw::I64);
     if (_s2cE::I64 == 0) goto c2h4; else goto c2h5;
 c2h4:
     Hp = Hp - 32;
     _s2cH::P64 = Data.Maybe.Nothing_closure+1;
     goto s2cF;
 c2h5:
     I64[Hp - 24] = GHC.Types.I#_con_info;
     I64[Hp - 16] = _s2cE::I64 - _s2cz::I64;
     I64[Hp - 8] = Data.Maybe.Just_con_info;
     P64[Hp] = Hp - 23;
     _s2cH::P64 = Hp - 6;
     goto s2cF;
 s2cF:
     call MO_Touch(_s2cu::P64);
     I64[Sp - 40] = c2f7;
     R1 = _s2cH::P64;
     I64[Sp - 32] = _s2ct::I64;
     P64[Sp - 24] = _s2cu::P64;
     I64[Sp - 16] = _s2cv::I64;
     I64[Sp - 8] = _s2cw::I64;
     Sp = Sp - 40;
     if (R1 & 7 != 0) goto c2f7; else goto c2f8;
 }}}

   It looks that sinking is not performed here for some reason that I was
 unable to figure out. Enabling `-O1 -fspec-constr` changes the above code
 to:

 {{{
 c2hi:
     _s2dr::I64 = R5;
     _s2du::I64 = R2 + R4;
     _c2fB::I64 = R5;
     (_s2dz::I64) = call "ccall" arg hints:  [PtrHint,
                                              `signed',]  result hints:
 [PtrHint] memchr(_s2du::I64, 10, _c2fB::I64);
     if (_s2dz::I64 == 0) goto c2hd; else goto c2he;
 c2hd:
     call MO_Touch(R3);
     I64[Hp - 128] = Data.ByteString.Internal.PS_con_info;
     P64[Hp - 120] = R3;
     I64[Hp - 112] = R2;
     I64[Hp - 104] = R4;
     I64[Hp - 96] = _s2dr::I64;
     I64[Hp - 88] = :_con_info;
     P64[Hp - 80] = Hp - 127;
     P64[Hp - 72] = GHC.Types.[]_closure+1;
     _c2h8::P64 = Hp - 86;
     Hp = Hp - 72;
     R1 = _c2h8::P64;
     call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
 }}}

   Enabling `-fspec-constr` sinks R2, R3 and R4 past the call to `memchr`
 (which BTW. is consistent with awson's earlier observation that segfault
 is happening somewhere around the call to `memchr`).

   * Changing definition of `isTrivial` from:

 {{{
 isTrivial :: CmmExpr -> Bool
 isTrivial (CmmReg _) = True
 isTrivial (CmmLit _) = True
 isTrivial _          = False
 }}}

   to

 {{{
 isTrivial :: CmmExpr -> Bool
 isTrivial (CmmReg (CmmLocal _)) = True
 isTrivial (CmmLit _) = True
 isTrivial _          = False
 }}}
   makes the bug disappear. I believe this is not a proper fix, because it
 only hides the bug - global registers should be safe to inline! The above
 Cmm code changes to (`-O1 -fspec-constr` enabled):

 {{{
 2hi:
    _s2du::I64 = _s2do::I64 + _s2dq::I64;
    (_s2dz::I64) = call "ccall" arg hints:  [PtrHint,
                                             `signed',]  result hints:
 [PtrHint] memchr(_s2du::I64, 10, _s2dr::I64);
    if (_s2dz::I64 == 0) goto c2hd; else goto c2he;
 2hd:
    call MO_Touch(_s2dp::P64);
    I64[Hp - 128] = Data.ByteString.Internal.PS_con_info;
    P64[Hp - 120] = _s2dp::P64;
    I64[Hp - 112] = _s2do::I64;
    I64[Hp - 104] = _s2dq::I64;
    I64[Hp - 96] = _s2dr::I64;
    I64[Hp - 88] = :_con_info;
    P64[Hp - 80] = Hp - 127;
    P64[Hp - 72] = GHC.Types.[]_closure+1;
    _c2h8::P64 = Hp - 86;
    Hp = Hp - 72;
    R1 = _c2h8::P64;
    call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
 }}}

   * Since it looks like the problem is caused by sinking registers past
 the unsafe foreign call to `memchr` I changed the definition of
 `foreignTargetRegs` in the instance definition of `DefinerOfRegs` in
 `CmmNode` (lines 345-346) from:

 {{{
 foreignTargetRegs (ForeignTarget _ (ForeignConvention _ _ _
 CmmNeverReturns)) = []
 foreignTargetRegs _ = activeCallerSavesRegs
 }}}

   to:

 {{{
 foreignTargetRegs (ForeignTarget _ (ForeignConvention _ _ _
 CmmNeverReturns)) = []
 foreignTargetRegs _ = activeRegs
 }}}

   This says that any global register can be clobbered by unsafe foreign
 call, in practice preventing any global register sinking past such calls.
 After this change the Cmm dump looks like this (again, `-O1 -fspec-constr`
 enabled):

 {{{
 c2hi:
     _s2dr::I64 = R5;
     _s2dq::I64 = R4;
     _s2dp::P64 = R3;
     _s2do::I64 = R2;
     _s2du::I64 = R2 + R4;
     _c2fB::I64 = R5;
     (_s2dz::I64) = call "ccall" arg hints:  [PtrHint,
                                              `signed',]  result hints:
 [PtrHint] memchr(_s2du::I64, 10, _c2fB::I64);
     if (_s2dz::I64 == 0) goto c2hd; else goto c2he;
 c2hd:
     call MO_Touch(_s2dp::P64);
     I64[Hp - 128] = Data.ByteString.Internal.PS_con_info;
     P64[Hp - 120] = _s2dp::P64;
     I64[Hp - 112] = _s2do::I64;
     I64[Hp - 104] = _s2dq::I64;
     I64[Hp - 96] = _s2dr::I64;
     I64[Hp - 88] = :_con_info;
     P64[Hp - 80] = Hp - 127;
     P64[Hp - 72] = GHC.Types.[]_closure+1;
     _c2h8::P64 = Hp - 86;
     Hp = Hp - 72;
     R1 = _c2h8::P64;
     call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
 }}}

 Based on above facts I believe that the problem lies in incorrect
 definition of caller saves registers. On the other hand I believe we are
 defining R2, R3 and R4 as callee-saves according to the specification. So
 there is still possibility that the bug lies somewhere else. At this point
 I am stuck.

 I will attach Cmm dumps in a moment so others can verify my findings. All
 dumps are for the `BugIso` module - they don't include boilerplate from
 `Main`.

 Replying to [comment:54 awson]:
 > Hmm, on my experiments changing `isTrivial` to
 > {{{
 > isTrivial :: CmmExpr -> Bool
 > isTrivial (CmmReg (CmmLocal _)) = True
 > -- isTrivial (CmmLit _) = True
 > isTrivial _ = False
 > }}}
 > alone was not enough to fix things - see my comment 19. So doing
 `CmmLit` branch `True` makes this work?

 On my experiments the change of `isTrivial` that you just posted is enough
 to prevent the segmentation fault.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8834#comment:55>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list