Re: [GHC] #13615: Nondeterminism in ‘pure’ function w/ parallel evaluation & memo combinators

Sun Jun 25 00:45:20 UTC 2017

#13615: Nondeterminism in ‘pure’ function w/ parallel evaluation & memo combinators
-------------------------------------+-------------------------------------
        Reporter:  pacak             |                Owner:  bgamari
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.2.1
       Component:  Compiler          |              Version:  7.10.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Incorrect result  |  Unknown/Multiple
  at runtime                         |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by bgamari):

 I have confirmed that we indeed hit the codepath in `raiseAsync`
 responsible for updating the thunk under evaluation. I believe this fills
 the final question in comment:36.

 To recap, what is happening is the following,

 1. Thread A enters a `fromListWith` closure and begins folding over the
 insertions
 2. At some point during this fold we need to garbage collect (usually due
 to a stack overflow, judging from the eventlog); the garbage collector
 construct an `AP_STACK` closure capturing the state of Thread A, including
 the partially-finished `fromListWith` computation. The `fromListWith`
 thunk is updated to point to this `AP_STACK`.
 3. Garbage collection commences and finishes, evaluation resumes
 4. At some point Thread A is resumed, entering the previously saved
 `AP_STACK` computation which we prepared in step (2); we are blackholing
 lazily so no update to the `AP_STACK` closure is made
 5. At some later point Thread B tries to force the same `AP_STACK`
 computation; finding that it's not blackholed, it enters

 We now have two mutator threads performing evaluation on the same,
 effectful computation with shared, mutable state.

 My first intuition says that the easiest way to avoid this would be to
 unconditionally eagerly blackhole `AP_STACK`s. I believe this can be done
 straightforwardly,
 {{{#!diff
 diff --git a/rts/Apply.cmm b/rts/Apply.cmm
 index f14bb8f331..a35b41e5b0 100644
 --- a/rts/Apply.cmm
 +++ b/rts/Apply.cmm
 @@ -464,6 +464,16 @@ INFO_TABLE(stg_AP_STACK,/*special
 layout*/0,0,AP_STACK,"AP_STACK","AP_STACK")

    Words = StgAP_STACK_size(ap);

 +  W_ old_info;
 +  (old_info) = prim %cmpxchgW(ap, stg_AP_STACK_info, stg_WHITEHOLE_info);
 +  if (old_info != stg_AP_STACK_info) {
 +    /* someone else beat us to it */
 +    jump ENTRY_LBL(stg_WHITEHOLE) (ap);
 +  }
 +  StgInd_indirectee(ap) = CurrentTSO;
 +  W_[ap] = stg_EAGER_BLACKHOLE_info;
 +
    /*
     * Check for stack overflow.  IMPORTANT: use a _ENTER check here,
     * because if the check fails, we might end up blackholing this very
 }}}

 However, in doing this I'm seeing `<<loops>>` in the testcase. I haven't
 yet worked out why.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13615#comment:45>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler