[GHC] #12469: Memory fence on writeIORef missing on ARM

GHC ghc-devs at haskell.org
Sat Aug 6 03:17:08 UTC 2016


#12469: Memory fence on writeIORef missing on ARM
-------------------------------------+------------------------------------
        Reporter:  rrnewton          |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.2.1
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:  memory model
Operating System:  Unknown/Multiple  |         Architecture:  arm
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+------------------------------------
Description changed by rrnewton:

@@ -1,1 +1,1 @@
- The memory model question has been debated now and again.  his
+ The memory model question has been debated now and again. This
@@ -15,1 +15,2 @@
- ```Haskell
+ {{{
+ #!haskell
@@ -32,1 +33,1 @@
- ```
+ }}}
@@ -37,1 +38,2 @@
- ```C
+ {{{
+ #!C
@@ -59,1 +61,0 @@
- ```
@@ -61,1 +62,0 @@
- ```C
@@ -126,1 +126,1 @@
- ```
+ }}}
@@ -135,1 +135,2 @@
- ```C
+ {{{
+ #!C
@@ -145,1 +146,1 @@
- ```
+ }}}

New description:

 The memory model question has been debated now and again. This
 thread from ten years back
 (https://mail.haskell.org/pipermail/haskell-prime/2006-April/001237.html)
 lays out the basic situation with thunk update, writeIORef, and memory
 fences.

 But we recently began experimenting with GHC on ARM platforms, and it
 seems to lack a memory fence that the participants in the cited thread
 expect it to have.

 Here's an attempt to construct a program which writes fields of a data
 structure, and then writes the pointer to that structure to an IORef,
 without the proper fence inbetween:

 {{{
 #!haskell
 import Data.IORef
 import Control.Concurrent

 data Foo = Foo Int deriving Show

 {-# NOINLINE mkfoo #-}
 mkfoo x = Foo x

 {-# NOINLINE dowrite #-}
 dowrite r n = writeIORef r $! mkfoo n

 main =
   do r <- newIORef (Foo 3)
      forkIO (dowrite r 4)
      x <- readIORef r
      print x
 }}}

 Here's the relevant bits of the CMM that results when compiled on an
 ARM 64 machine:

 {{{
 #!C
 mkfoo_rn1_entry() //  []
         { []
         }
     {offset
       c40i:
           P64[MainCapability+872] = P64[MainCapability+872] + 16;
           if (P64[MainCapability+872] > I64[MainCapability+880]) goto
 c40m; else goto c40l;
       c40m:
           I64[MainCapability+928] = 16;
           P64[MainCapability+24] = mkfoo_rn1_closure;
           call (I64[MainCapability+16])(R1) args: 16, res: 0, upd: 8;
       c40l:
           I64[P64[MainCapability+872] - 8] = Foo_con_info;
           P64[P64[MainCapability+872]] = P64[I64[MainCapability+856]];
           P64[MainCapability+24] = P64[MainCapability+872] - 7;
           I64[MainCapability+856] = I64[MainCapability+856] + 8;
           call (I64[P64[I64[MainCapability+856]]])(R1) args: 8, res: 0,
 upd: 8;
     }
 }

 dowrite_entry() //  []
         { []
         }
     {offset
       c44j:
           call a_r3Dy_entry() args: 24, res: 0, upd: 8;
     }
 }

 a_r3Dy_entry() //  [R1]
         { []
         }
     {offset
       c41D:
           if (I64[MainCapability+856] - 16 < I64[MainCapability+864]) goto
 c41H; else goto c41I;
       c41H:
           P64[MainCapability+24] = a_r3Dy_closure;
           call (I64[MainCapability+16])(R1) args: 24, res: 0, upd: 8;
       c41I:
           I64[I64[MainCapability+856] - 8] = block_c41B_info;
           P64[I64[MainCapability+856] - 16] = P64[I64[MainCapability+856]
 + 8];
           I64[MainCapability+856] = I64[MainCapability+856] - 16;
           call mkfoo_rn1_entry() args: 16, res: 8, upd: 8;
     }
 }

 block_c41B_entry() //  [R1]
         { []
         }
     {offset
       c41B:
           _s3Ep::P64 = P64[I64[MainCapability+856] + 8];
           I64[I64[MainCapability+856] + 8] = block_c41G_info;
           _s3Es::P64 = P64[MainCapability+24];
           P64[MainCapability+24] = _s3Ep::P64;
           P64[I64[MainCapability+856] + 16] = _s3Es::P64;
           I64[MainCapability+856] = I64[MainCapability+856] + 8;
           if (P64[MainCapability+24] & 7 != 0) goto u41S; else goto c41K;
       u41S:
           call block_c41G_entry(R1) args: 0, res: 0, upd: 0;
       c41K:
           call (I64[I64[P64[MainCapability+24]]])(R1) args: 8, res: 8,
 upd: 8;
     }
 }

 block_c41G_entry() //  [R1]
         { []
         }
     {offset
       c41G:
           _s3Ev::P64 = P64[P64[MainCapability+24] + 7];
           P64[_s3Ev::P64 + 8] = P64[I64[MainCapability+856] + 8];
           call "ccall" arg hints:  [PtrHint,
                                     PtrHint]  result hints:  []
 dirty_MUT_VAR(MainCapability+24, _s3Ev::P64);
           P64[MainCapability+24] = ()_closure+1;
           I64[MainCapability+856] = I64[MainCapability+856] + 16;
           call (I64[P64[I64[MainCapability+856]]])(R1) args: 8, res: 0,
 upd: 8;
     }
 }
 }}}

 The fence should happen before the write of the pointer into the
 IORef.  I can't find the fence, and can't find a codepath in the
 compiler that would insert it (i.e. with MO_WriteBarrier).

 `dirty_MUT_VAR` is actually too late to perform the fence, but it
 doesn't either:

 {{{
 #!C
 void
 dirty_MUT_VAR(StgRegTable *reg, StgClosure *p)
 {
     Capability *cap = regTableToCapability(reg);
     if (p->header.info == &stg_MUT_VAR_CLEAN_info) {
         p->header.info = &stg_MUT_VAR_DIRTY_info;
         recordClosureMutated(cap,p);
     }
 }
 }}}

 (Neither does `recordClosureMutated`.)

--

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12469#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list