[GHC] #15155: How untagged pointers sneak into banged fields

GHC ghc-devs at haskell.org
Tue May 15 12:46:55 UTC 2018


#15155: How untagged pointers sneak into banged fields
-------------------------------------+-------------------------------------
           Reporter:  heisenbug      |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:  8.6.1
          Component:  Compiler       |           Version:  8.4.2
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:  14677          |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 (''N.B.'' I am writing this up from memory, and cannot verify it just now,
 maybe someone can lend a hand, otherwise I'll do it ASAP!)

 Here is a way how untagged pointers to strict data can be created in
 banged (strict) constructor fields.
 This reproduction recipe **depends on the patch from #14677 applied**.

 We have 3 modules `A`, `B` and `C`:

 {{{#!hs
 module A where
 data A = X | Y | Z
 a = Z
 }}}

 {{{#!hs
 module B where
 import A
 newtype B = B A
 b = B a
 }}}

 {{{#!hs
 {-# language MagicHash #-}

 module C where
 import A
 import B
 import GHC.Exts

 data C = C !B
 c = C b

 main = do print (I# (reallyUnsafePtrEquality# a (coerce b))) -- prints 0,
 b is softlink
           print (I# (dataToTag# c)) -- prints 0: not entered yet
           print (case c of C b' -> I# (dataToTag# b')) -- prints 0?
           print (case c of C (B a') -> I# (dataToTag# a')) -- prints 3
 }}}

 -------------------
 == Why this happens

 `B.b` is a newtype to `A.a` so one would expect that both alias the same
 memory location (a ''hardlink'' in filesystem parlance). But currently
 reexports are implemented with a special type of closure `IND_STATIC` (a
 ''softlink'') which needs to be entered to obtain the actual (tagged
 pointer). The `IND_STATIC` closure's pointer is never tagged (otherwise it
 would never be entered, instead interpreted as a honest-to-goodness `A.A`,
 which causes the symptoms seen in #14677).

 With #14677 applied to GHC, the unfolding of `B.b` is correctly read when
 compiling `C` (with `-O1` and better) and thus the compiler knows that it
 should be a tagged pointer value. Thus the construction of `C.c` shortcuts
 the entering of `B.b` when filling the strict field, and (because `B.b`
 being a softlink, thus untagged) the field ends up carrying a 0 tag.

 -------------------------
 == How can this be fixed?

 I see two possibilities one conservative and one invasive.

 === Conservative

 When seeing a coercion unfolding of a tagged value being used to
 initialise a strict field, do not skip the evaluatedness check, but cater
 for the possibility of an `IND_STATIC` closure. Check the closure type,
 and if confirmed, steal the pointee and use that.

 === Invasive

 Get rid of the `IND_STATIC` closures altogether. For ''intra-module''
 softlinks we can have proper hardlinks (assembler `.equiv` directives, or
 LLVM `alias`es). ''Inter-module'' softlinks can also be eliminated by
 linker scripts. This would however cause more build artifacts, so I don't
 know how hairy it would turn out.

 OTOH, it would reduce binary size by eliminating indirection closures and
 potential dereferencing code.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15155>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list