[GHC] #15155: How untagged pointers sneak into banged fields
GHC
ghc-devs at haskell.org
Tue May 15 12:46:55 UTC 2018
#15155: How untagged pointers sneak into banged fields
-------------------------------------+-------------------------------------
Reporter: heisenbug | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone: 8.6.1
Component: Compiler | Version: 8.4.2
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: None/Unknown
Unknown/Multiple |
Test Case: | Blocked By:
Blocking: 14677 | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
(''N.B.'' I am writing this up from memory, and cannot verify it just now,
maybe someone can lend a hand, otherwise I'll do it ASAP!)
Here is a way how untagged pointers to strict data can be created in
banged (strict) constructor fields.
This reproduction recipe **depends on the patch from #14677 applied**.
We have 3 modules `A`, `B` and `C`:
{{{#!hs
module A where
data A = X | Y | Z
a = Z
}}}
{{{#!hs
module B where
import A
newtype B = B A
b = B a
}}}
{{{#!hs
{-# language MagicHash #-}
module C where
import A
import B
import GHC.Exts
data C = C !B
c = C b
main = do print (I# (reallyUnsafePtrEquality# a (coerce b))) -- prints 0,
b is softlink
print (I# (dataToTag# c)) -- prints 0: not entered yet
print (case c of C b' -> I# (dataToTag# b')) -- prints 0?
print (case c of C (B a') -> I# (dataToTag# a')) -- prints 3
}}}
-------------------
== Why this happens
`B.b` is a newtype to `A.a` so one would expect that both alias the same
memory location (a ''hardlink'' in filesystem parlance). But currently
reexports are implemented with a special type of closure `IND_STATIC` (a
''softlink'') which needs to be entered to obtain the actual (tagged
pointer). The `IND_STATIC` closure's pointer is never tagged (otherwise it
would never be entered, instead interpreted as a honest-to-goodness `A.A`,
which causes the symptoms seen in #14677).
With #14677 applied to GHC, the unfolding of `B.b` is correctly read when
compiling `C` (with `-O1` and better) and thus the compiler knows that it
should be a tagged pointer value. Thus the construction of `C.c` shortcuts
the entering of `B.b` when filling the strict field, and (because `B.b`
being a softlink, thus untagged) the field ends up carrying a 0 tag.
-------------------------
== How can this be fixed?
I see two possibilities one conservative and one invasive.
=== Conservative
When seeing a coercion unfolding of a tagged value being used to
initialise a strict field, do not skip the evaluatedness check, but cater
for the possibility of an `IND_STATIC` closure. Check the closure type,
and if confirmed, steal the pointee and use that.
=== Invasive
Get rid of the `IND_STATIC` closures altogether. For ''intra-module''
softlinks we can have proper hardlinks (assembler `.equiv` directives, or
LLVM `alias`es). ''Inter-module'' softlinks can also be eliminated by
linker scripts. This would however cause more build artifacts, so I don't
know how hairy it would turn out.
OTOH, it would reduce binary size by eliminating indirection closures and
potential dereferencing code.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15155>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list