[GHC] #14092: hs-boot unfolding visibility not consistent between --make and -c

Sun Aug 6 03:19:14 UTC 2017

#14092: hs-boot unfolding visibility not consistent between --make and -c
-------------------------------------+-------------------------------------
           Reporter:  ezyang         |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Compiler       |           Version:  8.2.1
           Keywords:  hs-boot.       |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 duog's comment in https://phabricator.haskell.org/D3815#107812 pointed out
 an inconsistency between hs-boot handling in --make and -c that I have
 been dimly aware of for some time now.

 Here is how to trigger the situation:

 {{{
 -- A.hs-boot
 module A where
 f :: Int -> Int

 -- B.hs
 module B where
 import {-# SOURCE #-} A

 -- A.hs
 module A where
 import B
 f :: Int -> Int
 f x = x + 1

 -- C.hs
 module C where
 import {-# SOURCE #-} A
 g = f 2
 }}}

 When we run `ghc-head C.hs --make -O -ddump-simpl -fforce-recomp`, we see
 that f has been successfully inlined:

 {{{
 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 g :: Int
 [GblId,
  Caf=NoCafRefs,
  Str=m,
  Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
          WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}]
 g = GHC.Types.I# 3#
 }}}

 However, if we then one-shot compile C.hs, as in `ghc-head -c C.hs  -O
 -ddump-simpl -fforce-recomp`, the unfolding is not seen:

 {{{
 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 C.g1 :: Int
 [GblId,
  Caf=NoCafRefs,
  Str=m,
  Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
          WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}]
 C.g1 = GHC.Types.I# 2#

 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 g :: Int
 [GblId,
  Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
          WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 20 0}]
 g = f C.g1
 }}}

 The crux of the matter is that `--make` and `-c` have different rules
 about when to make use of the unfolded definition.

 The `--make` rule is: compile the modules in some topological order. Any
 module that comes after `A.hs` sees the improved unfoldings. And as it
 turns out, the current topological order GHC picks is this:

 {{{
 [1 of 4] Compiling A[boot]          ( A.hs-boot, A.o-boot )
 [2 of 4] Compiling B                ( B.hs, B.o )
 [3 of 4] Compiling A                ( A.hs, A.o )
 [4 of 4] Compiling C                ( C.hs, C.o )
 }}}

 The `-c` rule is more complicated. Every module records a list of
 transitive module dependencies, and whether or not a boot or non-boot was
 used. We load an hi-boot file if NONE of the modules we imported "saw" the
 full hi module, AND we only did direct SOURCE imports. If anyone has
 transitively imported A.hs, we load the hi file.

 In the example above, C.hs ONLY imports A.hs-boot, so hs-boot is obliged
 to load A.hi-boot, and thus it does not see the unfolding.

 The `-c` behavior is the correct behavior, because with the `--make`
 behavior it is easy to get into a situation where the build is dependent
 on the topological order chosen.  Consider:

 * `A.hs-boot`
 * `B.hs-boot`
 * `A.hs` imports `A.hs-boot`, `B.hs-boot`
 * `B.hs` imports `A.hs-boot`, `B.hs-boot`

 (Ignore the fact that in GHC today you can't actually directly import your
 hs-boot file; you can fix this by importing dummy modules.)

 Now you can see that depending on the order you compile, e.g., A.hs-boot,
 B.hs-boot, A.hs, B.hs versus B.hs, A.hs, either A.hs or B.hs will be
 compiled with the unfoldings for its partner, but not vice versa. This
 doesn't sound good!

 Unfortunately, fixing things properly in `--make` mode is quite
 troublesome. To understand why, we have to first remind ourself about loop
 retypechecking in make mode. Remember that GHC knot-ties all of the
 typechecker data structures together
 (https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/TyingTheKnot).
 This means that at the point we typecheck an hs-boot file, we are building
 an (immutable) graph on top of the impoverished, type signature only
 declarations from the hs-boot file.  When we finish typechecking the loop
 closer, the only way to "update" all of the old references to the hs-boot
 file is to throw out the entire graph and rebuild it from scratch (the
 loop retypecheck!)

 So let's think about the A.hs-boot B.hs A.hs C.hs case. At the point we're
 typechecking A.hs, we throw out the graph referencing A.hs-boot and
 rebuild it referencing A.hs so that everything gets knot-tied to A.hs. But
 THEN, C.hs comes around, and it's like, "Oy, I don't want the A.hs version
 of the graph, I want the A.hs-boot version of the graph." In `-c` mode,
 this is not a problem, since we have to rebuild the graph from scratch
 anyway, but in `--make` this is a big deal, since we have to throw
 everything out and rebuild it AGAIN.

 One implementation strategy that doesn't involve mucking about with HPT
 retypechecking is to somehow make the typechecker aware of what unfoldings
 it should "see" and which it should not. The idea is that if it can ignore
 unfoldings that it doesn't have legitimate access to, that should be "just
 as good" as having typechecked against the hs-boot graph itself. But this
 sounds very tricky and difficult to get right... so here we are.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14092>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler