[Git][ghc/ghc][wip/core-docs] Documentation for the Core data constructors

Sebastian Graf (@sgraf812) gitlab at gitlab.haskell.org
Fri May 19 12:25:57 UTC 2023



Sebastian Graf pushed to branch wip/core-docs at Glasgow Haskell Compiler / GHC


Commits:
8fc1802b by Sebastian Graf at 2023-05-19T14:25:51+02:00
Documentation for the Core data constructors

- - - - -


1 changed file:

- compiler/GHC/Core.hs


Changes:

=====================================
compiler/GHC/Core.hs
=====================================
@@ -165,99 +165,163 @@ These data types are the heart of the compiler
 --    optimization, analysis and code generation on.
 --
 -- The type parameter @b@ is for the type of binders in the expression tree.
---
--- The language consists of the following elements:
---
--- *  Variables
---    See Note [Variable occurrences in Core]
---
--- *  Primitive literals
---
--- *  Applications: note that the argument may be a 'Type'.
---    See Note [Representation polymorphism invariants]
---
--- *  Lambda abstraction
---    See Note [Representation polymorphism invariants]
---
--- *  Recursive and non recursive @let at s. Operationally
---    this corresponds to allocating a thunk for the things
---    bound and then executing the sub-expression.
---
---    See Note [Core letrec invariant]
---    See Note [Core let-can-float invariant]
---    See Note [Representation polymorphism invariants]
---    See Note [Core type and coercion invariant]
---
--- *  Case expression. Operationally this corresponds to evaluating
---    the scrutinee (expression examined) to weak head normal form
---    and then examining at most one level of resulting constructor (i.e. you
---    cannot do nested pattern matching directly with this).
---
---    The binder gets bound to the value of the scrutinee,
---    and the 'Type' must be that of all the case alternatives
---
---    IMPORTANT: see Note [Case expression invariants]
---
--- *  Cast an expression to a particular type.
---    This is used to implement @newtype at s (a @newtype@ constructor or
---    destructor just becomes a 'Cast' in Core) and GADTs.
---
--- *  Ticks. These are used to represent all the source annotation we
---    support: profiling SCCs, HPC ticks, and GHCi breakpoints.
---
--- *  A type: this should only show up at the top level of an Arg
---
--- *  A coercion
-
-{- Note [Why does Case have a 'Type' field?]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The obvious alternative is
-   exprType (Case scrut bndr alts)
-     | (_,_,rhs1):_ <- alts
-     = exprType rhs1
-
-But caching the type in the Case constructor
-  exprType (Case scrut bndr ty alts) = ty
-is better for at least three reasons:
-
-* It works when there are no alternatives (see case invariant 1 above)
-
-* It might be faster in deeply-nested situations.
-
-* It might not be quite the same as (exprType rhs) for one
-  of the RHSs in alts. Consider a phantom type synonym
-       type S a = Int
-   and we want to form the case expression
-        case x of { K (a::*) -> (e :: S a) }
-   Then exprType of the RHS is (S a), but we cannot make that be
-   the 'ty' in the Case constructor because 'a' is simply not in
-   scope there. Instead we must expand the synonym to Int before
-   putting it in the Case constructor.  See GHC.Core.Utils.mkSingleAltCase.
-
-   So we'd have to do synonym expansion in exprType which would
-   be inefficient.
-
-* The type stored in the case is checked with lintInTy. This checks
-  (among other things) that it does not mention any variables that are
-  not in scope. If we did not have the type there, it would be a bit
-  harder for Core Lint to reject case blah of Ex x -> x where
-      data Ex = forall a. Ex a.
--}
+-- It is usually instantiated to 'Var' (in 'CoreExpr', the commonly used type
+-- alias). To see why @b@ is useful, see 'TaggedExpr' and its use in
+-- "GHC.Core.Opt.SetLevels".
 
 -- If you edit this type, you may need to update the GHC formalism
 -- See Note [GHC Formalism] in GHC.Core.Lint
 data Expr b
   = Var   Id
+  -- ^ A (term-level, hence 'Id') variable occurrence,
+  -- see Note [Variable occurrences in Core].
+  --
+  -- Refers to the 'Id' record of its binding site;
+  -- if you change its 'GHC.Types.Id.idInfo' you probably want to change the
+  -- 'GHC.Types.Id.idInfo' of the binding site as well.
+
   | Lit   Literal
+  -- ^ A primitive literal, always of unlifted kind.
+  --
+  -- Haskell's number literals such as `0 :: Int` (where `Int` is a lifted data type)
+  -- do /not/ have a representation as a Core 'Literal'; rather they get desugared
+  -- to `I# 0#`, where `0#` is a Literal of type `Int#` (which has unlifted kind
+  -- `TYPE IntRep`).
+  --
+  -- There are a few utility functions to create literals, for example
+  -- 'mkIntLit', 'mkDoubleLit', 'mkCharLit', 'mkStringLit'.
+
   | App   (Expr b) (Arg b)
+  -- ^ A unary application, often occuring in left-associated chains.
+  --
+  -- There are a few utility functions working on application chains;
+  -- see 'collectArgs' and friends.
+  --
+  -- Args have fixed runtime representation;
+  -- see Note [Representation polymorphism invariants].
+
   | Lam   b (Expr b)
+  -- ^ A lambda, binding term arguments ('Id'), type or kind arguments ('TyVar')
+  -- or coercion arguments ('CoVar').
+  --
+  -- Note that type and coercion binders are dependently quantified. E.g., in
+  -- pretty-printed Core,
+  --
+  -- > f = \(@a) (@co :: a ~# Int) -> <expr> :: (<ty> a |> (... co))
+  --
+  -- Where you can pick out dependent quantifiers by the leading `@`.
+  -- Here, `a` occurs in the kind of `co` (the next lambda binder) and in the
+  -- type of `<expr>`, so both the kind of `co` /and/ the type of `<expr>`
+  -- depend on `a`. Furthermore, the type of `<expr>` is casted via a
+  -- coercion mentioning `co`.
+  --
+  -- Subject to Note [Shadowing] for the lambda binder.
+  -- Lambda binders have fixed runtime representation;
+  -- see Note [Representation polymorphism invariants].
+  --
+  -- Use 'mkLams' to create multiple lambdas at once and 'wrapLamBody' to
+  -- map over the body of a lambda without touching its binders (beware of
+  -- name capture!).
+
   | Let   (Bind b) (Expr b)
-  | Case  (Expr b) b Type [Alt b]   -- See Note [Case expression invariants]
-                                    -- and Note [Why does Case have a 'Type' field?]
-  | Cast  (Expr b) CoercionR        -- The Coercion has Representational role
+  -- ^ A let binding. 'Bind' is either a single, 'NonRec'ursive binding or a
+  -- list of mutually 'Rec'ursive bindings.
+  -- Operationally, this corresponds to allocating a thunk for the things bound
+  -- and then executing the sub-expression.
+  --
+  -- Almost exclusively used for value bindings, e.g., 'Id's, in which case be
+  -- aware of the following invariants:
+  --
+  -- * Note [Core letrec invariant]
+  -- * Note [Core let-can-float invariant]
+  -- * Note [Shadowing] of the let binders
+  -- * Let binders have fixed runtime representation;
+  --   see Note [Representation polymorphism invariants].
+  --
+  -- A very special kind of let binding is a /join point/, which can be detected
+  -- with 'GHC.Types.Id.isJoinId_maybe' or 'GHC.Core.Utils.isJoinBind'.
+  -- Consult Note [Join points] for details.
+  --
+  -- In exception to \"exclusively\" above, we may bind a 'TyVar' or a 'CoVar'
+  -- in a non-recursive 'Bind', see Note [Core type and coercion invariant].
+  --
+  -- See also utility functions 'mkLet', 'mkLets', 'mkLetNonRec', 'mkLetRec'.
+
+  | Case  (Expr b) b Type [Alt b]
+  -- ^ Represents a Case expression, doing one level of pattern-matching.
+  -- Pretty-printed as
+  --
+  -- > case scrut of wild {
+  -- >   __DEFAULT -> rhs1;
+  -- >   (:) x xs  -> rhs2;
+  -- > }
+  --
+  -- Operationally, the /scrutinee/ expression `scrut` is evaluated to head
+  -- normal form. The value is bound to the /case binder/ `wild` and is then
+  -- examined to do a one-level switch on the data constructor tag/literal to
+  -- jump to one of the /alternatives/. (Nested matches in Haskell have been
+  -- desugared to multiple Case matches in Core.)
+  --
+  -- The type is the cached return type of the whole case
+  -- expression and hence matches that of the case alternatives,
+  -- see Note [Why does Case have a 'Type' field?].
+  -- The list of case alternatives is roughly represented as
+  --
+  -- > 'Alt' 'DefaultAlt' [] "rhs1", 'Alt' ('DataAlt' "(:)"), ["x","xs"], "rhs2"]
+  --
+  -- Note that there is at most one catch-all `__DEFAULT` case and no overlap
+  -- between other alternatives, so the order of alternatives is arbitrary
+  -- (e.g., /no/ first-match semantics).
+  --
+  -- See also
+  --
+  -- * Note [Case expression invariants]
+  -- * Note [Empty case alternatives]
+  -- * Note [Shadowing] of the case binder
+  -- * Case binders have fixed runtime representation;
+  --   see Note [Representation polymorphism invariants].
+  --
+  -- Utility functions: 'GHC.Core.Utils.mkAltExpr',
+  -- 'GHC.Core.Utils.mkDefaultCase', 'GHC.Core.Utils.mkSingleAltCase' and the
+  -- various helper functions around case alternatives in "GHC.Core.Utils".
+
+  | Cast  (Expr b) CoercionR
+  -- ^ Cast an expression's type via a 'CoercionR' (the R denotes that it has a
+  -- Representational role). Doing so has no effect at runtime, but it is
+  -- crucial for justifying term-level transformations in the type system.
+  --
+  -- This is used to implement `newtype`s (a `newtype` constructor or
+  -- destructor just becomes a 'Cast' in Core) and GADTs.
+
   | Tick  CoreTickish (Expr b)
+  -- ^ Wraps various sorts of debug information around an expression, such as
+  -- profiling SCCs, HPC ticks, and GHCi breakpoints.
+  --
+  -- We try to preserve meaning throughout transformations as best as possible.
+  --
+  -- There are quite a few utility functions that work on 'Ticks':
+  -- 'collectArgsTicks', 'GHC.Core.Utils.stripTicksTop' and friends in
+  -- "GHC.Core.Utils".
+
   | Type  Type
+  -- ^ Signals leaving of the \"term level\" and injects a type expression.
+  --
+  -- Occurs only in argument position of an 'App' such as `id @Int (I# 42#)`
+  -- (where there is a non-pretty-printed `Type ...` wrapped around the type
+  -- expression @Int@) or (rarely) in the RHS of a 'Let' that binds a type
+  -- variable, such as `let { a = @(Rep MyBigGenericRecord) } in ...`
+  -- as in Note [Core type and coercion invariant].
+  --
+  -- If you want to turn a 'Var' into an argument and don't know whether it's an
+  -- term level 'Id', a type-level 'TyVar' or a 'CoVar', use 'varToCoreExpr'.
+
   | Coercion Coercion
+  -- ^ Quite the same as the 'Type' constructor, but for introducing coercion
+  -- arguments (which encode equalities between types).
+  --
+  -- Coercion arguments are morally values and should be treated as such (e.g.,
+  -- somewhat close to tokens of type `()`) until they are erased in STG.
+
   deriving Data
 
 -- | Type synonym for expressions that occur in function argument positions.
@@ -585,6 +649,40 @@ checked by Core Lint.
    multiplicity of the corresponding field /scaled by the multiplicity of the
    case binder/. Checked in lintCoreAlt.
 
+Note [Why does Case have a 'Type' field?]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The obvious alternative is
+   exprType (Case scrut bndr alts)
+     | (_,_,rhs1):_ <- alts
+     = exprType rhs1
+
+But caching the type in the Case constructor
+  exprType (Case scrut bndr ty alts) = ty
+is better for at least three reasons:
+
+* It works when there are no alternatives (see case invariant 1 above)
+
+* It might be faster in deeply-nested situations.
+
+* It might not be quite the same as (exprType rhs) for one
+  of the RHSs in alts. Consider a phantom type synonym
+       type S a = Int
+   and we want to form the case expression
+        case x of { K (a::*) -> (e :: S a) }
+   Then exprType of the RHS is (S a), but we cannot make that be
+   the 'ty' in the Case constructor because 'a' is simply not in
+   scope there. Instead we must expand the synonym to Int before
+   putting it in the Case constructor.  See GHC.Core.Utils.mkSingleAltCase.
+
+   So we'd have to do synonym expansion in exprType which would
+   be inefficient.
+
+* The type stored in the case is checked with lintInTy. This checks
+  (among other things) that it does not mention any variables that are
+  not in scope. If we did not have the type there, it would be a bit
+  harder for Core Lint to reject case blah of Ex x -> x where
+      data Ex = forall a. Ex a.
+
 Note [Core type and coercion invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We allow a /non-recursive/, /non-top-level/ let to bind type and



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/8fc1802bdee4626ab09e9ee073fd7681b919dbd8

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/8fc1802bdee4626ab09e9ee073fd7681b919dbd8
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230519/308fdc0c/attachment-0001.html>


More information about the ghc-commits mailing list