[GHC] #12618: Add saturated constructor applications to Core

Sun Sep 25 18:21:14 UTC 2016

#12618: Add saturated constructor applications to Core
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Description changed by heisenbug:

@@ -19,1 +19,1 @@
- So here's another idea, which came up in converstion at ICFP'16: '''add a
+ So here's another idea, which came up in conversation at ICFP'16: '''add a
@@ -97,1 +97,1 @@
- * For `ConApp "(:)" [e1, e2]`, the type arument is just `exprType e1`.
+ * For `ConApp "(:)" [e1, e2]`, the type argument is just `exprType e1`.
@@ -108,1 +108,1 @@
- A simple once-and-for-all analyis on the `DataCon` will establish how to
+ A simple once-and-for-all analysis on the `DataCon` will establish how to
@@ -124,1 +124,1 @@
- types a `TyConApp` is not required to be satureted, but we could review
+ types a `TyConApp` is not required to be saturated, but we could review
@@ -142,1 +142,1 @@
- We'll only know if we try it. I estimiate that it would take less than a
+ We'll only know if we try it. I estimate that it would take less than a

New description:

 A long-standing performance bug in GHC is its behaviour on nested tuples.

 * The poster-child case is #5642.
 * The problem is that when you have a nested tuple application, e.g.
 `((4,True),('c','d'))`, the size of the type arguments grows quadratically
 with the expression size.  This bites badly in some approaches to generic
 programming, which use nested tuples heavily.
 * It's explained in detail in Section 2.3 of
 [http://research.microsoft.com/en-
 us/um/people/simonpj/papers/variant-f/index.htm Scrap your type
 applications].

 The same paper describes a solution, a modification of System F called
 System IF.  It's neat, and I did once implement in in GHC.  But it was
 quite complicated; most of the time the win was not big; and I don't know
 how it'd interact with casts, coercions, and type dependency in the latest
 version of GHC's Core.

 So here's another idea, which came up in conversation at ICFP'16: '''add a
 staturated constructor application to Core'''.  So Core looks like
 {{{
 data Expr v = Var v
             | App (Expr v) (Expr v)
             ...
             | ConApp DataCon [Expr v]    -- NEW
 }}}
 Now I hate the idea of adding a new constructor to Core;
 I often brag about how few constructors it has.  But the idea is this:

 * A `ConApp` is always saturated; think of it as an uncurried application.

 * Every data constructor has a wrapper Id.  For simple constructors like
 `(:)`, the wrapper is just a curried version:
 {{{
 (:) = /\a. \(x:a). \(y:[a]). (:) [x,y]
 }}}
   where the "`(:) [x,y]`" is just my concrete syntax for a `ConApp` node.

 * We are used to having an intro and elim form for each type former. So
 for `(->)` the intro form is `\x.e` and the elim form is `(e1 e2)`.  For a
 data type like `Maybe`, the elim form is `case`, but what's the intro
 form?  `ConApp` of course.

 * A `ConApp` mentions the `DataCon` explicitly, rather than having it
 buried inside the `IdDetails` of an `Id`, which somehow seems more honest.

 The proximate reason for doing this in the first place is to allow us to
 omit type arguments.  Consider `Just True`.  Curently we represent this as
 {{{
    Var "Just" `App` Type boolTy `App` Var "True"
 }}}
 But with `ConApp` we can say
 {{{
    ConApp "Just" [ConApp "True" []]
 }}}
 because it's easy to figure out the `boolTy` type argument from the
 argument.  (We don't really have strings there, but you get the idea.)

 Can we omit ''all'' the type arguments?  No: we can omit only those that
 appear free in any of the argument types.  That is usally all of them
 (including existentials) but not always.  Consider:
 {{{
   data (,) a b where
      (,):: forall a b. a -> b -> (a,b)

   data [] a where
      []  :: forall a. [a]
      (:) :: forall a. a -> [a] -> [a]

   data Either a b where
      Left  :: forall a b. a -> Either a b
      Right :: forall a b. b -> Either a b

   data (:=:) a b where
      Refl :: forall a b. (a~b) -> :=: a b

   data Foo where
      MkFoo :: a -> (a -> Int) -> Foo
 }}}
 For all of these data constructors except `[]` (nil), `Left` and `Right`
 we can omit all the type arguments, because we can recover them by simple
 matching against the types of the arguments.  A very concrete way to think
 about this is how to implement
 {{{
 exprType :: Expr Id -> Type
 exprType (Var v)   = varType v
 exprType (Lam b e) = mkFunTy (varType b) (exprType e)
 exprType (App f a) = funResultTy (exprType f)
 ...
 exprType (ConApp con args) = mkTyConApp (dataConTyCon con) ???
 }}}
 We know that the result type of type of a `ConApp` will be `T t1 ..tn`
 where `T` is the parent type constructor of the `DataCon`.  But what about
 the (universal) type args `???`?  We can get them from the types of the
 arguments `map exprType args`:

 * For `ConApp "(:)" [e1, e2]`, the type argument is just `exprType e1`.

 * For `ConApp "(:=:)" [e]`", we expect `exprType e` to return a type
 looking like `TyConApp "~" [t1, t2]`.  Then `t1` and `t2` are the types we
 want.  So matching is required.

 * What about an application of `Left`??  We need to recover two type args,
 but `exprType e1` gives us only one. So we must retain the other one in
 the application: `ConApp "Left" [Type ty2, e1]`.  Similarly for the empty
 list.

 A simple once-and-for-all analysis on the `DataCon` will establish how to
 do the matching, which type args to retain, etc.


 Tradeoffs:

 * Pro: We can eliminate almost all type args of almost all data
 constructors; and for nested tuples we can eliminate all of them.

 * Pro: it's elegant having the intro/elim duality.

 * Pro: in GHC we often ask "is this expression a saturated constructor
 application?" (see `exprIsConApp_maybe`) and `ConApp` makes it easier to
 answer that question.

 * Pro: we do exactly this in types: we have `AppTy` and `TyConApp`.  (In
 types a `TyConApp` is not required to be saturated, but we could review
 that choice.)

 * Con: adding a constructor is a big deal.  In lots of places we'd end up
 saying
 {{{
 f (App e1 e2)   = App (f e1) (f e2)
 f (ConApp c es) = ConApp c (map f es)
 }}}
   In the olden days GHC's `App` had multiple arguments and the continual
 need to work over the list was a bit tiresome.  Still `ConApp` is very
 simple and uniform; quite often adding `map` won't be difficult; and it
 may well be that constructors need to be treated differently anyway.

 * Con: it's not a general solution to the type-argument problem.  See
 #9198 for example.  System IF is the only general solution I know, but it
 seems like too big a sledgehammer.

 We'll only know if we try it. I estimate that it would take less than a
 week to work this change through all of GHC. 90% of it is routine.

 Other possibly-relevant tickets are #8523, #7428, #9630.

--

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12618#comment:13>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler