[GHC] #12626: Remove redundant type applications in Core

Mon Sep 26 13:19:52 UTC 2016

#12626: Remove redundant type applications in Core
-------------------------------------+-------------------------------------
        Reporter:  nomeata           |                Owner:
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by nomeata):

 Replying to [comment:5 nomeata]:
 > What is special about a data constructor that allows this, that a
 function (like, say, `flip`) does not have?

 Let me try to answer that myself:

 Essentially, you are proposing a way of compressing the representation of
 Core, by omitting (type) arguments that can easily and in a syntax-
 directed manner be recovered. You are implying here the existence of a
 pair of functions
 {{{
 compressArgs :: DataCon -> [Expr v] -> [Expr v]
 decompressArgs :: DataCon -> [Expr v] -> [Expr v]
 }}}
 that remove and recover implied arguments. For example `compressArgs
 "Left" [Bool,Int,True] = [Bool, True]`.

 We obviously want to following identity to hold:
 {{{
 length args == dcArity dc ==> decompressArgs dc (compressArgs dc args) ==
 args
 }}}

 What does `compressArgs` and `decompressArgs` need from `DataCon`? Two
 bits of information:
  *  It’s type (`forall a b. a -> Either a b`) and
  *  its arity (3, counting type arguments).
 Well, `compressArgs` does not need the arity, because it is just the
 length of the input list, if the application is saturated. But
 `decompressArgs` does (why? see example later). So really, we have a pair
 of functions
 {{{
 compressArgs :: Type -> [Expr v] -> [Expr v]
 decompressArgs :: Type -> Arity -> [Expr v] -> [Expr v]
 }}}

 Here, we want
 {{{
 decompressArgs ty (length args) (compressArgs ty args) == args
 }}}

 And with this I can see how the above proposal easily generalizes to
 functions: Have
 {{{
             | Apps (Expr v) Arity [Expr v]    -- NEW
 }}}
 Instead of {{{f `App` x `App` y `App` z}}} you can use `Apps f 3
 (compressArgs (exprType f) [x,y,z])`. No information is lost (because of
 the above identity), but any redundant information can be removed by
 `compressArgs`.

 Why do we need to store the arity? Because `compressArgs` can produce the
 same compressed list for different input lists:
 {{{
 compressArgs "forall a. a -> a" [Bool,True] == [True]
 compressArgs "forall a. a -> a" [Bool] == [Bool]
 compressArgs "forall a. a -> a" [Type, Bool] == [Bool]
 }}}

 I imagine getting rid of all (well, many more) of the redundant type
 applications throughout Core can be big win, so maybe this generalization
 should be considered.

 Oh, and fun fact: It might be possible to remove `App` completely and
 replace it with a pattern synonym here: Due to `compressArgs ty [x] =
 [x]`, `Apps f 1 [e]` is equivalent to `App f e`.

 So this might not actually be a serious change to core, nor might it be
 increasing the number of constructors: Using a bidirectional smart
 constructor (which does optimizes the representation under the hood) this
 can hopefully be a completely transparent optimization of the
 representation.

 Using a pattern synonynm works almost; the problem is that the code should
 be generic in the binder, but then we have no way of knowing the binder’s
 type. But for what it’s worth: It seems to compile with this change:
 {{{
 data Expr b
   = Var   Id
   | Lit   Literal
 --  | App   (Expr b) (Arg b)
   | Apps  (Expr b) Arity [Expr b]
   | Lam   b (Expr b)
   | Let   (Bind b) (Expr b)
   | Case  (Expr b) b Type [Alt b]       -- See #case_invariant#
   | Cast  (Expr b) Coercion
   | Tick  (Tickish Id) (Expr b)
   | Type  Type
   | Coercion Coercion
   deriving Data

 unpackArgs :: Expr b -> Arity -> [Expr b] -> [Expr b]
 unpackArgs _ _ l = l -- do something smarter here

 packArgs :: Expr b -> [Expr b] -> [Expr b]
 packArgs _ l = l  -- do something smarter here

 popArg :: Expr b -> Maybe (Expr b, Expr b)
 popArg (Apps e a xs) = case unpackArgs e a xs of
     [x] -> Just (e, x)
     xs  -> Just (Apps e (a-1) (packArgs e (init xs)), last xs)
 popArg _ = Nothing

 pattern App e1 e2 <- (popArg -> Just (e1, e2))
   where App e1 e2 | (f, args) <- collectArgs e1
                   = Apps f (length args +1) (packArgs f (args ++ [e2]))
 }}}

 I guess the solution to that is adding a type class, such as
 {{{
 class CompressArgs b where
     unpackArgs :: Expr b -> Arity -> [Expr b] -> [Expr b]
     unpackArgs _ _ l = l

     packArgs :: Expr b -> [Expr b] -> [Expr b]
     packArgs _ l = l
 }}}
 or alternatively
 {{{
 class HasType a where
     hasType :: Expr a -> Type
 }}}
 and adding instances for the two type of binders we have (`Var` and
 `TaggedBndr t`). With a few constraints added in various places (four
 modules only), this also compiles. It seems the remaining bit is to solve
 the staging issue: The instance should live in `CoreSyn`, but requires
 `expType` in `CoreUtils`. If that can be solved, the feature could be
 added quite painlessly. Then, in all places where the expression is
 traversed, one can any time make the decision to work on `Apps` directly,
 instead of `App`.

 I expect we’d get the memory performance boost we want (exciting!), but I
 also expect that this encoding/decoding in every traversal will cost
 runtime.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12626#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler