[GHC] #12626: Remove redundant type applications in Core
GHC
ghc-devs at haskell.org
Mon Sep 26 13:19:52 UTC 2016
#12626: Remove redundant type applications in Core
-------------------------------------+-------------------------------------
Reporter: nomeata | Owner:
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by nomeata):
Replying to [comment:5 nomeata]:
> What is special about a data constructor that allows this, that a
function (like, say, `flip`) does not have?
Let me try to answer that myself:
Essentially, you are proposing a way of compressing the representation of
Core, by omitting (type) arguments that can easily and in a syntax-
directed manner be recovered. You are implying here the existence of a
pair of functions
{{{
compressArgs :: DataCon -> [Expr v] -> [Expr v]
decompressArgs :: DataCon -> [Expr v] -> [Expr v]
}}}
that remove and recover implied arguments. For example `compressArgs
"Left" [Bool,Int,True] = [Bool, True]`.
We obviously want to following identity to hold:
{{{
length args == dcArity dc ==> decompressArgs dc (compressArgs dc args) ==
args
}}}
What does `compressArgs` and `decompressArgs` need from `DataCon`? Two
bits of information:
* It’s type (`forall a b. a -> Either a b`) and
* its arity (3, counting type arguments).
Well, `compressArgs` does not need the arity, because it is just the
length of the input list, if the application is saturated. But
`decompressArgs` does (why? see example later). So really, we have a pair
of functions
{{{
compressArgs :: Type -> [Expr v] -> [Expr v]
decompressArgs :: Type -> Arity -> [Expr v] -> [Expr v]
}}}
Here, we want
{{{
decompressArgs ty (length args) (compressArgs ty args) == args
}}}
And with this I can see how the above proposal easily generalizes to
functions: Have
{{{
| Apps (Expr v) Arity [Expr v] -- NEW
}}}
Instead of {{{f `App` x `App` y `App` z}}} you can use `Apps f 3
(compressArgs (exprType f) [x,y,z])`. No information is lost (because of
the above identity), but any redundant information can be removed by
`compressArgs`.
Why do we need to store the arity? Because `compressArgs` can produce the
same compressed list for different input lists:
{{{
compressArgs "forall a. a -> a" [Bool,True] == [True]
compressArgs "forall a. a -> a" [Bool] == [Bool]
compressArgs "forall a. a -> a" [Type, Bool] == [Bool]
}}}
I imagine getting rid of all (well, many more) of the redundant type
applications throughout Core can be big win, so maybe this generalization
should be considered.
Oh, and fun fact: It might be possible to remove `App` completely and
replace it with a pattern synonym here: Due to `compressArgs ty [x] =
[x]`, `Apps f 1 [e]` is equivalent to `App f e`.
So this might not actually be a serious change to core, nor might it be
increasing the number of constructors: Using a bidirectional smart
constructor (which does optimizes the representation under the hood) this
can hopefully be a completely transparent optimization of the
representation.
Using a pattern synonynm works almost; the problem is that the code should
be generic in the binder, but then we have no way of knowing the binder’s
type. But for what it’s worth: It seems to compile with this change:
{{{
data Expr b
= Var Id
| Lit Literal
-- | App (Expr b) (Arg b)
| Apps (Expr b) Arity [Expr b]
| Lam b (Expr b)
| Let (Bind b) (Expr b)
| Case (Expr b) b Type [Alt b] -- See #case_invariant#
| Cast (Expr b) Coercion
| Tick (Tickish Id) (Expr b)
| Type Type
| Coercion Coercion
deriving Data
unpackArgs :: Expr b -> Arity -> [Expr b] -> [Expr b]
unpackArgs _ _ l = l -- do something smarter here
packArgs :: Expr b -> [Expr b] -> [Expr b]
packArgs _ l = l -- do something smarter here
popArg :: Expr b -> Maybe (Expr b, Expr b)
popArg (Apps e a xs) = case unpackArgs e a xs of
[x] -> Just (e, x)
xs -> Just (Apps e (a-1) (packArgs e (init xs)), last xs)
popArg _ = Nothing
pattern App e1 e2 <- (popArg -> Just (e1, e2))
where App e1 e2 | (f, args) <- collectArgs e1
= Apps f (length args +1) (packArgs f (args ++ [e2]))
}}}
I guess the solution to that is adding a type class, such as
{{{
class CompressArgs b where
unpackArgs :: Expr b -> Arity -> [Expr b] -> [Expr b]
unpackArgs _ _ l = l
packArgs :: Expr b -> [Expr b] -> [Expr b]
packArgs _ l = l
}}}
or alternatively
{{{
class HasType a where
hasType :: Expr a -> Type
}}}
and adding instances for the two type of binders we have (`Var` and
`TaggedBndr t`). With a few constraints added in various places (four
modules only), this also compiles. It seems the remaining bit is to solve
the staging issue: The instance should live in `CoreSyn`, but requires
`expType` in `CoreUtils`. If that can be solved, the feature could be
added quite painlessly. Then, in all places where the expression is
traversed, one can any time make the decision to work on `Apps` directly,
instead of `App`.
I expect we’d get the memory performance boost we want (exciting!), but I
also expect that this encoding/decoding in every traversal will cost
runtime.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12626#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list