newtype/datatype (was efficiency)
Ross Paterson
ross@soi.city.ac.uk
Thu, 17 Jan 2002 12:16:14 +0000
On Wed, Jan 16, 2002 at 11:25:43PM -0500, Ken Shan wrote:
> Every type "t" in Haskell can be denotationally modeled by a complete
> partial order (CPO), which we will notate as "[t]". In fact, every
> type in standard Haskell corresponds to a -pointed- CPO, meaning a CPO
> with a least element "bottom".
We can avoid smash products and coalesced sums by analysing [t] as being
the lifting of a not-necessarily-pointed cpo T[[t]]:
[t] = lift (T[[t]])
We can define T[[t]] by induction over t, using the following operations
on cpos (not necessarily having a bottom element):
lift D = D plus a new bottom element
D x E = cartesian product of D and E
D + E = disjoint union of D and E
D -> E = the cpo of continuous functions from D to E
1 = the one-element cpo
0 = the empty cpo
For primitive types like Int, Integer, Double, etc, T[[t]] is just a
discrete cpo (i.e. a set, with no bottom element).
For Haskell's function type, we have
T[[s -> t]] = [s] -> [t]
For a data declaration
data D = c1 | ... | cn
we have
T[[D]] = C[[c1]] + ... + C[[cn]]
C[[K a1 ... an]] = A[[a1]] x ... x A[[an]]
C[[K]] = 1 -- the identity for cartesian product
A[[t]] = [t]
A[[! t]] = T[[t]]
For a newtype declaration
newtype N = K t
we have
T[[N]] = T[[t]]
To take the original questions,
newtype Empty a = E ()
data Empty' a = E'
newtype Pair v w a = P (v a, w a)
data Pair' v w a = P' (v a) (w a)
we have
T[[Empty]] = T[[()]] = 1
T[[Empty']] = C[[E']] = 1
T[[Pair v w a]] = T[[(v a, w a)]] = [v a] x [w a]
T[[Pair' v w a]] = C[[P' (v a) (w a)]] = [v a] x [w a]
So semantically they're the same. I suspect the reason Chris used
newtype is that GHC knows about the predefined tuple types and does
some extra optimizations on them, but doing this for user-defined type
constructors across module boundaries is a bit harder.
Similarly, given
newtype M = MC t
data N = NC !t
we have T[[M]] = T[[t]] = T[[N]], though as you pointed out, pattern
matching is defined differently for the two types. (The formal
semantics in the Report seems to be missing the relevant rule, though
its informal semantics is clear on this point.)
As for recursive types, I don't know a simple presentation of the general
case, but if there's no recursion through the first argument of a ->
type it's just the smallest cpo such that the two sides are isomorphic.
For example, for
data Nat = Zero | Succ Nat
we have
T[[Nat]] ~= 1 + lift (T[[Nat]])
or
[Nat] ~= lift (1 + [Nat])
The elements of [Nat] are _|_, Zero, Succ _|_, Succ Zero, etc and, to
make it complete, an extra element Succ (Succ (Succ ...)) which is the
least upper bound of the chain
_|_ <= Succ _|_ <= Succ (Succ _|_) <= ...
On the other hand, for
data Nat = Zero | Succ !Nat
we have
T[[Nat]] ~= 1 + T[[Nat]]
i.e. the elements of T[[Nat] are equivalent to the natural numbers, and
[Nat] has an extra bottom element.
Another example (from Patrik Jansson):
newtype Void = Void Void
then T[[Void]] is the least cpo such that
T[[Void]] = T[[Void]]
so it is the empty cpo 0, and [Void] is a one-element cpo.
Another definition of the same cpo is
data Void = Void !Void
and many more are possible, even without GHC's extension.