Implementation idea for unboxed polymorphic types

Francesco Mazzoli f at mazzo.li
Sun Nov 15 10:26:21 UTC 2015


(A nicely rendered version of this email can be found at <https://gist.github.com/bitonic/52cfe54a2dcdbee1b7f3>)

## Macro types

I very often find myself wanting unboxed polymorphic types
(e.g. types that contain `UNPACK`ed type variables). I find
it extremely frustrating that it's easier to write fast _and_
generic code in C++ than in Haskell.

I'd like to submit to the mailing list a very rough proposal
on how this could be achieved in a pretty straightforward way
in GHC.

The proposal is meant to be a proof of concept, just to show that
this could be done rather easily. I did not think about a nice
interface or the implementation details in GHC. My goal is to
check the feasibility of this plan with GHC developers.

I'll call such types "macro types", since their effect is similar
to defining a macro that defines a new type for each type
variable instantiation.

Consider

```
data #Point a = Point
  { x :: {-# UNPACK #-} !a
  , y :: {-# UNPACK #-} !a
  }
```

This definition defines the macro type `#Point`, with one parameter
`a`.

Macro types definition would be allowed only for single-constructor
records. The intent is that if we mention `#Point Double`, it will
be equivalent to

```
data PointDouble = PointDouble
  { x :: {-# UNPACK #-} !Double
  , y :: {-# UNPACK #-} !Double
  }
```

To use `#Point` generically, the following type class would be
generated:

```
class PointFamily a where
  data #Point a :: * -- Family of types generated by @data #Point a at .
  #Point :: a -> a -> #Point a -- Constructor.
  #x :: #Point a -> a -- Projection @x at .
  #y :: #Point a -> a -- Projection @y at .
```

Thi type class lets us work with `#Point`s generically, for example

```
distance :: (PointFamily a, Fractional a) => #Point a -> #Point a -> a
distance p1 p2 =
  let dx = #x p1 - #x p2
      dy = #y p1 - #y p2
  in sqrt (dx*dx + dy*dy)
```

Internally, for every type appearing for `a`, e.g. `#Point Double`,
a new type equivalent to the `PointDouble` above would be generated
by GHC, with the corresponding instance

```
instance PointFamily Double where
  data #Point Double = PointDouble
  #x = x
  #y = x
```

If it's not possible to instantiate `#Point` with the provided type
(for example because the type is not `UNPACK`able, e.g.
`#Point (Maybe A)`), GHC would throw an error.

Note that we can compile `distance` in its polymorphic version
(as opposed to C++ templates, where template functions _must_ be
instantiated at every use). The polymorphic `distance` would
require a call to "virtual functions" `#x` and `#y`, as provided by
the `PointFamily` dictionary. But if we use
`INLINE` or `SPECIALIZE` pragmas the virtual calls to `#x` and `#y`
would disappear, making this as efficient as if we were to define
`distance` on the manually defined `PointDouble`. Compiler hints
would be put in place to always inline functions using macro types,
if possible.

Note that the inlining is only important so that the `PointFamily`
dictionary disappears, e.g. functions containing recursive
helpers are fine, such as

```
{-# INLINE leftmost #-}
leftmost :: forall a. (PointFamily a, Ord a) => [#Point a] -> #Point a
leftmost [] = error "leftmost: no points"
leftmost (p0 : ps0) = go p0 ps0
  where
    go :: #Point a -> [#Point a] -> Point# a
    go candidate (p : ps) =
      if #x p < #x candidate
        then go p ps
        else go candidate ps
```

It might be worth considering throwing a warning when a top-level
definition whose type contains a macro type cannot be inlined, since
the main performance benefit of using macro types would be lost.

We can define instances for these types as normal, for instance

```
instance (Show a, PointFamily a) => Show (#Point a) where
  {-# INLINE show #-}
  show pt = "Point{x = " ++ #x pt ++ ", y = " ++ #y pt ++ "}"
```

`deriving` support could also be added.

## Further ideas

### Hide or remove `PointFamily` from the user

In the examples above `PointFamily` is manipulated explicitely
(e.g. in the type signature for `distance`).
In most cases the right constraint could be generated
automatically by GHC, although I think direct access to the
type class would be beneficial (history shows that direct
access to these facilities is good, see upcoming explicit
type applications).

Maybe the type class associated to a macro type should not even
exist -- for example we could simply represent `#Point` as a type
family and treat construction and destruction as built-in syntax
(the `#` prefix).

### Pattern matching

Sugar could be added to pattern match, e.g.

```
foo :: Point# a -> ...
distance (Point# x1 y1) = ...
  or
dinstance Point#{..} = ... -- #x and #y are available
```

### No "record types" limitation

Instead of restricting ourselves to single-constructor records,
we could simply generate

```
data Point a = Point a
  { x :: !a
  , y :: !a
  }

class PointFamily a where
  data Point# a :: *
  destruct :: Point# a -> Point a
  construct :: Point a -> Point# a
```

However, I think it would be harder to guarantee the well-behavedness
of the inlined functions if we had this intermediate type. I also
don't think macro types would be very useful beyond polymorphic
unboxed types.


More information about the ghc-devs mailing list