Records in Haskell

Simon Peyton-Jones simonpj at
Sat Jan 14 00:52:56 CET 2012

Thanks to Greg for leading the records debate.  I apologise that I
don't have enough bandwidth to make more than an occasional
contribution.  Greg's new wiki page, and the discussion so far has
clarified my thinking, and this message tries to express that new
clarity.  I put a conclusion at the end.


It has become clear that there are two elements to pretty much all the
proposals we have on the table.  Suppose we have two types, 'S' and 'T',
both with a field 'f', and you want to select field 'f' from a record 'r'.
Somehow you have to disambiguate which 'f' you mean.  

(Plan A) Disambiguate using qualified names.  To select field f, say
        (S.f r) or (T.f r) respectively.

(Plan B) Disambiguate using types. This approach usually implies dot-notation.
         If  (r::S), then (r.f) uses the 'f' from 'S', and similarly if (r::T).

Note that

* The Frege-derived records proposal (FDR), uses both (A) and (B)

* The Simple Overloaded Record Fields (SORF) proposal uses only (B)

* The Type Directed Name Resolution proposal (TDNR) uses only (B)

I know of no proposal that advocates only (A).  It seems that we are agreed
that we must make use of types to disambigute common cases.

Complexities of (Plan B)
Proposal (Plan B) sounds innocent enough.  But I promise you, it isn't.
There has ben some mention of the "left-to-right" bias of Frege type
inference engine; indeed the wohle explanation of which programs are
accepted and which are rejected, inherently involves an understanding
of the type inference algorithm.  This is a Very Bad Thing when the
type inference algorithm gets complicated, and GHC's is certainly

Here's an example:
   type family F a b
   data instance F Int [a] = Mk { f :: Int }

   g :: F Int b  -> ()
   h :: F a [Bool] -> ()

   k x = (g x, x.f, h x)

Consider type inference on k.  Initially we know nothing about the
type of x.  
 * From the application (g x) we learn that x's type has
   shape (F Int <something>).  
 * From the application (h x) we learn that x's type has
   shape (F <something else> [Bool])
 * Hence x's type must be (F Int [Bool])
 * And hence, using the data family we can see which field
   f is intended.

Notice that
 a) Neither left to right nor right to left would suffice
 b) There is significant interaction with type/data families
    (and I can give you more examples with classes and GADTs)
 c) In passing we note that it is totally unclear how (Plan A) 
    would deal with data families

This looks like a swamp.  In a simple Hindley-Milner typed language
you might get away with some informal heuristics, but Haskell is far
too complicated.  

Fortunately we know exactly what to do; it is described in some detail
in our paper "Modular type inference with local assumptions"

The trick is to *defer* all these decisions by generating *type constraints*
and solving them later.  We express it like this:

   G, r:t1  |-  r.f : t2,  (Has t1 "f" t2)

This says that if r is in scope with type t1, then (r.f) has type t2,
plus the constraint (Has t1 "f" t2), which we read as saying

   Type t1 must have a field "f" of type t2

We gather up all the constraints and solve them.  In solving them 
we may figure out t1 from some *other* constraint (to the left or
right, it's immaterial. That allow us to solve *this* constraint.

So it's all quite simple, uniform, and beautiful.  It'll fit right
into GHC's type-constraint solver.

But note what has happened: we have simply re-invented SORF.  So the
conclusion is this: 

   the only sensible way to implement FDR is using SORF.

What about overloading?
A feature of SORF is that you can write functions like this

   k :: Has r "f" Int => r -> Int
   k r = r.f + 1

Function 'k' works on any record that has a field 'f'.  This may be
cool, but it wasn't part of our original goal.  And indeed neither FDR
nor TDNR offer it.

But, the Has constraints MUST exist, in full glory, in the constraint
solver.  The only question is whether you can *abstract* over them.
Imagine having a Num class that you could not abstract over. So you 
could write

   k1 x = x + x :: Float
   k2 x = x + x :: Integer
   k3 x = x + x :: Int

using the same '+' every time, which generates a Num constraint. The
type signature fixes the type to Float, Integer, Int respectively, and
tells you which '+' to use.  And that is exactly what ML does!

But Haskell doesn't.  The Coolest Thing about Haskell is that you get
to *abstract* over those Num constraints, so you can write

  k :: Num a => a -> a
  k x = x + x

and now it works over *any* Num type.

On reflection, it would be absurd not to do ths same thing for Has
constraints.  If we are forced to have Has constraints internally, it
woudl be criminal not to abstract over them.  And that is precisely
what SORF is.

Is (Plan A) worth it?

Once you have (Plan B), and SORF in full glory, plus of course the
existing ability to name fields T_f, S_f, if you want, I think it is
highly questionable whether we need the additional complexities of
(Plan A)?  

And I do think (Plan A) has lots of additional complexities that we
have not begun to explore yet.  The data-family thing above is an
example, and I can think of some others.

But even if it was simple, we still have to ask: does *any* additional
complexity give enough payoff, if you already have SORF?  I suspect

Extensions to SORF
Frege lets you add "virtual fields" to a record type, using an extra
RExtension mechanism that I do not yet understand.  But SORF lets you
do so with no additional mechanism.  See "Virtual record selectors"
The point is that the existing type-class instance mechanisms do just
what we want.

The debate on the mailing list has moved sharply towards discussing
lexical syntax.  I'm not going to engage in that discussion because
while it is useful to air opinions, it's very hard to get agreement.
But for the record:

  * I don't mind having Unicode alternatives, but there must be
    ASCII syntax too

  * I think we must use ordinary dot for field selection.

  * I think it's fine to insist on no spaces; we are already 
    doing this for qualified names, as someone pointed out

  * I think we should not introduce new syntax unless we are
    absolutely forced into it.  Haskell's record update syntax
    isn't great, but we have it.

I am driven to the conclusion that SORF is the way to go.
 - Every other proposal on the table requires SORF (either
   visibly or invisibly)
 - When you have SORF, you don't really need anything else

The blocking issues are described on

  a) "Representation hiding" (look for that heading)

  b) "Record update" (ditto), most especially for records whose
     fields have polymorphic types

If we fix these we can move forward.

More information about the Glasgow-haskell-users mailing list