Keep the present Haskell record system!

Mon Mar 6 15:05:48 EST 2006

On 06/03/06, Johannes Waldmann <waldmann at imn.htwk-leipzig.de> wrote:
> With respect to the discussion on records, let me throw in
> my usual warning: all of this seems overly obsessed
> with concrete representations of data types.

We already have mechanisms for abstraction. There's a gap in our
ability to form certain concrete representations we might want. This
paper simply describes how to add those representations to the
language in a nice way.

> The representation should not be exposed in the first place:
> you don't want to access it (=> make all fields private)
> you don't want to extend it (=> implementation inheritance is bad,
> interface inheritance is good.) (read e. g. Introduction to
> Design Patterns by Gamma et al.)
>
> You think it is a win to be able to write a function
> that takes "everything that has a foo :: Foo component"?
> I think it is not, since it is not robust design.
> It will only take records, and components have to be components.
> What if you later change the type's representation from a record to
> something else? If you change the component to a function?
> If you want a reliable notion of "everything that has a foo :: Foo",
> then you need to declare an interface (erm, one parameter type class).

Well, changing a data representation is always painful. This isn't a
new problem, and as you mentioned, you can still address it with
typeclasses.

> My point is that the OO community has learned all this stuff the hard
> way (from software problems arising from naive use of objects and
> inheritance), and it has taken them years, if not decades, and now it
> looks as if we are going to joyfully repeat this whole process.

Large product types normally indicate an awkward design, yes, but
they're still implicit in many real-world interfaces, and it can be
quite difficult to deal with them. This gives nice ways to break them
up and work with them where they naturally occur.

> An important selling point of the records proposal seems to be
> that you don't have to declare a type name for a record type.
> While I don't buy this whole idea  (we have a declarative programming
> language but we want to avoid (type) declarations?) I see a concrete
> problem: what if you want to make such a nameless type an instance
> of some type class? Then we get all sorts of overlappings.

Well, I don't know about that. You don't have to declare a type name
simply because all the types here already exist. You can still newtype
them. Also, not all record types are polymorphic: declaring instances
for completely specified rows would not be an issue. It's not clear to
me that instances for polymorphic records would be too much of an
issue either. Yes, it would be easy to get overlaps, but not much more
so than with existing polymorphic types. If there's more than one
row-polymorphic instance, then of course you get overlap, because you
can construct a record type with the union of the labels from the two
instances. However, are multiple row-polymorphic instances even
needed? Since a record could very well satisfy both predicates in any
such situation, if you needed multiple instances it would be better to
newtype as usual.

> So with respect to the original post (see the subject of this email)
> I tend to agree: leave records as they are. Of course they are
> problematic, but the main reason is not missing extensibility.

Well, the issue is just that Haskell does not actually have a record
system. It has algebraic types, and while those can emulate certain
aspects of records, they are not the same thing. The current "record
syntax" is just syntax sugar for labelling the fields of a product in
an algebraic type. It's nice syntax sugar, and I wouldn't want to get
rid of it. (Though it could perhaps do with a renaming :)
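
To make the "just sugar" point concrete, here is a small sketch in
present-day Haskell. The type P and its fields px and py are
hypothetical names picked for illustration. The same value can be
built and taken apart either with labels or by position:

```haskell
-- Hypothetical example type; the names P, px and py are illustrative only.
data P = P { px :: Int, py :: Bool }
  deriving (Eq, Show)

-- Built with field labels.
labelled :: P
labelled = P { px = 1, py = True }

-- Built positionally: the same constructor, the same type.
positional :: P
positional = P 1 True

-- A field label is an ordinary function, here px :: P -> Int.
viaSelector :: P -> Int
viaSelector = px

-- Positional pattern matching still works on a "record".
viaPattern :: P -> Int
viaPattern (P n _) = n
```

Both constructions give equal values, and both accessors return the
first component.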

> As I see it, the problem is that the named component notation was added
> late and still allows to access the earlier positional notation,
> and the component names are in the (module-global) namespace.

The problem is that people see "record syntax" and think that somehow
what they're declaring is any different from an ordinary product. The
syntax gives you a little more capacity for dealing with more fields,
and a little bit of future proofing, but not much more, and really
it's the same thing, as the ability to use the positional notation
indicates. Even with syntax sugar, using large product types in
current Haskell is poor design. I'll illustrate one of the main
reasons for this, and how extensible records can help fix that
problem:

Suppose that A, B, and C are types and that we have:
data T = T {x :: A, y :: B, z :: C}
which we're trying to use to simulate a record type.

Then any function
f :: T -> T
has the ability to read and depend on all the components of the T
which it is working with. There are many cases where this is
completely inappropriate, but restricting access to one or more of the
components is difficult. We'd need to define a new typeclass with
get/set functions, and use that instead. Doing this sort of thing for
every one of the fields of every product one uses is obviously not a
good solution.
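
As a rough sketch of what that workaround looks like in practice
(assuming, purely for illustration, that A, B and C are Int, String
and Bool, and using hypothetical class names HasX and HasY):

```haskell
-- Stand-ins for the abstract types A, B and C; these choices are assumptions.
type A = Int
type B = String
type C = Bool

data T = T { x :: A, y :: B, z :: C }
  deriving (Eq, Show)

-- Hypothetical accessor classes, one per field we want to expose.
class HasX t where
  getX :: t -> A
  setX :: A -> t -> t

class HasY t where
  getY :: t -> B
  setY :: B -> t -> t

instance HasX T where
  getX = x
  setX a t = t { x = a }

instance HasY T where
  getY = y
  setY b t = t { y = b }

-- f is polymorphic in t, so it can reach only what HasX and HasY offer.
-- It has no way to read or change z.
f :: (HasX t, HasY t) => t -> t
f t = setX (getX t + length (getY t)) t
```

That is two classes and two instances just to hide one field of one
type, which is the boilerplate being complained about.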

On the other hand, a function:
f :: {x :: A, y :: B | r} -> {x :: A, y :: B | r}
obviously can only depend on and act on the x and y components, and is
not allowed to touch z at all. Sure, you might perhaps say that
there's too much polymorphism there, but this usually isn't an issue,
and there are still newtypes to tag things and ensure that they don't
get into the wrong parts of the program. Record types would also be
permitted as members of algebraic data types.

More flexible systems than just using products as records are possible
using typeclasses with label types like HList, but these generally
involve quite a lot of typeclass hackery which, while it's nice to see
that it can be done, at some point begins to feel like an abuse of the
system, when one could do a better treatment at the compiler level.
Such systems still wouldn't have properties as nice as the record
system in the paper, since there is no provision for associative or
commutative data/type constructors. A related issue is that these
tend to be closer in performance to association lists, which means
that while extension is fast, record selection is linear time.
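
For a feel of the association-list flavour, here is a deliberately
minimal sketch. It is not HList itself, just a toy in the same spirit.
The names Nil, Cons, Has, get and extend are all made up for this
example, and it relies on GHC's overlapping instances:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Hypothetical label types; each label is its own type.
data X = X
data Y = Y

-- A record is a type-level association list of labelled values.
data Nil = Nil
data Cons l v r = Cons l v r

-- Has l v r: record r holds a value of type v under label l.
class Has l v r where
  get :: l -> r -> v

-- The label is at the head of the list.
instance {-# OVERLAPPING #-} Has l v (Cons l v r) where
  get _ (Cons _ v _) = v

-- Otherwise keep walking the tail, one cell per step.
instance Has l v r => Has l v (Cons l' v' r) where
  get l (Cons _ _ r) = get l r

-- Extension is a single Cons on the front.
extend :: l -> v -> r -> Cons l v r
extend = Cons

example :: Cons X Int (Cons Y Bool Nil)
example = extend X 1 (extend Y True Nil)
```

Here get Y example has to step past the X cell before it finds Y, so
selection walks the list, while extend is a single constructor
application. Without further machinery the result type of get usually
has to be fixed by the context or an annotation.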

> This would be more tolerable if we had ad-hoc overloading.
> Since we haven't, I'm now basically putting each data declaration
> in a separate module and import these qualified.
> (This simulates the "per-type" namespace for components.)

I think that ad-hoc overloading would be much more intolerable. In
some cases the design you describe (separating a data type into a
module) is appropriate, but I wouldn't hold myself to it. Usually I'd
only use that if I planned to hide the constructors. A lot of the
time, field labels can be renamed such that they don't overlap.
Inventing new names is not hard work. (You can just put part or all of
the type name in the labels, and you get basically the same effect as
the module system gives you.)
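
A small sketch of that naming convention, with hypothetical types
Person and Company that would otherwise both want a field called
"name":

```haskell
-- Hypothetical types; each label carries its type name as a prefix,
-- so both can live in one module without clashing.
data Person = Person { personName :: String, personAge :: Int }
data Company = Company { companyName :: String, companySize :: Int }

describe :: Person -> Company -> String
describe p c = personName p ++ " works at " ++ companyName c
```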

 - Cale