Records in Haskell

Mon Jan 9 23:22:31 CET 2012

You mean this wiki page, right?:
http://hackage.haskell.org/trac/ghc/wiki/Records/NameSpacing

> That is, there are no fundamental
> objections to the implementation of this records implementation.

I think that might be overly optimistic... I think there's a risk that 
SPJ finds an irritating complication to type inference & the rest of us 
aren't type-system-savvy enough to continue trying to guess at that :) 
But I think you're referring to whether we object to ad-hoc overloading 
of record field names (neither parametric nor class-based polymorphism), 
if no difficulties crop up.  Some of the concerns on 
http://www.haskell.org/haskellwiki/TypeDirectedNameResolution apply -- 
I'm not sure to what extent, but address those concerns rather than 
require those people to repeat themselves again! :)

(If we dodge all those obstacles, well, a better record system is better!)

Regardless, I think the proposal needs more precision, so I tried for 
syntax.  And got this thousand word email just about syntax, in several 
sections of reasoning, sorry!

--so here are my conclusions/suggestions up front in case you prefer, 
most complicated first - look later for details in a different order, 
referenced by [number].

Given that type inference for expr.field decides between several 
different possible meanings of "field", which "field"s in scope count as 
possibilities?  I concluded "All identifiers 'field' in top-level scope 
(i.e. declared in this module or imported unqualified from another 
module), but no local let/lambda bindings." [1]

I have an unrelated record/module system tweak suggestion to prevent 
leaks of the field-ness of exports meant only as functions. [2]

".field", for any identifier "field", is a postfix operator that binds 
more tightly than function application. [1]

I don't care whether the expression and ".field" are permitted to be 
separated by whitespace or not. [4]

"(.field)" as a section: should it always require parentheses? (assuming 
there is a way to type-inference the unapplied dot operator usefully at 
all). [3]

The Type.{...} / expr.{...} variations look probably fine ("Syntax for 
updates" in the wiki). [5]

Allow "TyCon.field" for selecting a specific version of "field", but 
only if there isn't a module in scope as "TyCon" in which case it's 
interpreted as "Module.field". [7]

Allow "expr.TyCon.field" / "expr.Module.field" wherever "expr.field" is 
allowed. [8]

I bikesheddily suggest "field at expr" rather than "expr.field" (I'd be 
alright with either/both). [6]

===== Defining the basic dot syntax for record field selection =====

[1]
.x, for any identifier x, is a postfix operator that binds more tightly 
than function application.  This operator does not constrain the type of 
its operand or its result.  After regular type inference, the type 
system collects in-scope "x"s and does something to try to choose one.

(As a non-type-system-expert, I'm not addressing what "does something" 
means in this email, besides capitalizing TYPE INFERENCE where I suspect 
there might (or might not) be extra problems.  Anyway, I wish the 
algorithm do the right thing for "expr.x.y.z".)

But which in-scope "x"es does it collect?  Regular Haskell scope would 
completely break "let x = var.x" if we wished to allow "let x = var.x".

How about: all record fields 'x' declared in this module or imported 
(qualified[??] or unqualified) from another module.

[[[
Should qualified ones be allowed?
Pro: if you import Data.Map qualified, as is common (let's just pretend 
that Maps have members), then someMap.field works.

Pro: it's like allowing qualified imports for instance declaration 
class-member-definitions.

Con: it's not really like that. It makes qualified imports a weaker 
protection, as the class/instance exception can lead to no ambiguity, 
but this can lead to ambiguity.  The PVP would make a sad face.

Con: using unqualified import with (..) would easily bring the field 
names into scope.  Fictitiously, "import qualified Data.Map as Map; 
import Data.Map(Map(..))".

Observation: allowing qualified imports, but not following the 
class/instance system's style of including everything in the transitive 
closure of imported modules, still prevents you (Pro) from breaking 
intentional abstraction barriers, but (Con?) requires you to import the 
operators for types you receive but don't import.

Opinion: only unqualified imports should be part of the selection process.
]]]

[[[
Problem: Restricting the selection to only record fields further 
compromises an existing imperfect property of Haskell:
module Library (Type, constructor, deconstructor) where
data Type = Constructor { deconstructor :: Int }
-- let's pretend it's a bounds-limited int or such.
constructor int | int >= 3 && int < 17 = Constructor int

Currently, importers of the module can observe that 'deconstructor' is a 
record-field by importing Library(Type(..)) and getting 'deconstructor' 
(see [2]--can we change that.).  This makes it slightly harder for the 
library implementer to change that name to a non-record-field.  In the 
proposal, the users might also have gotten used to "expr.deconstructor", 
and there would be no way to replace that syntax.

Possible fix: also require all the type's data-constructors to be in 
scope.  I think that's too big a hammer though.  We could punt.  We 
could change the selection to "all top-level names 'x' declared in this 
module or imported unqualified from another module."

Opinion: "all top-level names 'x' declared in this module or imported 
unqualified from another module." is better and not worse than 
restricting it to record-fields (provided that it does not burden the 
type inferencer complexity).

Problem: Given that, it's annoying that you can't bind a 
record-field-ish-selector in a let/lambda-binding.
Possible solution: allow "let .deconstruct = \x -> x - 100", possibly 
with sugary variations.  Possibly require a type-signature.
Possibility: also allow ".deconstruct = " at top level.  (If the dot 
notation does funny things with TYPE INFERENCE, this might be a dubious 
idea.)
Possible solution: consider *both* local bindings *and* the top-level 
names that they would normally shadow.  (That sounds rather odd; it 
might work since most of the local bindings with those names will be 
non-functions and thus not eligible; is it worth it?)
Opinion: Just let it be annoying that you can't bind a 
record-field-ish-selector in a let/lambda-binding.  (You can't declare 
data or classes in let bindings either.  Any of these can be improved 
but I don't think we need to just to have a record system.)

[2]
We could make this leak fixable thus:

"module Library (Type, deconstructor)"
does not make "import Library(Type(..))" import "deconstructor"
but
"module Library (Type(deconstructor))" or "module Library (Type(..))"
do make "import Library(Type(..))" import "deconstructor" (and make 
"import Library(Type(deconstructor))" valid).

The difference could even extend to not making "deconstructor" usable in 
any record syntax (construction, pattern matching, and record update) 
unless it's imported from somewhere that it's exported syntactically 
under its type.

This might make existing code break.
Does anyone think this change might be a good idea?
]]]

Suggestion: select all identifiers 'x' in top-level scope (i.e. declared 
in this module or imported unqualified from another module).

===== Miscellaneous dot-syntax observations =====

[3]
Observation: The point-free thought "b . .a" would be exactly the same 
as the "b . a" we have now with record fields -- except that it behaves 
a bit differently regarding scope/TYPE INFERENCE.  It implies that (.a) 
is a section of the unary postfix record-field-selection operator 
(similar to (1 +) being an operator section).  "b . .a" also suggests 
not requiring parentheses for that section when parsing precedence would 
not require it... for example, parsing precedence would require it in 
"map (.a) list" if "map .a list" meant "(map.a) list".

[4]
Should "identifier .field" be disallowed because it's almost certainly a 
mistake?  But
"
         (some long expression here)
                  .field
"
probably isn't a mistake, so, shrugs.  I think it would be equally 
plausible to require the non-section version of dot to have no spaces on 
either side, or only require no-space on the right hand side of the dot.

[5]
The "Syntax for updates" from Frege in the wiki looks fine and 
syntactically unambiguous to me ("identifier.{" without spaces) - I'd 
want to think about it later but it seems unlikely to me to go terribly 
wrong.  When you say "If a::T then a.{x=} and a.{x=42} are valid", okay 
they're valid but as what? as "T.{x=} a" and "T.{x=42} a", or without 
the "a" argument? e.g. say
"If a::T then a.{x=} and a.{x=42} are equivalent to (T.{x=} a) and 
(T.{x=42} a)".  Does TYPE INFERENCE have any more trouble with those 
than with regular field selection (I don't know)?

[6]
If we want to bikeshed about what the operator should be ("." or other) :
As 
http://hackage.haskell.org/trac/haskell-prime/wiki/TypeDirectedNameResolution 
says about using dot, "It's standard practice, and that counts for a lot."
But if we want to bikeshed and look for an operator that orders things 
the same way as function application and composition...
record.field
vs
field at record
.
"@" is already reserved.  It could bind more tightly than function 
application and type-inference differently.  Because it's reserved, it 
doesn't need to require no-spaces-around-it.
Here's example from that TDNR page:
typical OO: "x.f(3).g(v,w).h"
how it would have to be in Haskell+TDNR-dot-syntax: "((x.f 3).g v w).h"
or with @ instead of .: "h@(g@(f at x 3) v w)"
Which is IMHO only minorly better... but then again, the Frege-ish 
record proposal might only be doing TDNR for records specifically.
(more syntax and semantic discussion at 
http://www.haskell.org/haskellwiki/TypeDirectedNameResolution ).

...I kind of like @, but think it's a bikeshed: I claim that, if we work 
out the semantics (big "if"!), that not enough of us are going to say 
"it's better to have no records than dot-syntax records" [ditto for 
every other syntax] that the best decision would be "no records!".  If 
this becomes a popular bikeshed, we might just try straw-polling and 
picking what's popular (rather than get distracted and waste a year -- a 
common discussion outcome!).  (Though, feelings might be stronger than 
most bikesheds, for such a core language + syntax change...hmm.)

===== Module-related stuff =====

It would be permitted to declare two records in the same scope with the 
same record field name.

GHC would not warn if you shadow record field names with 
lambda/let-bound variables (It currently does, quite reasonably, warn, 
iff you enabled -fwarn-name-shadowing && not -XNamedFieldPuns.)  This, 
along with all the other changes, would be contingent on a new flag like 
-XNamedFieldDots.

[7]
We would add a syntax "TyCon.name".

Thorough option for TyCon.name:
(1) If there is a module in scope as TyCon containing a type TyCon with 
a field "name", it would choose that field.
(2) Else if there is a module import in scope as TyCon containing an 
unqualified, unambiguous* identifier "name", it would choose that. 
*["unambiguous" meaning it doesn't have two different records with 
exported fields named "name".]
(3) Else if there is a type-constructor in scope TyCon with a 
named-field "name", it would choose that.
(4) Else it would be an error.

I think rule (1) can be deleted without changing anything.

People usually use module names with dots in them, and type-constructors 
cannot have dots in them.  If a single module decides to import one 
module "as" the exact name of a type imported from an entirely different 
module, and the two happen to have some of the same identifiers, perhaps 
it's okay for silliness to ensue.  In fact, given the unlikeliness of 
inconsistent overlaps like that, I suspect that:

Simple option:
(1) if there's a module in scope of that name, it means module scope 
even if that means the lookup fails
(2) if there's not a module, see if it can be a type name

is equally fine and better because it's simpler.  IMHO we shouldn't put 
these two rules in the other order because it has the ability to break 
existing code only for the benefit of something that hardly matters 
either way at all.

None of these changes can break existing code.  The only breaking change 
that "-XNamedFieldDots" would introduce is a different meaning of a dot 
followed without spaces by a lowercase letter.

[8]
I suggest we should allow expr.TyCon.field (and expr.Module.field I 
guess) for field-selection too.  It's irritating when a syntax can't be 
qualified without rearranging things (to e.g. "(TyCon.field expr)" or 
e.g. "(expr::TyCon Int).field" [for single-parameter TyCons like Maybe]).

There is no specific interaction with type-classes because Haskell 
type-classes do not behave like Frege type-classes (as best I can tell 
from this discussion).