Records in Haskell
Isaac Dupree
ml at isaac.cedarswampstudios.org
Mon Jan 9 23:22:31 CET 2012
You mean this wiki page, right?:
http://hackage.haskell.org/trac/ghc/wiki/Records/NameSpacing
> That is, there are no fundamental
> objections to the implementation of this records implementation.
I think that might be overly optimistic... I think there's a risk that
SPJ finds an irritating complication to type inference & the rest of us
aren't type-system-savvy enough to continue trying to guess at that :)
But I think you're referring to whether we object to ad-hoc overloading
of record field names (neither parametric nor class-based polymorphism),
if no difficulties crop up. Some of the concerns on
http://www.haskell.org/haskellwiki/TypeDirectedNameResolution apply --
I'm not sure to what extent, but address those concerns rather than
require those people to repeat themselves again! :)
(If we dodge all those obstacles, well, a better record system is better!)
Regardless, I think the proposal needs more precision, so I tried for
syntax. And got this thousand word email just about syntax, in several
sections of reasoning, sorry!
--so here are my conclusions/suggestions up front in case you prefer,
most complicated first - look later for details in a different order,
referenced by [number].
Given that type inference for expr.field decides between several
different possible meanings of "field", which "field"s in scope count as
possibilities? I concluded "All identifiers 'field' in top-level scope
(i.e. declared in this module or imported unqualified from another
module), but no local let/lambda bindings." [1]
I have an unrelated record/module system tweak suggestion to prevent
leaks of the field-ness of exports meant only as functions. [2]
".field", for any identifier "field", is a postfix operator that binds
more tightly than function application. [1]
I don't care whether the expression and ".field" are permitted to be
separated by whitespace or not. [4]
"(.field)" as a section: should it always require parentheses? (assuming
there is a way to type-inference the unapplied dot operator usefully at
all). [3]
The Type.{...} / expr.{...} variations look probably fine ("Syntax for
updates" in the wiki). [5]
Allow "TyCon.field" for selecting a specific version of "field", but
only if there isn't a module in scope as "TyCon" in which case it's
interpreted as "Module.field". [7]
Allow "expr.TyCon.field" / "expr.Module.field" wherever "expr.field" is
allowed. [8]
I bikesheddily suggest "field at expr" rather than "expr.field" (I'd be
alright with either/both). [6]
===== Defining the basic dot syntax for record field selection =====
[1]
.x, for any identifier x, is a postfix operator that binds more tightly
than function application. This operator does not constrain the type of
its operand or its result. After regular type inference, the type
system collects in-scope "x"s and does something to try to choose one.
(As a non-type-system-expert, I'm not addressing what "does something"
means in this email, besides capitalizing TYPE INFERENCE where I suspect
there might (or might not) be extra problems. Anyway, I wish the
algorithm do the right thing for "expr.x.y.z".)
But which in-scope "x"es does it collect? Regular Haskell scope would
completely break "let x = var.x" if we wished to allow "let x = var.x".
How about: all record fields 'x' declared in this module or imported
(qualified[??] or unqualified) from another module.
[[[
Should qualified ones be allowed?
Pro: if you import Data.Map qualified, as is common (let's just pretend
that Maps have members), then someMap.field works.
Pro: it's like allowing qualified imports for instance declaration
class-member-definitions.
Con: it's not really like that. It makes qualified imports a weaker
protection, as the class/instance exception can lead to no ambiguity,
but this can lead to ambiguity. The PVP would make a sad face.
Con: using unqualified import with (..) would easily bring the field
names into scope. Fictitiously, "import qualified Data.Map as Map;
import Data.Map(Map(..))".
Observation: allowing qualified imports, but not following the
class/instance system's style of including everything in the transitive
closure of imported modules, still prevents you (Pro) from breaking
intentional abstraction barriers, but (Con?) requires you to import the
operators for types you receive but don't import.
Opinion: only unqualified imports should be part of the selection process.
]]]
[[[
Problem: Restricting the selection to only record fields further
compromises an existing imperfect property of Haskell:
module Library (Type, constructor, deconstructor) where
data Type = Constructor { deconstructor :: Int }
-- let's pretend it's a bounds-limited int or such.
constructor int | int >= 3 && int < 17 = Constructor int
Currently, importers of the module can observe that 'deconstructor' is a
record-field by importing Library(Type(..)) and getting 'deconstructor'
(see [2]--can we change that.). This makes it slightly harder for the
library implementer to change that name to a non-record-field. In the
proposal, the users might also have gotten used to "expr.deconstructor",
and there would be no way to replace that syntax.
Possible fix: also require all the type's data-constructors to be in
scope. I think that's too big a hammer though. We could punt. We
could change the selection to "all top-level names 'x' declared in this
module or imported unqualified from another module."
Opinion: "all top-level names 'x' declared in this module or imported
unqualified from another module." is better and not worse than
restricting it to record-fields (provided that it does not burden the
type inferencer complexity).
Problem: Given that, it's annoying that you can't bind a
record-field-ish-selector in a let/lambda-binding.
Possible solution: allow "let .deconstruct = \x -> x - 100", possibly
with sugary variations. Possibly require a type-signature.
Possibility: also allow ".deconstruct = " at top level. (If the dot
notation does funny things with TYPE INFERENCE, this might be a dubious
idea.)
Possible solution: consider *both* local bindings *and* the top-level
names that they would normally shadow. (That sounds rather odd; it
might work since most of the local bindings with those names will be
non-functions and thus not eligible; is it worth it?)
Opinion: Just let it be annoying that you can't bind a
record-field-ish-selector in a let/lambda-binding. (You can't declare
data or classes in let bindings either. Any of these can be improved
but I don't think we need to just to have a record system.)
[2]
We could make this leak fixable thus:
"module Library (Type, deconstructor)"
does not make "import Library(Type(..))" import "deconstructor"
but
"module Library (Type(deconstructor))" or "module Library (Type(..))"
do make "import Library(Type(..))" import "deconstructor" (and make
"import Library(Type(deconstructor))" valid).
The difference could even extend to not making "deconstructor" usable in
any record syntax (construction, pattern matching, and record update)
unless it's imported from somewhere that it's exported syntactically
under its type.
This might make existing code break.
Does anyone think this change might be a good idea?
]]]
Suggestion: select all identifiers 'x' in top-level scope (i.e. declared
in this module or imported unqualified from another module).
===== Miscellaneous dot-syntax observations =====
[3]
Observation: The point-free thought "b . .a" would be exactly the same
as the "b . a" we have now with record fields -- except that it behaves
a bit differently regarding scope/TYPE INFERENCE. It implies that (.a)
is a section of the unary postfix record-field-selection operator
(similar to (1 +) being an operator section). "b . .a" also suggests
not requiring parentheses for that section when parsing precedence would
not require it... for example, parsing precedence would require it in
"map (.a) list" if "map .a list" meant "(map.a) list".
[4]
Should "identifier .field" be disallowed because it's almost certainly a
mistake? But
"
(some long expression here)
.field
"
probably isn't a mistake, so, shrugs. I think it would be equally
plausible to require the non-section version of dot to have no spaces on
either side, or only require no-space on the right hand side of the dot.
[5]
The "Syntax for updates" from Frege in the wiki looks fine and
syntactically unambiguous to me ("identifier.{" without spaces) - I'd
want to think about it later but it seems unlikely to me to go terribly
wrong. When you say "If a::T then a.{x=} and a.{x=42} are valid", okay
they're valid but as what? as "T.{x=} a" and "T.{x=42} a", or without
the "a" argument? e.g. say
"If a::T then a.{x=} and a.{x=42} are equivalent to (T.{x=} a) and
(T.{x=42} a)". Does TYPE INFERENCE have any more trouble with those
than with regular field selection (I don't know)?
[6]
If we want to bikeshed about what the operator should be ("." or other) :
As
http://hackage.haskell.org/trac/haskell-prime/wiki/TypeDirectedNameResolution
says about using dot, "It's standard practice, and that counts for a lot."
But if we want to bikeshed and look for an operator that orders things
the same way as function application and composition...
record.field
vs
field at record
.
"@" is already reserved. It could bind more tightly than function
application and type-inference differently. Because it's reserved, it
doesn't need to require no-spaces-around-it.
Here's example from that TDNR page:
typical OO: "x.f(3).g(v,w).h"
how it would have to be in Haskell+TDNR-dot-syntax: "((x.f 3).g v w).h"
or with @ instead of .: "h@(g@(f at x 3) v w)"
Which is IMHO only minorly better... but then again, the Frege-ish
record proposal might only be doing TDNR for records specifically.
(more syntax and semantic discussion at
http://www.haskell.org/haskellwiki/TypeDirectedNameResolution ).
...I kind of like @, but think it's a bikeshed: I claim that, if we work
out the semantics (big "if"!), that not enough of us are going to say
"it's better to have no records than dot-syntax records" [ditto for
every other syntax] that the best decision would be "no records!". If
this becomes a popular bikeshed, we might just try straw-polling and
picking what's popular (rather than get distracted and waste a year -- a
common discussion outcome!). (Though, feelings might be stronger than
most bikesheds, for such a core language + syntax change...hmm.)
===== Module-related stuff =====
It would be permitted to declare two records in the same scope with the
same record field name.
GHC would not warn if you shadow record field names with
lambda/let-bound variables (It currently does, quite reasonably, warn,
iff you enabled -fwarn-name-shadowing && not -XNamedFieldPuns.) This,
along with all the other changes, would be contingent on a new flag like
-XNamedFieldDots.
[7]
We would add a syntax "TyCon.name".
Thorough option for TyCon.name:
(1) If there is a module in scope as TyCon containing a type TyCon with
a field "name", it would choose that field.
(2) Else if there is a module import in scope as TyCon containing an
unqualified, unambiguous* identifier "name", it would choose that.
*["unambiguous" meaning it doesn't have two different records with
exported fields named "name".]
(3) Else if there is a type-constructor in scope TyCon with a
named-field "name", it would choose that.
(4) Else it would be an error.
I think rule (1) can be deleted without changing anything.
People usually use module names with dots in them, and type-constructors
cannot have dots in them. If a single module decides to import one
module "as" the exact name of a type imported from an entirely different
module, and the two happen to have some of the same identifiers, perhaps
it's okay for silliness to ensue. In fact, given the unlikeliness of
inconsistent overlaps like that, I suspect that:
Simple option:
(1) if there's a module in scope of that name, it means module scope
even if that means the lookup fails
(2) if there's not a module, see if it can be a type name
is equally fine and better because it's simpler. IMHO we shouldn't put
these two rules in the other order because it has the ability to break
existing code only for the benefit of something that hardly matters
either way at all.
None of these changes can break existing code. The only breaking change
that "-XNamedFieldDots" would introduce is a different meaning of a dot
followed without spaces by a lowercase letter.
[8]
I suggest we should allow expr.TyCon.field (and expr.Module.field I
guess) for field-selection too. It's irritating when a syntax can't be
qualified without rearranging things (to e.g. "(TyCon.field expr)" or
e.g. "(expr::TyCon Int).field" [for single-parameter TyCons like Maybe]).
There is no specific interaction with type-classes because Haskell
type-classes do not behave like Frege type-classes (as best I can tell
from this discussion).
More information about the Glasgow-haskell-users
mailing list