[Haskell-cafe] A backwards-compatible record proposal
Brian Hulley
brianh at metamilk.com
Sat Aug 19 04:21:34 EDT 2006
Hi -
As I've been writing a Haskell program over the past few months the main
problem I encounter is that record field names are not local to the record
type, and any systematic way of making them local (eg by prepending
"_Tycon_") results in names that are just too clunky, and I feel that
identifiers *must* use systematic naming conventions to get code that will
be easy to understand and maintain.
Although it is relatively easy to think up a better record system, it has
taken me up till now to discover how such a system could be integrated with
Haskell as it is at the moment, since it's unlikely that the existing record
system will ever disappear, at least not in the next few years, and in fact,
there are some good points about the existing record system that I wouldn't
like to lose.
I'll motivate the proposal then talk about how it could be implemented.
Motivation
=======
Consider the following:
data Vector3 a = Vector3{x :: a, y::a, z::a }
data Normal3 a = Normal3{x :: a, y::a, z::a }
We've got a problem because (x) has been introduced twice to the top level
namespace. In the above example, it could be argued that I should have
written:
data Arr3 a = Arr3 {x::a, y::a, z::a}
newtype Vector3 a = Vector3 (Arr3 a)
newtype Normal3 a = Normal3 (Arr3 a)
but for the sake of argument, let's suppose we can't do this - perhaps one
of the record types has some different fields as well.
A related problem is suppose I have:
data Size = {width :: Int, height :: Int}
data Rect = Rect{x1, y1, x2, y2::Int}
width :: Rect -> Int
width Rect{x1,_,x2,_} = x2 - x1
because a record field of Size has the same name as a top level function.
This second conflict can be avoided by always using the rule that record
fields begin with an underscore and all other variables don't.
So applying this to the first problem, we have:
data Vector3 a = Vector3{_x :: a, _y::a, _z::a }
data Normal3 a = Normal3{_x :: a, _y::a, _z::a }
Of course we haven't solved it yet!
But now suppose we introduce a new piece of syntactic sugar, and write:
data Vector3 a = Vector3{.x :: a, .y::a, .z::a }
data Normal3 a = Normal3{.x :: a, .y::a, .z::a }
ie putting a '.' before each field name. The intended meaning is that dotted
field names do *not* generate top level functions. Instead they allow the
compiler to generate instance decls as follows, where we've introduced a new
form of identifier, the dotted id, which behaves as a postfix operator which
binds more tightly than function application and can also be used as a class
name (by the compiler only):
class (.x) :: a b | a -> b where
(.x) :: a -> b
class (.y) :: a b | a -> b where
(.y) :: a -> b
class (.z) :: a b | a -> b where
(.z) :: a -> b
For each dotted id, there is a class defined as above, which is available
globally to the whole program as if a module containing an infinite set of
class decls as above was exported by the Prelude.
In the module containing the data decl for the record, the compiler inserts
the following:
instance (.x) (Vector3 a) a where
(.x) v = ... -- compiler generated code to access the field
Then within the rest of the program we can write:
magSquared :: Num a => Vector3 a -> a
magSquared v = v.x*v.x + v.y*v.y + v.z*v.z
-- explicit type when specific function is required
vec_x = ((.x) :: (Vector3 a -> a))
The advantage of this proposal is that it is completely backwards compatible
with records as they are at the moment, and we can choose which fields we
want to be dotted and which we want to just keep as normal top level
functions.
The only extra thing we need to do is put a dot before the field names we
want to access via the dotted syntax, but we in any case needed to use an
underscore when we wanted a systematic way to avoid conflicts between field
names and other top-level names so there is no extra effort involved.
Implementation
==========
The above could almost be implemented just by parsing a source file
containing uses of dotted fields and using a conversion like:
data Rec a = Rec { .f :: a}
let
rec = Rec {.f = 78} -- dot is used here too
p = rec.f * rec.f
==>
data Rec a = Rec a
instance Dot__f (Rec a) a where
__dot_f (Rec a) = a
let
rec = Rec 78
p = __dot_f rec * __dot_f rec
So far so good, but the alert reader :-) will have noticed that we now have
a *major* problem with abstraction because although we can write:
module M (Rec) where ...
code in another module can still say rec.f because in Haskell, all instance
decls in a module are always exported [1]. This problem may disappear in
Haskell' [2]. http://hackage.haskell.org/trac/haskell-prime/ticket/19
Therefore I think the desugaring would need to take place in the compiler so
the compiler could avoid exporting the compiler-generated instances when the
fields are not present in the module export list.
Regards, Brian.
[1] http://haskell.org/onlinereport/modules.html#import-instances
[2] http://hackage.haskell.org/trac/haskell-prime/ticket/19
--
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.
http://www.metamilk.com
More information about the Haskell-Cafe
mailing list