Records in Haskell

Sun Jan 8 15:22:31 CET 2012

2012/1/8 Greg Weber <greg at gregweber.info>:
>
>
> 2012/1/8 Gábor Lehel <illissius at gmail.com>
>>
>> >
>> >>
>> >>
>> >>
>> >> Later on you write that the names of record fields are only accessible
>> >> from the record's namespace and via record syntax, but not from the
>> >> global scope. For Haskell I think it would make sense to reverse this
>> >> decision. On the one hand, it would keep backwards compatibility; on
>> >> the other hand, Haskell code is already written to avoid name clashes
>> >> between record fields, so it wouldn't introduce new problems. Large
>> >> gain, little pain. You could use the global-namespace function as you
>> >> do now, at the risk of ambiguity, or you could use the new record
>> >> syntax and avoid it. (If you were to also allow x.n syntax for
>> >> arbitrary functions, this could lead to ambiguity again... you could
>> >> solve it by preferring a record field belonging to the inferred type
>> >> over a function if both are available, but (at least in my current
>> >> state of ignorance) I would prefer to just not allow x.n for anything
>> >> other than record fields.)
>> >
>> >
>> > Perhaps you can give some example code for what you have in mind - we do
>> > need to figure out the preferred technique for interacting with
>> > old-style
>> > records. Keep in mind that for new records the entire point is that they
>> > must be name-spaced. A module could certainly export top-level functions
>> > equivalent to how records work now (we could have a helper that
>> > generates
>> > those functions).
>>
>> Let's say you have a record.
>>
>> data Record = Record { field :: String }
>>
>> In existing Haskell, you refer to the accessor function as 'field' and
>> to the contents of the field as 'field r', where 'r' is a value of
>> type Record. With your proposal, you refer to the accessor function as
>> 'Record.field' and to the contents of the field as either
>> 'Record.field r' or 'r.field'. The point is that I see no conflict or
>> drawback in allowing all of these at the same time. Writing 'field' or
>> 'field r' would work exactly as it already does, and be ambiguous if
>> there is more than one record field with the same name in scope. In
>> practice, existing code is already written to avoid this ambiguity so
>> it would continue to work. Or you could write 'Record.field r' or
>> 'r.field', which would work as the proposal describes and remove the
>> ambiguity, and work even in the presence of multiple record fields
>> with the same name in scope.
>>
>> The point is that I see what you gain by allowing record fields to be
>> referred to in a namespaced way, but I don't see what you gain by not
>> allowing them to be referred to in a non-namespaced way. In theory you
>> wouldn't care because the non-namespaced way is inferior anyways, but
>> in practice because all existing Haskell code does it that way, it's
>> significant.
>
>
> My motivation for this entire change is simply to be able to use two records
> with field members of the same name. This requires *not* generating
> top-level functions to access record fields. I don't know if there is a
> valid use case for the old top-level functions once switched over to the new
> record system (other than your stated personal preference). We could
> certainly have a pragma or something similar that generates top-level
> functions even if the new record system is in use.

Oh, in a sense you're right. If the top-level accessor functions are
treated as if they were defined by the module containing the record,
and there is more than one with the same name, the compiler would see
it as multiple definitions and indeed report an error. On the other
hand if they are treated as imported names (conceptually, implicitly
imported from the namespace of the record, say), then the compiler
would only report an error when you actually try to use the ambiguous
name. I had been assuming the latter case without realizing it. It
corresponds to what you have now if you have multiple records imported
with overlapping field names.

Again, exporting the field accessors to global scope and deferring any
errors from ambiguity or overlap to the point of their use would not
in any way interfere with the use of those same field accessors with
the namespaced syntax. If you only use the namespaced syntax, it would
work exactly as in your proposal: the top-level accessors are never
used so no ambiguity errors are reported. If you only use the
top-level syntax, then it works almost exactly as Haskell currently
does (except you can define multiple records with overlapping field
names in the same module as long as you don't use them, which I had
not considered). The set of well-formed programs if you allow
top-level access would be almost a superset of the set of well-formed
programs if you don't. (The exception is that top-level field
accessors would conflict with non-accessor plain old functions of the
same name, whereas if they weren't visible outside of the record's
namespace they wouldn't, but I don't feel like that's a huge concern.)
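
As a rough sketch of that "overlapping field names in the same module"
case: GHC later gained a DuplicateRecordFields extension, which is not
the proposal discussed here but behaves similarly in this one respect.
The records Person and Company and the helpers personName and
companyName are made-up names for illustration. The definitions are
accepted, and the fields are reached through constructor-qualified
patterns rather than the bare top-level selector.

```haskell
{-# LANGUAGE DuplicateRecordFields #-}
module Main where

-- Two hypothetical records sharing the field name 'name'.
-- With DuplicateRecordFields, defining both in one module is accepted.
data Person  = Person  { name :: String }
data Company = Company { name :: String }

-- Naming the constructor in the pattern tells the compiler which
-- 'name' field is meant, so no ambiguity arises here.
personName :: Person -> String
personName (Person { name = n }) = n

companyName :: Company -> String
companyName (Company { name = n }) = n

main :: IO ()
main = do
  putStrLn (personName  (Person  { name = "Ada" }))
  putStrLn (companyName (Company { name = "Acme" }))
  -- The bare selector is the ambiguous use; uncommenting the next
  -- line is expected to be rejected by the compiler:
  -- putStrLn (name (Person { name = "Ada" }))
```

The error only shows up at the commented-out use site, not at the
definitions, which is the "deferred to the point of use" behaviour I
had been assuming.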

>
>>
>> >
>> >>
>> >> All of that said, maybe having TDNR with bad syntax is preferable to
>> >> not having TDNR at all. Can't it be extended to the existing syntax
>> >> (of function application)? Or at least some better one, which is ideally
>> >> right-to-left? I don't really know the technical details...
>> >>
>> >> Generalized data-namespaces: I think I'm also opposed. This would import
>> >> the problem from OO languages where functions written by the module
>> >> (class) author get to have a distinguished syntax (be inside the
>> >> namespace) over functions by anyone else (which don't).
>> >
>> >
>> > Maybe you can show some example code? To me this is about controlling
>> > exports of namespaces, which is already possible - I think this is
>> > mostly a
>> > matter of convenience.
>>
>> If I'm understanding correctly, you're suggesting we be able to write:
>>
>> data Data = Data Int where
>>    twice (Data d) = 2 * d
>>    thrice (Data d) = 3 * d
>>    ...
>>
>> and that if we write 'let x = Data 7 in x.thrice' it would evaluate to
>> 21. I have two objections.
>>
>> The first is the same as with the TDNR proposal: you would have both
>> code that looks like
>> 'data.firstFunction.secondFunction.thirdFunction', as well as the
>> existing 'thirdFunction $ secondFunction $ firstFunction data' and
>> 'thirdFunction . secondFunction . firstFunction $ data', and if you
>> have both of them in the same expression (which you will) it becomes
>> unpleasant to read because you have to read them in opposite
>> directions.
>
>
> This would not be possible because the functions can only be accessed from
> the namespace - you could only use the dot (or T.firstFunction). It is
> possible as per your complaint below:

Sorry, I was unclear here. The firstFunction, secondFunction, and
thirdFunction in my examples are *not* referring to the very same
firstFunction, secondFunction, and thirdFunction, they are all
placeholders for arbitrary functions.

My problem is that you could (and would have to, because the syntaxes
aren't interchangeable) write things like this:

foo . bar . (baz.quux.asdf) . wasd $ hjkl

Now what's the right order for reading the functions in this
expression? The correct answer is:

hjkl wasd baz quux asdf bar foo

or using numbers to denote their place:

7 6 3 4 5 2 1

If you had written the equivalent using existing Haskell syntax it would be:

foo . bar . (asdf $ quux baz) . wasd $ hjkl

and the right order for reading it is:

hjkl wasd baz quux asdf bar foo

or with numbers:

7 6 5 4 3 2 1

If you introduce heavy use of the a.b.c.d syntax you would frequently
have to jump around and switch directions while you read an
expression. If you restrict it to only field accessors I think it
would be limited and tolerable; my quarrel is with allowing arbitrary
functions (whether by TDNR or data-namespacing), in which case you
would likely as not end up with half of the functions going one way
and the other half going the other.
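
To make the reading-order point concrete, here is a small sketch in
today's Haskell. The proposed x.f syntax does not exist, so I am
assuming the reverse-application operator (&) from Data.Function as a
stand-in for it; the definitions given to foo, bar, baz, quux, asdf,
wasd and hjkl are arbitrary placeholders chosen only so that the
expressions typecheck.

```haskell
module Main where

import Data.Function ((&))

-- Arbitrary placeholder definitions (assumptions for illustration).
baz, hjkl :: Int
baz  = 2
hjkl = 10

quux, wasd, bar, foo :: Int -> Int
quux = (+ 1)
wasd = subtract 1
bar  = (+ 100)
foo  = negate

asdf :: Int -> Int -> Int
asdf = (*)

-- Existing syntax: everything is read right to left.
existingStyle :: Int
existingStyle = foo . bar . (asdf $ quux baz) . wasd $ hjkl

-- Mixed directions: the parenthesised part is read left to right
-- (baz, then quux, then asdf), while the rest is read right to left.
-- (&) stands in for the proposed dot syntax here.
mixedStyle :: Int
mixedStyle = foo . bar . (baz & quux & asdf) . wasd $ hjkl

main :: IO ()
main = do
  print existingStyle
  print mixedStyle
  print (existingStyle == mixedStyle)
```

Both expressions compute the same value; the only difference is that
the second one makes the reader change direction in the middle.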

>
>>
>>
>> The second is that only the author of the datatype could put functions
>> into its namespace; the 'data.foo' notation would only be available
>> for functions written by the datatype's author, while for every other
>> function you would have to use 'foo data'. I dislike this special
>> treatment in OO languages and I dislike it here.
>>
>

-- 
Work is punishment for failing to procrastinate effectively.