Records in Haskell

Sun Jan 8 13:12:46 CET 2012

2012/1/8 Greg Weber <greg at gregweber.info>:
>
>
> 2012/1/8 Gábor Lehel <illissius at gmail.com>
>>
>> Thank you. I have a few questions/comments.
>>
>>
>>
>> "The module/record ambiguity is dealt with in Frege by preferring
>> modules and requiring a module prefix for the record if there is
>> ambiguity."
>>
>> I think I see why they do it this way (otherwise you can't refer to a
>> module if a record by the same name is in scope), but on the other
>> hand it would seem intuitive to me to choose the more specific thing,
>> and a record feels more specific than a module. Maybe you could go
>> that way and just not give your qualified imports the same name as a
>> record? (Unqualified imports are in practice going to be hierarchical,
>> and no one's in the habit of typing those out to disambiguate things,
>> so I don't think it really matters if qualified records shadow them.)
>
>
> In the case where a Record has the same name as its containing module it
> would be more specific than a module, and preferring it makes sense. I think
> doing this inside the module makes sense, as one shouldn't need to refer to
> the containing module's name. We should think more about the case where
> module & records are imported.
>
>>
>>
>> "Expressions of the form x.n: first infer the type of x. If this is
>> just an unbound type variable (i.e. the type is unknown yet), then
>> check if n is an overloaded name (i.e. a class operation). [...] Under
>> no circumstances, however, will the notation x.n contribute in any way
>> in inferring the type of x, except for the case when n is a class
>> operation, where an appropriate class constraint is generated."
>>
>> Is this just a simple translation from x.n to n x? What's the
>> rationale for allowing the x.n syntax for class methods specifically,
>> in addition to record fields, but not for any other functions?
>
>
> It is a simple translation from x.n to T.n x.
> The key point being the function is only accessible through the record's
> namespace.
> The dot is only being used to tap into a namespace, and is not available for
> general function application.

I think my question and your answer are walking past each other here.
Let me rephrase. The wiki page implies that in addition to using the
dot to tap into a namespace, you can also use it for general function
application in the specific case where the function is a class method
("appropriate class constraint is generated" etc etc). I don't
understand why. Or am I misunderstanding?

>
>>
>>
>>
>> Later on you write that the names of record fields are only accessible
>> from the record's namespace and via record syntax, but not from the
>> global scope. For Haskell I think it would make sense to reverse this
>> decision. On the one hand, it would keep backwards compatibility; on
>> the other hand, Haskell code is already written to avoid name clashes
>> between record fields, so it wouldn't introduce new problems. Large
>> gain, little pain. You could use the global-namespace function as you
>> do now, at the risk of ambiguity, or you could use the new record
>> syntax and avoid it. (If you were to also allow x.n syntax for
>> arbitrary functions, this could lead to ambiguity again... you could
>> solve it by preferring a record field belonging to the inferred type
>> over a function if both are available, but (at least in my current
>> state of ignorance) I would prefer to just not allow x.n for anything
>> other than record fields.)
>
>
> Perhaps you can give some example code for what you have in mind - we do
> need to figure out the preferred technique for interacting with old-style
> records. Keep in mind that for new records the entire point is that they
> must be name-spaced. A module could certainly export top-level functions
> equivalent to how records work now (we could have a helper that generates
> those functions).

Let's say you have a record.

data Record = Record { field :: String }

In existing Haskell, you refer to the accessor function as 'field' and
to the contents of the field as 'field r', where 'r' is a value of
type Record. With your proposal, you refer to the accessor function as
'Record.field' and to the contents of the field as either
'Record.field r' or 'r.field'. The point is that I see no conflict or
drawback in allowing all of these at the same time. Writing 'field' or
'field r' would work exactly as it already does, and be ambiguous if
there is more than one record field with the same name in scope. In
practice, existing code is already written to avoid this ambiguity so
it would continue to work. Or you could write 'Record.field r' or
'r.field', which would work as the proposal describes and remove the
ambiguity, and work even in the presence of multiple record fields
with the same name in scope.
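For reference, here is the existing half of that comparison in plain Haskell 2010. The declaration generates an ordinary top-level accessor, which can be passed around as a value or applied to a record. The namespaced forms 'Record.field r' and 'r.field' are the proposal's syntax and are not shown; allFields and greeting are hypothetical helpers added only for illustration.

```haskell
-- Existing Haskell 2010 record: the declaration generates a
-- top-level accessor function  field :: Record -> String.
data Record = Record { field :: String }
  deriving Show

-- 'field' on its own is an ordinary function value, e.g. passed to map.
allFields :: [Record] -> [String]
allFields = map field

-- 'field r' reads the contents of the field of a particular value r.
greeting :: Record -> String
greeting r = "hello, " ++ field r
```

Loaded into GHCi, greeting (Record "world") gives "hello, world". Under the suggestion above, all of this would keep compiling unchanged alongside the new namespaced forms.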

The point is that I see what you gain by allowing record fields to be
referred to in a namespaced way, but I don't see what you gain by not
allowing them to be referred to in a non-namespaced way. In theory you
wouldn't care because the non-namespaced way is inferior anyways, but
in practice because all existing Haskell code does it that way, it's
significant.

>
>>
>>
>>
>> Later on:
>>
>> "- the function that updates field x of data type T is T.{x=}
>> - the function that sets field x in a T to 42 is T.{x=42}
>> - If a::T then a.{x=} and a.{x=42} are valid"
>>
>> I think this looks considerably ugly. Aren't there better
>> alternatives? { T.x = }, { T.x = 42 }, { a.x = }, { a.x = 42 } maybe?
>> (Does this conflict in some unfinesseable way with explicit layout
>> contexts?)
>
>
> I think this is one of those slightly different syntaxes that many people
> will have an initial bad reaction to, however once they use it they will
> like it just fine. The problem with what you are suggesting is that it would
> be verbose when updating multiple fields at once. But we should investigate
> if it is possible to have a syntax closer to the existing update syntax.

Good point.
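As a point of comparison, the quoted update forms correspond roughly to functions that can already be written with Haskell's record update syntax. This sketch uses a hypothetical type T with fields x and y. It takes the record first, on the assumption (by analogy with x.n = T.n x) that a.{x=} means T.{x=} a; that argument order is a guess, not something the wiki text states.

```haskell
-- Hypothetical record type, for illustration only.
data T = T { x :: Int, y :: Bool }
  deriving (Show, Eq)

-- Rough counterpart of T.{x=}: given a T and a new value, set x.
-- (Record-first order is an assumption; updX a then plays the role of a.{x=}.)
updX :: T -> Int -> T
updX r v = r { x = v }

-- Rough counterpart of T.{x=42}: a function T -> T that sets x to 42.
setX42 :: T -> T
setX42 r = r { x = 42 }

-- Rough counterpart of a.{x=42} for a particular value a.
example :: T
example = setX42 (T { x = 0, y = True })
```

Note that the existing syntax r { x = 1, y = False } already updates several fields at once, which is the verbosity concern raised in the reply.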

>
>>
>>
>> "the function that changes field x of a T by applying some function to
>> it is T.{x <-}"
>>
>> Same comment on syntax applies. I believe this is a new feature? It
>> would be welcome, albeit the overloading of <- is a bit worrisome
>> (don't have better ideas at the moment, but I think there was a
>> thread). I assume T.{x <- f}, a.{x <-}, and a.{x <- f} (whatever the
>> syntax is) would also be valid, by analogy to the above?
>
>
> Yes, new feature, so not necessary in the initial implementation. I
> personally think Haskell should drop the monadic curly brackets which nobody
> uses, but whatever syntax works is fine with me.
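For comparison, the modify form can be approximated in current Haskell by reading the field and then doing a record update. The names modX and incX are hypothetical, and the record-first argument order is an assumption made by analogy with x.n = T.n x.

```haskell
-- Hypothetical record type, for illustration only.
data T = T { x :: Int }
  deriving (Show, Eq)

-- Rough counterpart of T.{x <-}: given a T and a function,
-- apply the function to the current x and store the result.
modX :: T -> (Int -> Int) -> T
modX r f = r { x = f (x r) }

-- Rough counterpart of T.{x <- (+1)}, i.e. a.{x <- (+1)} once applied to a.
incX :: T -> T
incX r = modX r (+ 1)
```

In GHCi, incX (T 1) evaluates to T {x = 2}. The field name has to be written twice, once to read and once to write, which is the repetition a dedicated syntax would remove.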
>
>>
>>
>>
>>
>> Re: Compatibility with existing records: based on (very) cursory
>> inspection I don't see an obstacle to making it (near-)fully
>> compatible - you would just be adding some new syntax, most
>> significantly x.n. Backwards compatibility is a great advantage, so
>> why not?
>>
>>
>>
>> Generalizing the syntax to arbitrary TDNR: I think I'm opposed to
>> this. The problem is that in existing Haskell the vast majority of
>> expressions (with the notable (and imho unfortunate) exception of
>> (>>=)) flow from right to left. Going the other way with record fields
>> isn't a big problem because it's simple and doesn't even feel like
>> function application so much as member-selection (like modules), but
>> if you were to allow any function you would soon end up with lengthy
>> chains of them which would clash nastily with the surrounding code.
>> Having to jump back and forth and switch directions while reading is
>> unpleasant. OO languages have this problem and I don't envy them for
>> it. And in particular having "a . b" mean "first do b, then do a", but
>> "a.b" mean "do b to a" would be confusing. (You'd already have this
>> problem with global namespace record field selectors, but at least
>> it's localized.)
>
>
> I agree - I think a.b or A.b should always mean tapping into a namespace and
> not be generalized outside of that.
>
>>
>> All of that said, maybe having TDNR with bad syntax is preferable to
>> not having TDNR at all. Can't it be extended to the existing syntax
>> (of function application)? Or at least some better one, which is ideally
>> right-to-left? I don't really know the technical details...
>>
>> Generalized data-namespaces: Also think I'm opposed. This would import
>> the problem from OO languages where functions written by the module
>> (class) author get to have a distinguished syntax (be inside the
>> namespace) over functions by anyone else (which don't).
>
>
> Maybe you can show some example code? To me this is about controlling
> exports of namespaces, which is already possible - I think this is mostly a
> matter of convenience.

If I'm understanding correctly, you're suggesting we be able to write:

data Data = Data Int where
    twice (Data d) = 2 * d
    thrice (Data d) = 3 * d
    ...

and that if we write 'let x = Data 7 in x.thrice' it would evaluate to
21. I have two objections.
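For comparison, the same two functions in current Haskell are plain top-level definitions, and the example is written 'thrice x' rather than 'x.thrice'. This only restates the example above; it does not implement the proposed where-clause namespace.

```haskell
data Data = Data Int

-- Today these are ordinary top-level functions: nothing places them
-- "inside" Data, and any other module could define more of the same kind.
twice :: Data -> Int
twice (Data d) = 2 * d

thrice :: Data -> Int
thrice (Data d) = 3 * d
```

In GHCi, let x = Data 7 in thrice x evaluates to 21.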

The first is the same as with the TDNR proposal: you would have both
code that looks like
'data.firstFunction.secondFunction.thirdFunction', as well as the
existing 'thirdFunction $ secondFunction $ firstFunction data' and
'thirdFunction . secondFunction . firstFunction $ data', and if you
have both of them in the same expression (which you will) it becomes
unpleasant to read because you have to read them in opposite
directions.
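The reading-direction clash can be reproduced in today's Haskell with (&) from Data.Function, which was added to base (4.8) well after this thread. It gives a left-to-right pipeline comparable to the proposed dotted chain. The three function bodies are made up for illustration, and 'data' is renamed to v because data is a keyword.

```haskell
import Data.Function ((&))

-- Hypothetical stand-ins for the three functions named in the text.
firstFunction, secondFunction, thirdFunction :: Int -> Int
firstFunction  = (+ 1)
secondFunction = (* 2)
thirdFunction  = subtract 3

-- Right to left, the direction most Haskell code reads today.
viaDollar, viaCompose, viaAmp :: Int -> Int
viaDollar v = thirdFunction $ secondFunction $ firstFunction v
viaCompose  = thirdFunction . secondFunction . firstFunction

-- Left to right, roughly the order v.firstFunction.secondFunction.thirdFunction
-- would suggest.
viaAmp v = v & firstFunction & secondFunction & thirdFunction
```

All three compute the same result (for 5: (5 + 1) * 2 - 3 = 9); mixing the two directions in one expression is what makes it hard to read.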

The second is that only the author of the datatype could put functions
into its namespace; the 'data.foo' notation would only be available
for functions written by the datatype's author, while for every other
function you would have to use 'foo data'. I dislike this special
treatment in OO languages and I dislike it here.

>
>
>>
>>
>>
>>
>> Another thing that would be nice is lenses to solve the
>> nested-record-update problem - at least the room to add them later.
>> Most of the proposed syntax would be unaffected, but you'd need some
>> syntax for the lens itself... I'm not sure what it might be. Would it
>> be terrible to have T.x refer to a lens rather than a getter? (I don't
>> know how you'd refer to the getter then, so probably yeah.) Or maybe
>> { T.x }, building backwards from { T.x = }?
>>
>>
>>
>> Another existing language very similar to Haskell whose record system
>> might be worth evaluating is Disciple: http://disciple.ouroborus.net/.
>> Unfortunately I couldn't find any specific page it seemed best to link
>> to.
>
>
> The syntax of DDC seems the same as this proposal. However, I could not find
> any specific information either.

The main things I remember being interesting about it are that it's
based on lenses, and uses some kind of extensible projectors system to
allow something similar to what you achieve with datatype-namespaces,
namely 'virtual' record fields. But I haven't studied it in detail.
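To make the nested-record-update problem concrete, here is a minimal getter/setter-pair lens in plain Haskell. Lens, compose, over and the Point/Circle types are hypothetical names chosen for illustration; this is neither Disciple's design nor any library's API. Without such a helper, shifting the centre needs the nested update c { center = (center c) { px = px (center c) + n } }.

```haskell
-- A deliberately minimal lens: a getter paired with a setter.
data Lens s a = Lens { view :: s -> a, set :: a -> s -> s }

-- Compose an outer lens with an inner one to reach a nested field.
compose :: Lens s b -> Lens b a -> Lens s a
compose outer inner = Lens
  { view = view inner . view outer
  , set  = \v s -> set outer (set inner v (view outer s)) s
  }

-- Modify the focused field with a function.
over :: Lens s a -> (a -> a) -> s -> s
over l f s = set l (f (view l s)) s

-- Hypothetical nested records.
data Point  = Point  { px :: Int, py :: Int }          deriving (Show, Eq)
data Circle = Circle { center :: Point, radius :: Int } deriving (Show, Eq)

centerL :: Lens Circle Point
centerL = Lens center (\p c -> c { center = p })

pxL :: Lens Point Int
pxL = Lens px (\v p -> p { px = v })

-- Shift a circle's centre right by n in one step.
shiftRight :: Int -> Circle -> Circle
shiftRight n = over (compose centerL pxL) (+ n)
```

For example, shiftRight 3 (Circle (Point 1 2) 5) gives Circle (Point 4 2) 5. A record syntax that produced such lenses directly would cover the nested case.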

>
>>
>>
>> On Sun, Jan 8, 2012 at 2:40 AM, Greg Weber <greg at gregweber.info> wrote:
>> > I have updated the wiki - the entry level page [1] compares the
>> > different proposals and points to a more fleshed out explanation of
>> > the Frege proposal [2].
>> >
>> > I think I now understand the differences between the existing
>> > proposals and am able to provide leadership to move this forward. Let
>> > me summarize the state of things:
>> > There is a debate over extensible records that we are putting off
>> > into the future. Instead we have 2 proposals to make things better
>> > right now:
>> > * an overloaded record fields proposal that still has implementation
>> >   concerns
>> > * a name-spacing & simple type resolution proposal that is awaiting
>> >   your critique
>> >
>> > The Frege language originally had overloaded record fields but then
>> > moved to the latter system. The existing experience of the Frege
>> > language is very fortunate for us as we now have some experience to
>> > help inform our own decision.
>> >
>> > Greg Weber
>> >
>> > [1] http://hackage.haskell.org/trac/ghc/wiki/Records
>> > [2] http://hackage.haskell.org/trac/ghc/wiki/Records/NameSpacing
>> >
>> >
>> > On Wed, Jan 4, 2012 at 7:54 AM, Greg Weber <greg at gregweber.info> wrote:
>> >>
>> >> The Frege author does not have a ghc mail list account but gave a
>> >> more detailed explanation of how he goes about TDNR for records and
>> >> how often it type checks without annotation in practice.
>> >>
>> >> A more general explanation is here:
>> >>
>> >>
>> >> http://www.reddit.com/r/haskell/comments/nph9l/records_stalled_again_leadership_needed/c3di9sw
>> >>
>> >> He sent a specific response to Simon's mail list message, quoted below:
>> >>
>> >> Simon Peyton-Jones is absolutely correct when he notes:
>> >>
>> >> Well the most obvious issue is this. 3.2 says e.m = (T.m e) if the
>> >> expression e has type t and the type constructor of t is T and there
>> >> exists a function T.m. But that innocent-looking statement begs the
>> >> *entire* question! How do we know if "e has type t"?
>> >>
>> >> The way it is done in Frege is such that, if you have a function that
>> >> uses or updates (nondestructively, of course) a "record" then at
>> >> least the type constructor of that record has to be known. This is no
>> >> different than doing it explicitly with case constructs, etc., just
>> >> here you learn the types from the constructors you write in the
>> >> patterns.
>> >>
>> >> Hence, it is not so that one can write a function that updates field
>> >> f to 42 for any record that contains a field f:
>> >>
>> >> foo x = x.{f=42}    -- type annotation required for foo or x
>> >>
>> >> In practice this means you'll have to write a type annotation here
>> >> and there. Often, the field access is not the only one that happens
>> >> to some variable of record type, or the record is the result of
>> >> another function application. In such cases, the type is known.
>> >> I estimate that in 2/3 of all cases one does not need to write
>> >> (T.e x) in sparsely type annotated code, despite the fact that the
>> >> Frege type checker has a left to right bias and does not yet attempt
>> >> to find the type of x in the code that "follows" the x.e construct
>> >> (after let unrolling etc.)
>> >> I think one could do better and guarantee that, if the type of x is
>> >> inferrable at all, then so will be x.e (Still, it must be more than
>> >> just a type variable.)
>> >>
>> >> On Sun, Jan 1, 2012 at 2:39 PM, Greg Weber <greg at gregweber.info> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Sat, Dec 31, 2011 at 3:28 PM, Simon Peyton-Jones
>> >>> <simonpj at microsoft.com> wrote:
>> >>>>
>> >>>> Frege has a detailed explanation of the semantics of its record
>> >>>> implementation, and the language is *very* similar to Haskell. Let's
>> >>>> just start by using Frege's document as the proposal. We can start a
>> >>>> new wiki page as discussions are needed.
>> >>>>
>> >>>>
>> >>>>
>> >>>> If it’s a serious proposal, it needs a page to specify the design.
>> >>>> Currently all we have is a paragraph on
>> >>>> http://hackage.haskell.org/trac/ghc/wiki/Records, under
>> >>>> “Better name spacing”.
>> >>>>
>> >>>>
>> >>>>
>> >>>> As previously stated on this thread, the Frege user manual is
>> >>>> available here:
>> >>>>
>> >>>> http://code.google.com/p/frege/downloads/detail?name=Language-202.pdf
>> >>>>
>> >>>> see Sections 3.2 (primary expressions) and 4.2.1 (Algebraic Data type
>> >>>> Declaration - Constructors with labeled fields)
>> >>>>
>> >>>>
>> >>>>
>> >>>> To all those concerned about Records: look at the Frege
>> >>>> implementation and poke holes in it.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Well the most obvious issue is this.  3.2 says
>> >>>>
>> >>>>   e.m = (T.m e) if the expression e has type t and the type
>> >>>>   constructor of t is T and there exists a function T.m
>> >>>>
>> >>>> But that innocent-looking statement begs the *entire* question!  How
>> >>>> do we know if “e has type t”?   This is the route ML takes for
>> >>>> arithmetic operators: + means integer plus if the argument is of type
>> >>>> Int, float plus if the argument is of type Float, and so on.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Haskell type classes were specifically designed to address this
>> >>>> situation. And if you apply type classes to the record situation, I
>> >>>> think you end up with
>> >>>>
>> >>>> http://hackage.haskell.org/trac/ghc/wiki/Records/OverloadedRecordFields
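The general idea of letting a type class, rather than a type-directed rewrite, pick the field can be illustrated with a deliberately simplified, hand-rolled class. The class HasM and the types below are hypothetical; this is not the design on the OverloadedRecordFields wiki page, only a sketch of the principle that a constraint resolves the name.

```haskell
-- Hand-rolled class standing in for "a type that has an m field".
class HasM r where
  getM :: r -> Int

-- Two hypothetical types, each with its own m-like field.
data T = MkT { tM :: Int }
data U = MkU { uM :: Int, uLabel :: String }

instance HasM T where getM = tM
instance HasM U where getM = uM

-- One name works for both types; the constraint HasM r is resolved
-- from the argument's type, instead of rewriting e.m to T.m e.
doubleM :: HasM r => r -> Int
doubleM r = 2 * getM r
```

Here getM (MkT 3) is 3 and doubleM (MkU 5 "") is 10. Unlike the Frege rule, the type of the argument need not be known at the use site: doubleM stays polymorphic and carries the constraint.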
>> >>>
>> >>>
>> >>> More specifically, I think of this as TDNR, rather than the wiki
>> >>> page's focus on maintaining backwards compatibility and de-sugaring
>> >>> to polymorphic constraints. I had hoped that there were different
>> >>> ideas or at least more flexibility possible for the TDNR
>> >>> implementation.
>> >>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> Well, so maybe we can give up on that.  Imagine Frege without the
>> >>>> above abbreviation.  The basic idea is that field names are rendered
>> >>>> unique by prepending the module name.  As I understand it, for
>> >>>> record selection one would then be forced to write (T.m e) to select
>> >>>> the ‘m’ field.  That is, the qualification with T is compulsory.
>> >>>> The trouble with this is that it’s *already* possible; simply define
>> >>>> suitably named fields
>> >>>>
>> >>>>   data T = MkE { t_m :: Int, t_n :: Bool }
>> >>>>
>> >>>> Here I have prefixed with a (lower case version of) the type name.
>> >>>> So we don’t seem to be much further ahead.
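A small sketch of the workaround described above, with a second, hypothetical type S added to show that prefixed field names let two types with an "m" field coexist in one module today:

```haskell
-- Fields prefixed with (a lower-case form of) the type name.
data T = MkE { t_m :: Int, t_n :: Bool }

-- Hypothetical second type, also with an "m" field.
data S = MkS { s_m :: Int }

-- Selection is always qualified by hand: t_m e plays the role
-- that (T.m e) would play under the proposal.
sumM :: T -> S -> Int
sumM e s = t_m e + s_m s
```

For example, sumM (MkE 2 True) (MkS 3) is 5. The prefix must be written at every use, which is why compulsory T.m qualification on its own gains little.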
>> >>>>
>> >>>>
>> >>>>
>> >>>> Maybe one could make it optional if there is no ambiguity, much like
>> >>>> Haskell’s existing qualified names.  But there is considerable
>> >>>> ambiguity about whether T.m means
>> >>>>
>> >>>>   m imported from module T
>> >>>>
>> >>>> or
>> >>>>
>> >>>>   the m record selector of data type T
>> >>>
>> >>>
>> >>> If there is ambiguity, we expect the T to be a module. So you would
>> >>> need to refer to Record T's module: OtherModule.T.n or T.T.n
>> >>> Alternatively these conflicts could be compilation errors.
>> >>> Either way programmers are expected to structure their programs to
>> >>> avoid conflicting names, no different than they do now.
>> >>>
>> >>>>
>> >>>>
>> >>>> Perhaps one could make it work out.  But before we can talk about it
>> >>>> we need to see a design. Which takes us back to the question of
>> >>>> leadership.
>> >>>>
>> >>>>
>> >>>
>> >>> I am trying to provide as much leadership on this issue as I am
>> >>> capable of. Your critique is very useful in that effort.
>> >>>
>> >>> At this point the Frege proposal without TDNR seems to be a small
>> >>> step forward. We can now define records with clashing fields in the
>> >>> same module. However, without TDNR we don't have convenient access to
>> >>> those fields. I am contacting the Frege author to see if we can get
>> >>> any more insights on implementation details.
>> >>>
>> >>>>
>> >>>> Simon
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> We only want critiques about
>> >>>>
>> >>>> * achieving name-spacing right now
>> >>>>
>> >>>> * implementing it in such a way that extensible records could be
>> >>>>   implemented in its place in the future, although we will not allow
>> >>>>   that discussion to hold up a records implementation now, just
>> >>>>   possibly modify things slightly.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Greg Weber
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Thu, Dec 29, 2011 at 2:00 PM, Simon Peyton-Jones
>> >>>> <simonpj at microsoft.com> wrote:
>> >>>>
>> >>>> | The lack of response, I believe, is just a lack of anyone who
>> >>>> | can cut through all the noise and come up with some
>> >>>> | practical way to move forward in one of the many possible
>> >>>> | directions.
>> >>>>
>> >>>> You're right.  But it is very telling that the vast majority of
>> >>>> responses on
>> >>>>
>> >>>> http://www.reddit.com/r/haskell/comments/nph9l/records_stalled_again_leadership_needed/
>> >>>> were not about the subject (leadership) but rather about suggesting
>> >>>> yet more, incompletely-specified solutions to the original problem.
>> >>>> My modest attempt to build a consensus by articulating the simplest
>> >>>> solution I could think of manifestly failed.
>> >>>>
>> >>>> The trouble is that I just don't have the bandwidth (or, if I'm
>> >>>> honest, the motivation) to drive this through to a conclusion. And
>> >>>> if no one else does either, perhaps it isn't *that* important to
>> >>>> anyone.  That said, it clearly is *somewhat* important to a lot of
>> >>>> people, so doing nothing isn't very satisfactory either.
>> >>>>
>> >>>> Usually I feel I know how to move forward, but here I don't.
>> >>>>
>> >>>> Simon
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Work is punishment for failing to procrastinate effectively.
>
>

-- 
Work is punishment for failing to procrastinate effectively.