Records in Haskell

Sun Jan 8 13:34:07 CET 2012

2012/1/8 Gábor Lehel <illissius at gmail.com>:
> 2012/1/8 Greg Weber <greg at gregweber.info>:
>>
>>
>> 2012/1/8 Gábor Lehel <illissius at gmail.com>
>>>
>>> Thank you. I have a few questions/comments.
>>>
>>>
>>>
>>> "The module/record ambiguity is dealt with in Frege by preferring
>>> modules and requiring a module prefix for the record if there is
>>> ambiguity."
>>>
>>> I think I see why they do it this way (otherwise you can't refer to a
>>> module if a record by the same name is in scope), but on the other
>>> hand it would seem intuitive to me to choose the more specific thing,
>>> and a record feels more specific than a module. Maybe you could go
>>> that way and just not give your qualified imports the same name as a
>>> record? (Unqualified imports are in practice going to be hierarchical,
>>> and no one's in the habit of typing those out to disambiguate things,
>>> so I don't think it really matters if qualified records shadow them.)
>>
>>
>> In the case where a Record has the same name as its containing module it
>> would be more specific than a module, and preferring it makes sense. I think
>> doing this inside the module makes sense, as one shouldn't need to refer to
>> the containing module's name. We should think more about the case where
>> module & records are imported.
>>
>>>
>>>
>>> "Expressions of the form x.n: first infer the type of x. If this is
>>> just an unbound type variable (i.e. the type is unknown yet), then
>>> check if n is an overloaded name (i.e. a class operation). [...] Under
>>> no circumstances, however, will the notation x.n contribute in any way
>>> in inferring the type of x, except for the case when n is a class
>>> operation, where an appropriate class constraint is generated."
>>>
>>> Is this just a simple translation from x.n to n x? What's the
>>> rationale for allowing the x.n syntax for, in addition to record
>>> fields, class methods specifically, but no other functions?
>>
>>
>> It is a simple translation from x.n to T.n x.
>> The key point is that the function is only accessible through the record's
>> namespace.
>> The dot is only being used to tap into a namespace, and is not available for
>> general function application.
>
> I think my question and your answer are talking past each other here.
> Let me rephrase. The wiki page implies that in addition to using the
> dot to tap into a namespace, you can also use it for general function
> application in the specific case where the function is a class method
> ("appropriate class constraint is generated" etc etc). I don't
> understand why. Or am I misunderstanding?
>
>
>>
>>>
>>>
>>>
>>> Later on you write that the names of record fields are only accessible
>>> from the record's namespace and via record syntax, but not from the
>>> global scope. For Haskell I think it would make sense to reverse this
>>> decision. On the one hand, it would keep backwards compatibility; on
>>> the other hand, Haskell code is already written to avoid name clashes
>>> between record fields, so it wouldn't introduce new problems. Large
>>> gain, little pain. You could use the global-namespace function as you
>>> do now, at the risk of ambiguity, or you could use the new record
>>> syntax and avoid it. (If you were to also allow x.n syntax for
>>> arbitrary functions, this could lead to ambiguity again... you could
>>> solve it by preferring a record field belonging to the inferred type
>>> over a function if both are available, but (at least in my current
>>> state of ignorance) I would prefer to just not allow x.n for anything
>>> other than record fields.)
>>
>>
>> Perhaps you can give some example code for what you have in mind - we do
>> need to figure out the preferred technique for interacting with old-style
>> records. Keep in mind that for new records the entire point is that they
>> must be name-spaced. A module could certainly export top-level functions
>> equivalent to how records work now (we could have a helper that generates
>> those functions).
>
> Let's say you have a record.
>
> data Record = Record { field :: String }
>
> In existing Haskell, you refer to the accessor function as 'field' and
> to the contents of the field as 'field r', where 'r' is a value of
> type Record. With your proposal, you refer to the accessor function as
> 'Record.field' and to the contents of the field as either
> 'Record.field r' or 'r.field'. The point is that I see no conflict or
> drawback in allowing all of these at the same time. Writing 'field' or
> 'field r' would work exactly as it already does, and be ambiguous if
> there is more than one record field with the same name in scope. In
> practice, existing code is already written to avoid this ambiguity so
> it would continue to work. Or you could write 'Record.field r' or
> 'r.field', which would work as the proposal describes and remove the
> ambiguity, and work even in the presence of multiple record fields
> with the same name in scope.
>
> The point is that I see what you gain by allowing record fields to be
> referred to in a namespaced way, but I don't see what you gain by not
> allowing them to be referred to in a non-namespaced way. In theory you
> wouldn't care because the non-namespaced way is inferior anyways, but
> in practice because all existing Haskell code does it that way, it's
> significant.
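For reference, a minimal sketch of the status quo described above, in plain Haskell 2010. The names `Other`, `otherField`, `accessor` and `contents` are illustrative assumptions and not part of any proposal; the 'Record.field' and 'r.field' forms are deliberately absent because they are not valid in Haskell as it stands.

```haskell
-- Existing Haskell: a field name is an ordinary top-level function.
data Record = Record { field :: String }

-- Hypothetical second record. It needs a differently named field, because
-- two records in one module cannot both declare a field called 'field'.
data Other = Other { otherField :: Int }

-- The accessor function itself, of type Record -> String.
accessor :: Record -> String
accessor = field

-- The contents of the field of one particular value.
contents :: String
contents = field (Record { field = "hello" })
```

Loaded into GHCi, 'accessor' and 'field' are interchangeable, which is the non-namespaced usage that existing code relies on.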
>
>
>>
>>>
>>>
>>>
>>> Later on:
>>>
>>> "- the function that updates field x of data type T is T.{x=}
>>> - the function that sets field x in a T to 42 is T.{x=42}
>>> - If a::T then a.{x=} and a.{x=42} are valid"
>>>
>>> I think this looks rather ugly. Aren't there better
>>> alternatives? { T.x = }, { T.x = 42 }, { a.x = }, { a.x = 42 } maybe?
>>> (Does this conflict in some unfinesseable way with explicit layout
>>> contexts?)
>>
>>
>> I think this is one of those slightly different syntaxes that many people
>> will initially react badly to, but once they use it they will like it just
>> fine. The problem with what you are suggesting is that it would be verbose
>> when updating multiple fields at once. But we should investigate whether it
>> is possible to have a syntax closer to the existing update syntax.
>
> Good point.
>
>>
>>>
>>>
>>> "the function that changes field x of a T by applying some function to
>>> it is T.{x <-}"
>>>
>>> Same comment on syntax applies. I believe this is a new feature? It
>>> would be welcome, albeit the overloading of <- is a bit worrisome
>>> (don't have better ideas at the moment, but I think there was a
>>> thread). I assume T.{x <- f}, a.{x <-}, and a.{x <- f} (whatever the
>>> syntax is) would also be valid, by analogy to the above?
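As a point of comparison, here is a rough sketch of what the quoted update and modify forms correspond to in today's Haskell, written as ordinary functions. The helper names `setX`, `setXTo42` and `modifyX`, the second field `y`, and the argument order are my own assumptions; the wiki page does not define them.

```haskell
data T = T { x :: Int, y :: Bool } deriving (Eq, Show)

-- Rough analogue of T.{x=}: set field x to a given value.
setX :: Int -> T -> T
setX v t = t { x = v }

-- Rough analogue of T.{x=42}: set field x to 42.
setXTo42 :: T -> T
setXTo42 = setX 42

-- Rough analogue of T.{x <-}: change field x by applying a function to it.
modifyX :: (Int -> Int) -> T -> T
modifyX f t = t { x = f (x t) }
```

The last definition shows why a dedicated modify syntax would be welcome: with the existing update syntax the field name has to be written twice.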
>>
>>
>> Yes, new feature, so not necessary in the initial implementation. I
>> personally think Haskell should drop the monadic curly brackets which nobody
>> uses, but whatever syntax works is fine with me.
>>
>>>
>>>
>>>
>>>
>>> Re: Compatibility with existing records: based on (very) cursory
>>> inspection I don't see an obstacle to making it (near-)fully
>>> compatible - you would just be adding some new syntax, most
>>> significantly x.n. Backwards compatibility is a great advantage, so
>>> why not?
>>>
>>>
>>>
>>> Generalizing the syntax to arbitrary TDNR: I think I'm opposed to
>>> this. The problem is that in existing Haskell the vast majority of
>>> expressions (with the notable (and imho unfortunate) exception of
>>> (>>=)) flow from right to left. Going the other way with record fields
>>> isn't a big problem because it's simple and doesn't even feel like
>>> function application so much as member-selection (like modules), but
>>> if you were to allow any function you would soon end up with lengthy
>>> chains of them which would clash nastily with the surrounding code.
>>> Having to jump back and forth and switch directions while reading is
>>> unpleasant. OO languages have this problem and I don't envy them for
>>> it. And in particular having "a . b" mean "first do b, then do a", but
>>> "a.b" mean "do b to a" would be confusing. (You'd already have this
>>> problem with global namespace record field selectors, but at least
>>> it's localized.)
>>
>>
>> I agree - I think a.b or A.b should always mean tapping into a namespace and
>> not be generalized outside of that.
>>
>>>
>>> All of that said, maybe having TDNR with bad syntax is preferable to
>>> not having TDNR at all. Can't it be extended to the existing syntax
>>> (of function application)? Or at least to some better one, which is
>>> ideally right-to-left? I don't really know the technical details...
>>>
>>> Generalized data-namespaces: I think I'm also opposed. This would import
>>> the problem from OO languages where functions written by the module
>>> (class) author get to have a distinguished syntax (be inside the
>>> namespace) over functions by anyone else (which don't).
>>
>>
>> Maybe you can show some example code? To me this is about controlling
>> exports of namespaces, which is already possible - I think this is mostly a
>> matter of convenience.
>
> If I'm understanding correctly, you're suggesting we be able to write:
>
> data Data = Data Int where
>    twice (Data d) = 2 * d
>    thrice (Data d) = 3 * d
>    ...
>
> and that if we write 'let x = Data 7 in x.thrice' it would evaluate to
> 21. I have two objections.
>
> The first is the same as with the TDNR proposal: you would have both
> code that looks like
> 'data.firstFunction.secondFunction.thirdFunction', as well as the
> existing 'thirdFunction $ secondFunction $ firstFunction data' and
> 'thirdFunction . secondFunction . firstFunction $ data', and if you
> have both of them in the same expression (which you will) it becomes
> unpleasant to read because you have to read them in opposite
> directions.
>
> The second is that only the author of the datatype could put functions
> into its namespace; the 'data.foo' notation would only be available
> for functions written by the datatype's author, while for every other
> function you would have to use 'foo data'. I dislike this special
> treatment in OO languages and I dislike it here.
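The reading-direction objection can be illustrated in current Haskell without any new syntax, using the reverse application operator (&) from Data.Function (in base since 4.8, so newer than this thread) as a stand-in for left-to-right chaining. The three function bodies are placeholders I made up; only their names come from the text above.

```haskell
import Data.Function ((&))

-- Placeholder bodies, chosen only so that the order of application matters.
firstFunction, secondFunction, thirdFunction :: Int -> Int
firstFunction  = (+ 1)
secondFunction = (* 2)
thirdFunction  = subtract 3

-- Conventional style: read from right to left.
rightToLeft :: Int -> Int
rightToLeft v = thirdFunction . secondFunction . firstFunction $ v

-- Left-to-right style, standing in for v.firstFunction.secondFunction...
leftToRight :: Int -> Int
leftToRight v = v & firstFunction & secondFunction & thirdFunction

-- Both styles in one expression: the reader starts in the middle at v,
-- reads rightwards through (&), then jumps back to the left for ($).
mixed :: Int -> Int
mixed v = thirdFunction $ v & firstFunction & secondFunction
```

All three definitions compute the same result; the difference is purely in which direction the eye has to travel.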
>
>>
>>
>>>
>>>
>>>
>>>
>>> Another thing that would be nice is lenses to solve the
>>> nested-record-update problem - at least the room to add them later.
>>> Most of the proposed syntax would be unaffected, but you'd need some
>>> syntax for the lens itself... I'm not sure what it might be. Would it
>>> be terrible to have T.x refer to a lens rather than a getter? (I don't
>>> know how you'd refer to the getter then, so probably yeah.) Or maybe {
>>> T.x }, building backwards from { T.x = }?
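To make the nested-record-update problem concrete, here is a minimal hand-rolled lens sketch. It assumes the simplest possible representation (a getter paired with a setter) and is not the design of any particular library or of Disciple; the types `Person` and `Address` and all function names are hypothetical.

```haskell
-- A lens as a getter/setter pair (an assumed, deliberately naive encoding).
data Lens s a = Lens
  { view :: s -> a
  , set  :: a -> s -> s
  }

-- Compose a lens into an outer structure with a lens into its part.
compose :: Lens s a -> Lens a b -> Lens s b
compose outer inner = Lens
  { view = view inner . view outer
  , set  = \b s -> set outer (set inner b (view outer s)) s
  }

data Address = Address { city :: String } deriving (Eq, Show)
data Person  = Person { name :: String, address :: Address } deriving (Eq, Show)

addressL :: Lens Person Address
addressL = Lens address (\a p -> p { address = a })

cityL :: Lens Address String
cityL = Lens city (\c a -> a { city = c })

personCityL :: Lens Person String
personCityL = addressL `compose` cityL

-- Nested update with plain record syntax, for comparison.
moveToPlain :: String -> Person -> Person
moveToPlain c p = p { address = (address p) { city = c } }

-- The same update through the composed lens.
moveTo :: String -> Person -> Person
moveTo = set personCityL
```

The plain version grows with every level of nesting, while the lens version stays a single composition, which is the room a record syntax would need to leave.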
>>>
>>>
>>>
>>> Another existing language very similar to Haskell whose record system
>>> might be worth evaluating is Disciple: http://disciple.ouroborus.net/.
>>> Unfortunately I couldn't find any specific page it seemed best to link
>>> to.
>>
>>
>> The syntax of DDC seems the same as this proposal. However, I could not find
>> any specific information either.
>
> The main things I remember being interesting about it are that it's
> based on lenses, and uses some kind of extensible projectors system to
> allow something similar to what you achieve with datatype-namespaces,
> namely 'virtual' record fields. But I haven't studied it in detail.

Ah, I remember now where I saw a more thorough discussion: in his
thesis[1]. Section 2.7 (page 115) and in particular 2.7.4 (119). It
seems to be a very similar proposal to datatype-namespacing except it
would address my second objection above and allow third-party code to
add functions to the namespace as well. My first objection (the 'flow'
of the code being in the opposite direction to all other code) still
applies though. I couldn't find any discussion of lenses, except as
pertaining to destructive update (which is another feature of
Disciple).

[1] http://www.cse.unsw.edu.au/~benl/papers/thesis/lippmeier-impure-world.pdf

>
>>
>>>
>>>
>>> On Sun, Jan 8, 2012 at 2:40 AM, Greg Weber <greg at gregweber.info> wrote:
>>> > I have updated the wiki - the entry level page [1] compares the different
>>> > proposals and points to a more fleshed out explanation of the Frege
>>> > proposal [2].
>>> >
>>> > I think I now understand the differences between the existing proposals
>>> > and am able to provide leadership to move this forward. Let me summarize
>>> > the state of things:
>>> > There is a debate over extensible records that we are putting off into
>>> > the future. Instead we have 2 proposals to make things better right now:
>>> > * an overloaded record fields proposal that still has implementation
>>> >   concerns
>>> > * a name-spacing & simple type resolution proposal that is awaiting your
>>> >   critique
>>> >
>>> > The Frege language originally had overloaded record fields but then moved
>>> > to the latter system. The existing experience of the Frege language is
>>> > very fortunate for us as we now have some experience to help inform our
>>> > own decision.
>>> >
>>> > Greg Weber
>>> >
>>> > [1] http://hackage.haskell.org/trac/ghc/wiki/Records
>>> > [2] http://hackage.haskell.org/trac/ghc/wiki/Records/NameSpacing
>>> >
>>> >
>>> > On Wed, Jan 4, 2012 at 7:54 AM, Greg Weber <greg at gregweber.info> wrote:
>>> >>
>>> >> The Frege author does not have a ghc mail list account but gave a more
>>> >> detailed explanation of how he goes about TDNR for records and how often
>>> >> it type checks without annotation in practice.
>>> >>
>>> >> A more general explanation is here:
>>> >>
>>> >>
>>> >> http://www.reddit.com/r/haskell/comments/nph9l/records_stalled_again_leadership_needed/c3di9sw
>>> >>
>>> >> He sent a specific response to Simon's mail list message, quoted below:
>>> >>
>>> >> Simon Peyton-Jones is absolutely correct when he notes:
>>> >>
>>> >> Well the most obvious issue is this. 3.2 says e.m = (T.m e) if the
>>> >> expression e has type t and the type constructor of t is T and there
>>> >> exists a function T.m. But that innocent-looking statement begs the
>>> >> *entire* question! How do we know if "e has type t"?
>>> >>
>>> >> The way it is done in Frege is such that, if you have a function that
>>> >> uses or updates (nondestructively, of course) a "record" then at least
>>> >> the type constructor of that record has to be known. This is no different
>>> >> than doing it explicitly with case constructs, etc., just here you learn
>>> >> the types from the constructors you write in the patterns.
>>> >>
>>> >> Hence, one cannot write a function that updates field f to 42 for any
>>> >> record that contains a field f:
>>> >>
>>> >> foo x = x.{f=42}    -- type annotation required for foo or x
>>> >>
>>> >> In practice this means you'll have to write a type annotation here and
>>> >> there.
>>> >> Often, the field access is not the only one that happens to some variable
>>> >> of record type, or the record is the result of another function
>>> >> application. In such cases, the type is known.
>>> >> I estimate that in 2/3 of all cases one does not need to write (T.e x)
>>> >> in sparsely type annotated code, despite the fact that the Frege type
>>> >> checker has a left to right bias and does not yet attempt to find the
>>> >> type of x in the code that "follows" the x.e construct (after let
>>> >> unrolling etc.).
>>> >> I think one could do better and guarantee that, if the type of x is
>>> >> inferrable at all, then so will be x.e. (Still, it must be more than
>>> >> just a type variable.)
>>> >>
>>> >> On Sun, Jan 1, 2012 at 2:39 PM, Greg Weber <greg at gregweber.info> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sat, Dec 31, 2011 at 3:28 PM, Simon Peyton-Jones
>>> >>> <simonpj at microsoft.com> wrote:
>>> >>>>
>>> >>>> Frege has a detailed explanation of the semantics of its record
>>> >>>> implementation, and the language is *very* similar to Haskell. Let's
>>> >>>> just start by using Frege's document as the proposal. We can start a
>>> >>>> new wiki page as discussions are needed.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> If it’s a serious proposal, it needs a page to specify the design.
>>> >>>> Currently all we have is a paragraph on
>>> >>>> http://hackage.haskell.org/trac/ghc/wiki/Records, under “Better name
>>> >>>> spacing”.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> As previously stated on this thread, the Frege user manual is
>>> >>>> available here:
>>> >>>>
>>> >>>> http://code.google.com/p/frege/downloads/detail?name=Language-202.pdf
>>> >>>>
>>> >>>> see Sections 3.2 (primary expressions) and 4.2.1 (Algebraic Data type
>>> >>>> Declaration - Constructors with labeled fields)
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> To all those concerned about Records: look at the Frege implementation
>>> >>>> and poke holes in it.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> Well the most obvious issue is this.  3.2 says
>>> >>>>
>>> >>>> e.m = (T.m e) if the expression e has type t and the type constructor
>>> >>>>
>>> >>>> of t is T and there exists a function T.m
>>> >>>>
>>> >>>> But that innocent-looking statement begs the *entire* question!  How
>>> >>>> do we know if “e has type t”?  This is the route ML takes for
>>> >>>> arithmetic operators: + means integer plus if the argument is of type
>>> >>>> Int, float plus if the argument is of type Float, and so on.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> Haskell type classes were specifically designed to address this
>>> >>>> situation. And if you apply type classes to the record situation, I
>>> >>>> think you end up with
>>> >>>>
>>> >>>>
>>> >>>> http://hackage.haskell.org/trac/ghc/wiki/Records/OverloadedRecordFields
>>> >>>
>>> >>>
>>> >>> More specifically, I think of this as TDNR, whereas the focus of the
>>> >>> wiki page is on maintaining backwards compatibility and desugaring to
>>> >>> polymorphic constraints. I had hoped that there were different ideas,
>>> >>> or at least more flexibility, possible for the TDNR implementation.
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> Well, so maybe we can give up on that.  Imagine Frege without the
>>> >>>> above abbreviation.  The basic idea is that field names are rendered
>>> >>>> unique by pre-pending the module name.  As I understand it, for record
>>> >>>> selection one would then be forced to write (T.m e) to select the ‘m’
>>> >>>> field.  That is, the qualification with T is compulsory.  The trouble
>>> >>>> with this is that it’s *already* possible; simply define suitably
>>> >>>> named fields
>>> >>>>   data T = MkE { t_m :: Int, t_n :: Bool }
>>> >>>>
>>> >>>> Here I have prefixed with a (lower case version of) the type name.  So
>>> >>>> we don’t seem to be much further ahead.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> Maybe one could make it optional if there is no ambiguity, much like
>>> >>>> Haskell’s existing qualified names.  But there is considerable
>>> >>>> ambiguity about whether T.m means
>>> >>>>
>>> >>>>   m imported from module T
>>> >>>>
>>> >>>> or
>>> >>>>
>>> >>>>   the m record selector of data type T
>>> >>>
>>> >>>
>>> >>> If there is ambiguity, we expect the T to be a module. So you would
>>> >>> need to refer to Record T's module: OtherModule.T.n or T.T.n
>>> >>> Alternatively these conflicts could be compilation errors.
>>> >>> Either way programmers are expected to structure their programs to
>>> >>> avoid conflicting names, no differently than they do now.
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>> Perhaps one could make it work out.  But before we can talk about it
>>> >>>> we need to see a design. Which takes us back to the question of
>>> >>>> leadership.
>>> >>>>
>>> >>>>
>>> >>>
>>> >>> I am trying to provide as much leadership on this issue as I am
>>> >>> capable of. Your critique is very useful in that effort.
>>> >>>
>>> >>> At this point the Frege proposal without TDNR seems to be a small step
>>> >>> forward. We can now define records with clashing fields in the same
>>> >>> module. However, without TDNR we don't have convenient access to those
>>> >>> fields. I am contacting the Frege author to see if we can get any more
>>> >>> insights on implementation details.
>>> >>>
>>> >>>>
>>> >>>> Simon
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> We only want critiques about
>>> >>>>
>>> >>>> * achieving name-spacing right now
>>> >>>>
>>> >>>> * implementing it in such a way that extensible records could be
>>> >>>>   implemented in its place in the future, although we will not allow
>>> >>>>   that discussion to hold up a records implementation now, just
>>> >>>>   possibly modify things slightly.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> Greg Weber
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Thu, Dec 29, 2011 at 2:00 PM, Simon Peyton-Jones
>>> >>>> <simonpj at microsoft.com> wrote:
>>> >>>>
>>> >>>> | The lack of response, I believe, is just a lack of anyone who
>>> >>>> | can cut through all the noise and come up with some
>>> >>>> | practical way to move forward in one of the many possible
>>> >>>> | directions.
>>> >>>>
>>> >>>> You're right.  But it is very telling that the vast majority of
>>> >>>> responses on
>>> >>>>
>>> >>>>
>>> >>>>  http://www.reddit.com/r/haskell/comments/nph9l/records_stalled_again_leadership_needed/
>>> >>>> were not about the subject (leadership) but rather on suggesting yet
>>> >>>> more, incompletely-specified solutions to the original problem.  My
>>> >>>> modest attempt to build a consensus by articulating the simplest
>>> >>>> solution I could think of, manifestly failed.
>>> >>>>
>>> >>>> The trouble is that I just don't have the bandwidth (or, if I'm
>>> >>>> honest, the motivation) to drive this through to a conclusion. And if
>>> >>>> no one else does either, perhaps it isn't *that* important to anyone.
>>> >>>> That said, it clearly is *somewhat* important to a lot of people, so
>>> >>>> doing nothing isn't very satisfactory either.
>>> >>>>
>>> >>>> Usually I feel I know how to move forward, but here I don't.
>>> >>>>
>>> >>>> Simon
>>> >>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Work is punishment for failing to procrastinate effectively.
>>
>>
>
>
>

-- 
Work is punishment for failing to procrastinate effectively.