[ghc-steering-committee] Record dot notation

Simon Peyton Jones simonpj at microsoft.com
Thu Feb 13 22:10:14 UTC 2020


I don't really understand your example:  why would a programmer expect `f M.x` and `(f M).x` to mean the same thing?  Or when would you want to perform this transformation on an existing program?  It doesn't even make sense to apply a function to a module name...

Well, if we didn’t have qualified names then of course (f M.x) would mean
(f M) . x
that is, f applied to the data constructor M, composed with x.  But we do have qualified names, and thus regard M.x (with no spaces) as binding tighter than function application.  One way to think of it (and the way it is implemented) is to think of M.x as a lexeme.
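A minimal sketch of that lexical distinction in present-day Haskell, with no extensions. The names `shout` and `wrapNext` are invented for illustration and do not come from the thread.

```haskell
import qualified Data.Char as C

-- `C.toUpper` has no spaces around the dot, so the lexer produces a single
-- qualified-name token; the dot is not an operator here.
shout :: String -> String
shout = map C.toUpper

-- With spaces around the dot, `Just . succ` is ordinary function
-- composition: the data constructor Just composed with succ.
wrapNext :: Int -> Maybe Int
wrapNext = Just . succ
```

Writing `Just.succ` with no spaces would instead be lexed as the name `succ` qualified by a module called `Just`, which is why the two spellings cannot be used interchangeably today.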

I’m just saying that I think it’d be deeply strange to parse (f M.x) in one way, and (f m.x) in a completely different way, especially when (M.x) and (m.x) both have the same informal reading: “take the x component of M or m resp”.  I’m just seeking lexical and syntactic consistency.

Yes, I know that today we do indeed parse these two utterly differently, and that’s the reason I never write function composition without spaces around it.  Let’s take the opportunity to fix this 😊.

No one has suggested any new alternatives, so it probably makes sense for me to put forward a slate of possibilities to vote on.  But not tonight.

Simon

From: Iavor Diatchki <iavor.diatchki at gmail.com>
Sent: 13 February 2020 17:56
To: Simon Peyton Jones <simonpj at microsoft.com>
Cc: Simon Marlow <marlowsd at gmail.com>; ghc-steering-committee <ghc-steering-committee at haskell.org>; Joachim Breitner <mail at joachim-breitner.de>
Subject: Re: [ghc-steering-committee] Record dot notation



On Wed, Feb 12, 2020 at 12:05 PM Simon Peyton Jones <simonpj at microsoft.com> wrote:


Yes, but `f M.x` and `(f M).x` mean very different things too.  Nobody has any trouble with parsing M.x as a single lexeme.   To be consistent with this approach, we should require `f (M.x)` for qualified names.

Both option (4) and option (5) require just one new lexeme .X, and the rest can be handled in the parser.


I don't really understand your example:  why would a programmer expect `f M.x` and `(f M).x` to mean the same thing?  Or when would you want to perform this transformation on an existing program?  It doesn't even make sense to apply a function to a module name...


This is not at all the case with record selection:  I can absolutely see myself writing the expression `f r.x`, and later deciding that maybe I want to apply a function to `r` before selecting:  `f (g r).x`.  Except that with proposal (6) this actually means something very different, and *at best* I'd get a confusing type error, and at worst, I'd silently get completely different behavior.
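To illustrate the "silently different" case, here is a sketch in plain Haskell with the two readings of `f (g r).x` written out using ordinary selector functions and explicit parentheses. The `Node` type and the definitions of `f`, `g`, `a` and `b` are hypothetical, chosen only so that both readings typecheck.

```haskell
-- A record whose field `x` has the same type as the record itself, so that
-- selecting before or after applying `f` both typecheck.
data Node = Node { label :: Int, x :: Node }

-- Two nodes that point at each other (fine in a lazy language).
a, b :: Node
a = Node 1 b
b = Node 2 a

-- Hypothetical functions standing in for the `f` and `g` of the discussion.
f :: Node -> Node
f n = n { label = label n + 100 }

g :: Node -> Node
g = id

-- What the programmer meant by `f (g r).x`: select first, then apply f.
intended :: Node -> Node
intended r = f (x (g r))

-- The reading described for option (6): `f (g r).x` is `(f (g r)).x`,
-- that is, apply f first and select afterwards.
option6 :: Node -> Node
option6 r = x (f (g r))
```

With these definitions `label (intended a)` is 102 while `label (option6 a)` is 2, and the compiler accepts both without complaint.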

-Iavor




From: Iavor Diatchki <iavor.diatchki at gmail.com>
Sent: 12 February 2020 17:58
To: Simon Marlow <marlowsd at gmail.com>
Cc: Simon Peyton Jones <simonpj at microsoft.com>; ghc-steering-committee <ghc-steering-committee at haskell.org>; Joachim Breitner <mail at joachim-breitner.de>
Subject: Re: [ghc-steering-committee] Record dot notation

Both option (4) and option (5) require just one new lexeme .X, and the rest can be handled in the parser.  The difference between them is the precedence of selection: (4) has higher precedence than application (just like record update), while (5) has the same precedence as application.  Most commonly, this would show up in examples like `f x.y`: with (4) this means `f (x.y)`; with (5) this means `(f x).y`.
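The precedence difference can be sketched as two toy parsers over an already-lexed token stream. This is only an illustration under simplifying assumptions (no parentheses, no operators, no qualified names); the types `Tok` and `Expr` and the functions `parse4` and `parse5` are made up here and are not taken from GHC or from the proposal.

```haskell
-- Tokens after lexing: plain atoms and the one new lexeme, a field selector.
data Tok = Atom String | Sel String

data Expr
  = Var String
  | App Expr Expr
  | Get Expr String   -- record selection: e.field
  deriving (Eq, Show)

-- Option (5): selection binds exactly like application, so the whole token
-- sequence is a single left fold.
parse5 :: [Tok] -> Maybe Expr
parse5 (Atom a : rest) = Just (foldl step (Var a) rest)
  where
    step e (Atom b) = App e (Var b)
    step e (Sel s)  = Get e s
parse5 _ = Nothing

-- Option (4): selection binds tighter than application, so selectors attach
-- to the atom just before them, and only then are arguments applied.
parse4 :: [Tok] -> Maybe Expr
parse4 toks = case args toks of
    Just (e : es) -> Just (foldl App e es)
    _             -> Nothing
  where
    args [] = Just []
    args (Atom a : rest) =
      let (sels, rest') = span isSel rest
          arg = foldl Get (Var a) [s | Sel s <- sels]
      in (arg :) <$> args rest'
    args (Sel _ : _) = Nothing
    isSel (Sel _) = True
    isSel _       = False

-- The token stream for `f x.y`.
fxy :: [Tok]
fxy = [Atom "f", Atom "x", Sel "y"]
```

Here `parse4 fxy` yields `App (Var "f") (Get (Var "x") "y")`, i.e. `f (x.y)`, while `parse5 fxy` yields `Get (App (Var "f") (Var "x")) "y"`, i.e. `(f x).y`.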

Simon PJ, why do we need the special case for option (6), when it seems option (4) does the same thing in a simpler way?

I am strongly against the new option (6) because `f x.y` and `f (g x).y` mean very different things, and being able to name and abstract expressions is one of the big selling points of Haskell.  Also, having a single consistent rule is a lot easier to teach and read; there really is no need for a special case here.

-Iavor





On Wed, Feb 12, 2020 at 8:44 AM Simon Marlow <marlowsd at gmail.com> wrote:
On Wed, 12 Feb 2020 at 14:53, Simon Peyton Jones <simonpj at microsoft.com> wrote:
Don’t forget option (6): like (5) but treat r.x as a lexeme.

I find it hard to justify a language in which
        f M.x     means      f (M.x)
but     f m.x     means      (f m).x

especially when the “.” means, in both cases, “take the x component of the thing on the left”.

So here I’m leaning even harder on the connection with qualified names: let’s simply be consistent with that.

I’m quite content to follow (5) on the meaning of
               f (g 3).x
That is, it means the same as (f (g 3)   .x), namely  (f (g 3)).x

Yes OK, I think that's reasonable. (I hadn't digested your earlier message proposing this properly, but I went back and re-read it just now.)

I can imagine explaining that to someone - there's a straightforward lexical syntax, and the context-free grammar is similar to the rules for function application.

Joachim what do you think?

Cheers
Simon


But I’m very keen on maintaining consistency with qualified names when the thing on the LHS is a token (or dotted chain thereof.)

Does anyone else have an alternative beyond 1-6 that they want to put forward?

Simon

From: ghc-steering-committee <ghc-steering-committee-bounces at haskell.org> On Behalf Of Simon Marlow
Sent: 12 February 2020 11:21
To: Richard Eisenberg <rae at richarde.dev>
Cc: ghc-steering-committee <ghc-steering-committee at haskell.org>; Joachim Breitner <mail at joachim-breitner.de>
Subject: Re: [ghc-steering-committee] Record dot notation


On Mon, 10 Feb 2020 at 14:15, Richard Eisenberg <rae at richarde.dev> wrote:
Upon careful consideration, I think the whitespace concerns here are somewhat ill-founded.

First, please see https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0229-whitespace-bang-patterns.rst#proposed-change-specification, which (among other points) carefully describes "loose infix" vs "prefix" vs "suffix" vs "tight infix" occurrences. Here is a set of examples:

a ! b   -- a loose infix occurrence

a!b     -- a tight infix occurrence

a !b    -- a prefix occurrence

a! b    -- a suffix occurrence
Yes and I was not very keen on that proposal (my concerns are on the discussion thread).

This distinction is *not* just made by example: that proposal (which has been accepted) defines these terms precisely. So, the comments on this thread about what counts as a naked selector are addressed: a naked selector is one where the dot is a prefix occurrence.

Other whitespace-wariness comes from worrying about the distinction between prefix and tight infix occurrences. That is, should we differentiate between the interpretations of `f r.x` and `f r .x`? Yet in all versions of any of this, we differentiate between loose infix and the others. Thus there is *always* whitespace-sensitivity around dot. Note that this is true, as Simon PJ pointed out, regardless of this proposal, where a tight-infix usage of a dot with a capitalized identifier on the left is taken as a module qualification. In all of its versions, this proposal *increases* the whitespace sensitivity, by further distinguishing between prefix occurrences of dot and other usages.

Let's compare options 3 and 5 with this analysis then:

Option 3:
loose-infix: whatever (.) is in scope
tight-infix:
  - if left-hand is a capitalized identifier: module qualification
  - otherwise: record selection, binding tighter than function application
prefix: postfix record selection, binding like function application
suffix: presumably, whatever (.) is in scope

Option 5:
loose-infix: whatever (.) is in scope
tight-infix:
 - if left-hand is a capitalized identifier: module qualification
 - otherwise: postfix record selection, binding like function application
prefix: postfix record selection, binding like function application
suffix: presumably, whatever (.) is in scope
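The four kinds of occurrence used in both tables can be sketched as a small classifier. This is a deliberate simplification: it looks only at whether the neighbouring character is whitespace, whereas the accepted proposal defines the categories more carefully in terms of the surrounding tokens. The names `Occurrence`, `classify` and `classifyFirst` are hypothetical.

```haskell
import Data.Char (isSpace)
import Data.Maybe (listToMaybe)

data Occurrence = LooseInfix | TightInfix | Prefix | Suffix
  deriving (Eq, Show)

-- Classify an operator occurrence from the characters immediately before
-- and after it. Nothing stands for the start or end of the input and is
-- treated like whitespace.
classify :: Maybe Char -> Maybe Char -> Occurrence
classify before after =
  case (gap before, gap after) of
    (True,  True)  -> LooseInfix
    (False, False) -> TightInfix
    (True,  False) -> Prefix
    (False, True)  -> Suffix
  where
    gap = maybe True isSpace

-- Classify the first occurrence of the operator character in a string,
-- or return Nothing if the character does not occur.
classifyFirst :: Char -> String -> Maybe Occurrence
classifyFirst op s =
  case break (== op) s of
    (pre, _ : post) ->
      Just (classify (listToMaybe (reverse pre)) (listToMaybe post))
    _ -> Nothing
```

For example, `classifyFirst '.' "f r.x"` gives `Just TightInfix` and `classifyFirst '.' "f r .x"` gives `Just Prefix`, which are exactly the two cases that options 3 and 5 treat differently or identically.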

That's a good summary - but note that under Option 5 tight-infix and prefix are the same, modulo the qualified-identifier case, and this is the key difference. What I wanted to avoid was having to use the language of tight-infix vs. prefix AT ALL in understanding how record selection syntax works, and (5) achieves that whereas (3) doesn't.

Under option 5 we get one new lexeme:
   .<varid>
and everything else can be handled at the context-free grammar level. This is a nice minimal addition to the language. We don't have to invoke the mess that is proposal #229, which was forced upon us because BangPatterns and TypeApplications made the handling of (!) and (@) so complicated. If we don't have to do the same to (.), I believe we should take the opportunity to avoid it.

Cheers
Simon


My point here is that option (5) is no more or less whitespace sensitive than option (3). Both need the same cases to figure out what the period character in your code means. I think this is why Simon PJ has keyed this part of the debate to module qualification: that existing feature (not under debate) essentially breaks the symmetry here, meaning that we have more room to work with without breaking symmetry further.

My vote is thus:

3 > 5 > 2 > 4 > 1

Other points of motivation:
- Despite my argument above, I see the merit in (5). I just think that an argument "we don't want dot to be whitespace-sensitive" isn't really effective.
- I want to accept this proposal. We're not going to get another go at this.
- I really don't like the way record-update binds, and (4) reminds me too much of that.

Richard

On Feb 10, 2020, at 9:58 AM, Simon Marlow <marlowsd at gmail.com> wrote:

On Fri, 7 Feb 2020 at 22:37, Joachim Breitner <mail at joachim-breitner.de> wrote:

I really would prefer a design where all these questions do not even
need to be asked…

Me too. Also what about (.x) vs. ( .x), are those the same?

So I think to have the full picture, we need the following option as
well on the ballot:

 5. .x is a postfix operator, binding exactly like application,
    whether it is naked or not.
    (This is option 3, but without the whitespace-sensitivity.)

[...]

Anyways, now for my opinion: Assuming no more options are added, my
ranking will be

  5 > 4 > 2 > 1 > 3

This puts first the two variants where .x behaves like an existing
language feature (either like function application or like record
updates), has no whitespace sensitivity, and follows existing languages'
precedence (JS and OCaml, resp.).
Then the compromise solution that simply forbids putting spaces before
.x (so at least the program doesn't change semantics silently).
I dislike variant 3, which adds a _new_ special rule, and where adding
a single space can change the meaning of the program, so I rank that
last.

I'm also against whitespace-sensitivity and I lean towards this ordering too.
But I'm going with:

5 > 2 > 1 > 4 > 3

Rationale: (5) seems the easiest to explain and has the fewest special cases, yet covers the use-cases we're interested in. Beyond that I want to be conservative because I find it hard to predict the ramifications of the more-complex alternatives 4/3, so I've put 2/1 ahead of those. I've made my peace with the current record selection syntax binding more tightly than application, and indeed I often rely on it to avoid a $, so I'm OK with 4 over 3.

Cheers
Simon




Cheers,
Joachim


PS, because it's on my mind, and just for fun:

Under variant 3, both foo1 and foo2 typecheck, but they do quite different
things (well, one loops).

  data Stream a = Stream { val :: a, next :: Stream a }

  foo1 f s = Stream (s.val) (foo1 (fmap f s).next)
  foo2 f s = Stream (s.val) (foo2 (fmap f s) .next)


--
Joachim Breitner
  mail at joachim-breitner.de
  http://www.joachim-breitner.de/


_______________________________________________
ghc-steering-committee mailing list
ghc-steering-committee at haskell.org
https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-steering-committee