[Haskell-cafe] Re: Editors for Haskell

Fri Jun 2 09:57:10 EDT 2006

Simon Marlow wrote:
> Malcolm Wallace wrote:
>> "Brian Hulley" <brianh at metamilk.com> wrote:
>>
>>
>>> Thanks for pointing this out. Although there is still a problem with
>>> the fact that var, qvar, qcon etc. are in the context-free syntax
>>> instead of the lexical syntax, so you could write:
>>>
>>>        2 `    plus      ` 4
>>>        (    Prelude.+
>>>               {- a comment -} ) 5 6
>>
>>
>> You appear to be right.  However, I don't think I have ever seen a
>> piece of code that actually used the first form.  People seem to
>> naturally place the backticks right next to the variable name.  Should
>> we consider the fact that whitespace and comments are
>> permitted between backticks to be a bug in the Report?  It certainly
>> feels like it should be a lexical issue.
>
> I tend in the other direction: I'd rather see as much as possible
> pushed into the context-free syntax.  The only reason that qualified
> identifiers are in the lexical syntax currently is because of the
> clash with the '.' operator.
>
> I'm not sure I can concisely explain why I think it is better to use
> the context-free syntax than the lexical syntax, but I'll try.  I
> believe the lexical syntax should adhere, as far as possible, to the
> following rule:
>   juxtaposition of lexemes of different classes should not affect
>   the lexical interpretation.
>
> in other words, whitespace between different lexemes is irrelevant.

A question here is: what is a lexeme?

For example, there are floating point numbers, which are written without
spaces but which could be considered to consist of primitive whole-number
lexemes interspersed with '.', 'e' and '-':

    34.678e-98

I don't see what the difference is between them and

    Prelude.+

especially since we *really* need the dot for other purposes in the CFG such 
as composition and (hopefully at some point) field selection.

Since Prelude.+ is by the above argument a single lexeme, it seems 
consistent to say that

    `Mod.Id`
    (Mod.+)

are also single lexemes. The brackets in (Mod.+) have a lexical purpose, to 
turn a symbol into an id, which is very different imho from the use of 
brackets to parenthesise expressions or form sections.

For example, should a parser consider ( +   ) to be an incomplete
parenthesised expression with two gaps, or an id formed from the symbol + ?
At the moment, of course, it would be an id, but this causes problems when
you're trying to parse Haskell and highlight incomplete expressions: you'd
expect that if the user intended just to make an id, there wouldn't be any
reason to leave spaces between the symbol and the brackets.
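
For reference, here is a small sketch of what the current rules accept. It
assumes a Haskell 98/2010 compiler such as GHC; the names plus, viaBackticks,
viaParens and spacedSection are made up for the illustration. All three
definitions use whitespace (and in one case a comment) between the delimiters
and the name, because the backticks and parentheses are separate tokens in
the context-free syntax:

```haskell
module Main where

-- A hypothetical named function, so that it can be used infix via backticks.
plus :: Int -> Int -> Int
plus = (+)

-- Whitespace between the backticks and the variable name.
viaBackticks :: Int
viaBackticks = 2 `    plus      ` 4

-- Whitespace, a line break and a comment between the parentheses
-- and a qualified operator.
viaParens :: Int
viaParens = (    Prelude.+
                 {- a comment -} ) 5 6

-- The ( +   ) form: currently an id, not an incomplete expression.
spacedSection :: Int
spacedSection = ( +   ) 1 2

main :: IO ()
main = mapM_ print [viaBackticks, viaParens, spacedSection]
```

An editor that wants to flag ( +   ) as "parenthesised expression with two
gaps" therefore has to disagree with the compiler about what the user wrote.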

In many ways it would be a lot easier if the (lexical) grammar were changed
so that "turning a symbol into an id" was indicated by parentheses round
the (unqualified part of the) symbol alone, not the whole thing, thus:

     Prelude.(+)

so that the first lexical rule would be

     1) Parentheses around an unqualified symbol turn it into an id

Then the ` could be used to turn a (possibly qualified) id into a symbol:

    `Prelude.plus
    `Prelude.(+)

and there would be no need for a closing `, so the second rule would be:

     2) A grave before an id turns it into a symbol (that can't subsequently
        be turned back into an id!)
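
To make the two proposed rules concrete, here is a minimal sketch of a lexer
for them. This is not how Haskell's lexer works today; it only illustrates
the suggestion. Everything in it (Token, qualifier, name, tokenize, the set
of symbol characters) is an assumption made for the example, and it ignores
comments, string literals, reserved words and layout:

```haskell
module Main where

import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace, isUpper)

-- Hypothetical token classes for the proposed lexical syntax.
data Token
  = Id String    -- usable in prefix (function) position
  | Sym String   -- usable in infix (operator) position
  | Num String
  | Punct Char   -- anything else, e.g. expression parentheses
  deriving (Eq, Show)

isSymChar, isIdChar :: Char -> Bool
isSymChar c = c `elem` "!#$%&*+./<=>?@\\^|-~:"
isIdChar c  = isAlphaNum c || c == '_' || c == '\''

-- Split off zero or more "Mod." prefixes.
qualifier :: String -> (String, String)
qualifier s@(c : _)
  | isUpper c
  , (m, '.' : rest) <- span isIdChar s
  , let (q, rest') = qualifier rest
  = (m ++ "." ++ q, rest')
qualifier s = ("", s)

-- A possibly qualified name. Rule 1: parentheses directly around the
-- unqualified symbol (no spaces) turn it into an id.
name :: String -> Maybe (Token, String)
name s = case qualifier s of
  (q, '(' : rest)
    | (op, ')' : rest') <- span isSymChar rest, not (null op)
    -> Just (Id (q ++ "(" ++ op ++ ")"), rest')
  (q, rest@(c : _))
    | isAlpha c || c == '_', (i, rest') <- span isIdChar rest
    -> Just (Id (q ++ i), rest')
    | isSymChar c, (op, rest') <- span isSymChar rest
    -> Just (Sym (q ++ op), rest')
  _ -> Nothing

tokenize :: String -> [Token]
tokenize [] = []
tokenize s@(c : cs)
  | isSpace c = tokenize cs
  -- Rule 2: a single grave directly before an id turns it into a symbol.
  | c == '`' = case name cs of
      Just (Id n, rest) -> Sym ('`' : n) : tokenize rest
      _                 -> Punct '`' : tokenize cs
  | isDigit c, (n, rest) <- span isDigit s = Num n : tokenize rest
  | Just (t, rest) <- name s = t : tokenize rest
  | otherwise = Punct c : tokenize cs

main :: IO ()
main = mapM_ (print . tokenize)
  [ "x `Prelude.plus y"
  , "Prelude.(+) x y"
  , "x `Prelude.(+) y"
  , "( + )"
  ]
```

In this sketch Prelude.(+) comes out as a single Id token and `Prelude.plus
as a single Sym token, while ( + ) with spaces is lexed as an open
parenthesis, the symbol + and a close parenthesis, i.e. an incomplete
parenthesised expression rather than an id.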

There are at least five motivations for suggesting the above changes:

     1) It allows operator expressions to be parsed by LL(1) recursive
        descent :-)
     2) The low-level detail of whether a symbol or an id is used is kept
        to the lexical level
     3) You can use a qualified function or operator without knowing in
        advance whether it has been declared as a symbol or an id in the
        module. For example, you could type
                  x `Mod.
        and expect to get a pop-up list of functions in Mod, such as (+),
        add etc, whereas with the current rules you'd have to go back and
        add graves around the qualified function if it was declared as an
        id, and remove the grave if it was declared as an operator.
     4) Only one grave is needed :-)
     5) An editor can give more feedback, by distinguishing between
        incomplete expressions and the turning of symbols into ids
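
As a rough illustration of motivation 1, here is a sketch of a recursive
descent parser over tokens that have already been classified as ids or
symbols by the lexer. The types and function names are invented for the
example, and it deliberately ignores fixity declarations (every operator is
treated as left-associative with the same precedence), so it shows only the
one-token-lookahead structure, not a full Haskell expression parser:

```haskell
module Main where

-- Hypothetical token stream, as a lexer following the proposal might emit it.
data Token = Id String | Sym String | Open | Close
  deriving (Eq, Show)

data Expr
  = Var String
  | App Expr Expr
  | Infix Expr String Expr
  deriving (Eq, Show)

-- Grammar used by this sketch:
--   expr ::= app { Sym app }
--   app  ::= atom { atom }
--   atom ::= Id | Open expr Close
-- Every choice below is decided by looking at the next token only.

expr :: [Token] -> Maybe (Expr, [Token])
expr ts = do
  (lhs, rest) <- app ts
  exprTail lhs rest

exprTail :: Expr -> [Token] -> Maybe (Expr, [Token])
exprTail lhs (Sym op : ts) = do
  (rhs, rest) <- app ts
  exprTail (Infix lhs op rhs) rest
exprTail lhs ts = Just (lhs, ts)

app :: [Token] -> Maybe (Expr, [Token])
app ts = do
  (f, rest) <- atom ts
  appTail f rest

appTail :: Expr -> [Token] -> Maybe (Expr, [Token])
appTail f ts@(t : _)
  | startsAtom t = do
      (x, rest) <- atom ts
      appTail (App f x) rest
appTail f ts = Just (f, ts)

startsAtom :: Token -> Bool
startsAtom (Id _) = True
startsAtom Open   = True
startsAtom _      = False

atom :: [Token] -> Maybe (Expr, [Token])
atom (Id n : ts) = Just (Var n, ts)
atom (Open : ts) = do
  (e, rest) <- expr ts
  case rest of
    Close : rest' -> Just (e, rest')
    _             -> Nothing
atom _ = Nothing

-- Succeeds only if the whole token list is consumed.
parseExpr :: [Token] -> Maybe Expr
parseExpr ts = case expr ts of
  Just (e, []) -> Just e
  _            -> Nothing

main :: IO ()
main = print (parseExpr [Id "x", Sym "`Mod.plus", Id "Mod.(+)", Id "y", Id "z"])
```

Because the lexer has already decided whether each name is an id or a
symbol, the parser never has to look past a backtick or an opening
parenthesis to find out whether it is inside an operator or an expression.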

Regards, Brian.

-- 
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.

http://www.metamilk.com