FW: Haskell 98 report problem re lexical structure.

Simon Peyton-Jones simonpj@microsoft.com
Wed, 25 Jul 2001 07:20:00 -0700


This is a multi-part message in MIME format.

------_=_NextPart_001_01C11514.E3CB248D
Content-Type: text/plain;
	charset="US-ASCII"
Content-Transfer-Encoding: quoted-printable

I've looked again at what Gary says below, which relates somewhat to
Christian/Thomas Hallgren's comments about lexical matters.  Here's what
I propose:

1.  I will use "lexeme" consistently to mean what the "lexeme"
production means.

2.  The place where "lexeme" is currently used inconsistently is in 2.3
(Comments).  Here I propose to replace paras 2 and 3 thus:

"An ordinary comment begins with a sequence of two or more consecutive
dashes (e.g. --) and extends to the following newline. The sequence of
dashes must not be the prefix of a legal lexeme. For example, ``-->''
or ``--|'' do not begin a comment, because both of these are legal
lexemes.

A nested comment begins with ``{-'' and ends with ``-}''.  No legal
lexeme starts with ``{-''; hence, for example, ``{---'' starts a nested
comment despite the trailing dashes."

3.  Neither "--" nor "---" is a legal "varsym", so the production for
"varsym" should exclude "dashes" as well as "reservedop".

4.  I believe that the production for "ANY" should include "return",
"linefeed" and "uniWhite".

5.  [Re Christian S's proposal, which I sent earlier, remove "opencom"
from "lexeme"]
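
To illustrate items 2 and 3, here is a small sketch of how the proposed
wording would play out in source code.  The operators (-->) and (--|)
and their definitions are made up for this example; only the lexical
behaviour matters, and the sketch assumes a compiler that follows the
rule as worded above.

```haskell
module Main where

infixr 1 -->

-- "-->" is a legal operator symbol, so it does not start a comment.
(-->) :: Bool -> Bool -> Bool
a --> b = not a || b

-- "--|" is likewise an ordinary operator symbol (hypothetical definition).
(--|) :: Int -> Int -> Int
x --| y = x - abs y

--- Three dashes and nothing else in the symbol run: this line is a comment.

{--- Extra dashes after the opening brace-dash still give a nested comment. -}

main :: IO ()
main = do
  print (True --> False)   -- prints False
  print (10 --| (-3))      --- prints 7; "---" also starts a comment
```

Only a run made up of nothing but dashes starts an ordinary comment
here; as soon as another symbol character joins the run, the whole run
is read as an operator.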

I think that does it.  Pls confirm or deny.

Simon

	-----Original Message-----
	From: Memovich, Gary [mailto:GARY.MEMOVICH@kla-tencor.com]
	Sent: 19 July 2001 18:53
	To: Simon Peyton-Jones
	Subject: Haskell 98 report problem re lexical structure.

	Hello, I've been studying the Haskell 98 report and think there
are a few problems in the section on lexical structure. First, the
difference between "lexeme" considered as a production in the grammar
and "lexeme" used in the more general sense is very confusing. The fact
that "lexeme" as a grammar production is only distinguished by being in
italic font is easy to overlook. Also, no definition of "lexeme" in the
general sense is given. I gather that strings matching the productions
"dashes", "opencom", and "closecom" are all considered lexemes, but that
strings matching "comment" and "ncomment" are not. But this is not
explicitly stated anywhere.

	Also the string "---", for example, matches both the productions
"varsym" and "dashes", so the fact that it should be considered the
beginning of a comment is not decided by the maximal-munch rule alone.
Perhaps the definition of "varsym" should explicitly rule out strings
matching "dashes" in the same way it rules out strings matching
"reservedop".

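	A rough sketch of the suggested exclusion, written as a predicate
over strings. It covers only the ASCII symbol characters (Unicode
symbols are left out), and all function names are hypothetical:

```haskell
module Main where

-- ASCII symbol characters of the Haskell 98 "ascSymbol" production.
isAscSymbol :: Char -> Bool
isAscSymbol c = c `elem` "!#$%&*+./<=>?@\\^|-~"

-- Reserved operators of Haskell 98.
reservedOps :: [String]
reservedOps = ["..", ":", "::", "=", "\\", "|", "<-", "->", "@", "~", "=>"]

-- Strings matching "dashes": two or more consecutive '-' characters.
isDashes :: String -> Bool
isDashes s = length s >= 2 && all (== '-') s

-- Sketch of "varsym" with the extra exclusion: a symbol followed by
-- symbols or ':', minus reserved operators and minus dashes.
isVarSym :: String -> Bool
isVarSym []         = False
isVarSym s@(c : cs) =
  isAscSymbol c
    && all (\x -> isAscSymbol x || x == ':') cs
    && s `notElem` reservedOps
    && not (isDashes s)

main :: IO ()
main = mapM_ (\s -> putStrLn (s ++ " -> " ++ show (isVarSym s)))
             ["-", "--", "---", "-->", "--|", "->", "+"]
```
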
	As a second issue, it is stated at the end of section 2.2 that
characters not in the category "ANY" are not valid in Haskell programs.
But the productions "return", "linefeed", and "uniWhite", which are not
included in the production "ANY", are explicitly included in the
production "whitechar" which implies that they can be used in programs,
at least in a limited way.

	I recently posted a message to the Haskell mailing list that was
related to some of these same issues, but I now consider that message
obsolete.

	Thanks for your time and attention,
	-- Gary


------_=_NextPart_001_01C11514.E3CB248D--