What is a punctuation character?
Gabriel Dos Reis
gdr at integrable-solutions.net
Fri Mar 16 20:20:19 CET 2012
On Fri, Mar 16, 2012 at 1:49 PM, Brandon Allbery <allbery.b at gmail.com> wrote:
> On Fri, Mar 16, 2012 at 14:30, Gabriel Dos Reis
> <gdr at integrable-solutions.net> wrote:
>> It is not clear what "the language's lexemes are defined in terms of
>> Unicode properties" really means. Why would you need ascSmall (and
>> similar ASCII character categories) then when you already have
>> uniSmall and associates?
> I have to assume that is a leftover from an earlier version of the report,
> because it is indeed already included.
I believe this part has seen very little change since the Revised
Haskell 98 Report, so it is not clear that it is an unintended leftover.
Section 2.1, which you quote below, is the same as in the (Revised)
Haskell 98 Report.
> See in section 2.1:
> "Haskell uses the Unicode character set. However, source programs are
> currently biased toward the ASCII character set used in earlier versions of
> I understand this to indicate that Unicode character classes are intended,
> and it does indeed hint that references to ASCII are references to older
> versions of the language (and should probably be considered fossils, as
> ASCII itself is; the American Standard Code for Information Interchange was
> obsoleted by ISO 8859, and modern references to "ASCII" usually should be
> taken to mean "ISO 8859/1").
Unicode support is clearly intended. Just as clearly, ASCII support is intended.
However, the Report does not say what the concrete syntax of a Unicode character
should be. (At least I have been unable to find it in the Report.)
>> It is not clear that (b) is all that "not particularly meaningful".
>> Have a look at the production <symbol>: it excludes double quote (")
>> and apostrophe (') from uniSymbol.
> The notion of "symbol with certain lexicals that have other meanings *that
> are specified elsewhere in the report*" is not precise enough? It may be
> difficult to characterize things with your required precision, since every
> general statement will necessarily have to carry part or potentially all of
> the entire Report within it if it is not sufficient to use the statement's
> context (as describing some part of the Report).
Well, I hope nobody is suggesting that it is unreasonable to require precision
of a language definition -- especially of Haskell! :-)
A problem with "use the statement's context" is that the contexts themselves
are not unquestionably unambiguous -- which is part of the reason we are having
this conversation in the first place.
That being said, I am not sure how the passage you quote applies here
or conclusively answers the original questions. Where else is punctuation
defined in the Report? What is the concrete syntax of a punctuation
character? If you were going to write a lexer and a parser for Haskell,
how would you recognize a character as punctuation?
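
To make the lexer question concrete, here is a minimal sketch of one
possible reading, not a definitive one. It assumes that "any Unicode
symbol or punctuation" means the Unicode general categories S* and P*,
which is what isSymbol and isPunctuation from Data.Char test; the Report
itself does not say so, and that is exactly the point in question. The
names isSpecial, isUniSymbol and isSymbolChar are hypothetical, chosen
only for this illustration.

```haskell
import Data.Char (generalCategory, isPunctuation, isSymbol)

-- The characters of the <special> production:
-- ( ) , ; [ ] ` { }
isSpecial :: Char -> Bool
isSpecial c = c `elem` "(),;[]`{}"

-- ASSUMPTION: uniSymbol is taken to mean general categories S* and P*.
-- The Report only says "any Unicode symbol or punctuation".
isUniSymbol :: Char -> Bool
isUniSymbol c = isSymbol c || isPunctuation c

-- The uniSymbol alternative of <symbol>, minus the characters the
-- production excludes: special, underscore, double quote, apostrophe.
isSymbolChar :: Char -> Bool
isSymbolChar c = isUniSymbol c && not (isSpecial c) && c `notElem` "_\"'"

-- Print the general category and the verdict for a few sample characters.
main :: IO ()
main = mapM_ report ['+', '-', '_', '"', '\'', '(', '\x2192', 'a']
  where
    report c =
      putStrLn (show c ++ "  " ++ show (generalCategory c)
                ++ "  symbol char: " ++ show (isSymbolChar c))
```

Under this reading every ascSymbol character is already accepted by
isSymbolChar, which is the redundancy raised earlier in the thread. A
different reading of "punctuation" would give a different lexer.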