What is a punctuation character?
Gabriel Dos Reis
gdr at integrable-solutions.net
Sat Mar 17 17:37:34 CET 2012
On Fri, Mar 16, 2012 at 6:49 PM, Ian Lynagh <igloo at earth.li> wrote:
> Hi Gaby,
> On Fri, Mar 16, 2012 at 06:29:24PM -0500, Gabriel Dos Reis wrote:
>> OK, thanks! I guess a take away from this discussion is that what
>> is a punctuation is far less well defined than it appears...
> I'm not really sure what you're asking. Haskell's uniSymbol includes all
> Unicode characters (should that be codepoints? I'm not a Unicode expert)
> in the punctuation category; I'm not sure what the best reference is,
> but e.g. table 12 in
> lists a number of Px categories, and a meta-category P "Punctuation".
I guess what I was asking was partly summarized in Iavor's message.
For me, the issue started with bullet number 4 in section 1.1
which states that:
The lexical structure captures the concrete representation
of Haskell programs in text files.
That, combined with the opening section 2.1 (e.g. the example of terminal
syntax) and the fact that the grammar routinely describes two kinds of
non-terminals, ascXXX (for ASCII characters) and uniXXX (for Unicode
characters), suggested that the concrete syntax of Haskell programs in text
files is in the ASCII charset. Note that this does not conflict with the
general statement that Haskell programs use the Unicode character set,
because the uniXXX non-terminals could use the ASCII charset to introduce
Unicode characters -- this is not an uncommon practice for programming
languages using Unicode characters; see the link I gave earlier.
However, if I understand Malcolm's message correctly, this is not the case.
Contrary to what I quoted above, Chapter 2 does NOT specify the concrete
representation of Haskell programs in text files. What it does is to capture
the structure of what is obtained from interpreting, *in some unspecified
encoding or unspecified alphabet*, the concrete representation of Haskell
programs in text files. This conclusion is unfortunate, but I believe
it is correct.
Since the encoding or the alphabet is unspecified, it is no longer necessarily
the case that two Haskell implementations would agree on the same lexical
interpretation when presented with the same exact text file containing
a Haskell program.
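
To make the point above concrete, here is a small sketch (in Python, only
because its standard library exposes Unicode general categories directly).
It decodes the same two bytes under two different encodings and classifies
the resulting code points. The helper names classify and is_uni_symbol_like
are hypothetical, and the P/S test is only my approximation of the Report's
uniSymbol, not a definition taken from the Report.

```python
import unicodedata

# The same two bytes, as they might sit in a source file on disk.
raw = bytes([0xC2, 0xA1])


def classify(text):
    """Return (character, Unicode general category) pairs for decoded text."""
    return [(ch, unicodedata.category(ch)) for ch in text]


def is_uni_symbol_like(ch):
    """Assumption: treat any code point whose general category starts with
    P (punctuation) or S (symbol) as a candidate for uniSymbol."""
    return unicodedata.category(ch)[0] in ("P", "S")


# Read as UTF-8, the bytes are one code point: U+00A1 (category Po).
as_utf8 = classify(raw.decode("utf-8"))

# Read as Latin-1, the bytes are two code points: U+00C2 (Lu), U+00A1 (Po).
as_latin1 = classify(raw.decode("latin-1"))

print(as_utf8)
print(as_latin1)
```

Under the first reading a lexer would see a single punctuation character;
under the second it would see an uppercase letter followed by a punctuation
character. The file is the same, but the token stream need not be.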
You are correct that the Report, in its current form, should say "codepoint"
instead of "character".
I join Iavor in requesting that the alphabet used in the grammar be clarified.