What is a punctuation character?
Gabriel Dos Reis
gdr at integrable-solutions.net
Sat Mar 17 00:29:24 CET 2012
On Fri, Mar 16, 2012 at 6:00 PM, Malcolm Wallace <malcolm.wallace at me.com> wrote:
>>> no purpose to a completely overlapping category unless it is intended to
>>> relate to an earlier standard (say Haskell 1.4).
>
> I believe all Haskell Reports, even since 1.0, have specified that the language "uses" Unicode. If it helps to bring perspective to this discussion, it is my impression that the initial designers of Haskell did not know very much about Unicode, but wanted to avoid the trap of being stuck with ASCII-only, and so decided to reference "whatever Unicode does", as the most obvious and unambiguous way of not having to think about (or specify) these lexical issues themselves.
>
OK.
>> One of the underlying questions is: what is the concrete syntax of a
>> Unicode character in a Haskell program? Note that Chapter 2 goes to great pains to
>> specify the ASCII concrete syntax.
>
> In my view, the Haskell Report is deliberately agnostic on concrete syntax for Unicode, believing that to be outside the scope of a programming language standard, whilst entirely within the scope of the Unicode standards body.
The trouble is that the Unicode standards body believes that concrete syntax
is entirely within the scope of the programming language definition (or of any
client using Unicode characters), and largely restricts itself to talking about
code points, which are more abstract. So the trick of referencing the Unicode
standards is not satisfactory :-(
> Seeing as there are (in practice) numerous concrete representations of Unicode (UTF-8 and other encodings), it is largely up to individual compiler implementations which encodings they support for (a) source text, and (b) input/output at runtime.
OK, thanks! I guess a takeaway from this discussion is that what counts
as punctuation is far less well defined than it appears...
A common practice (exemplified by the link I gave earlier) is to restrict the
concrete -syntax- of the input program to the ASCII charset, and use Unicode
escape sequences to include the entire Unicode charset. It is common to use
\uNNNNNN or \UNNNNNN to introduce Unicode characters, but I suspect that is
out of the question for Haskell programs because it would clash with
lambda abstraction.
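
To make the clash concrete, here is a minimal sketch (the names `constant`
and `lambdaChar` are made up for illustration). Outside a literal, a backslash
followed by something like u0041 already lexes as the start of a lambda whose
parameter is named u0041. Inside character and string literals, Haskell has its
own numeric escapes instead. The last line uses Data.Char.isPunctuation, which
in GHC's base library classifies characters by their Unicode general category.

```haskell
module Main where

import Data.Char (isPunctuation, ord)

-- In expression position, a backslash followed by an identifier starts a
-- lambda abstraction, so "\u0041" below is a function whose (unused)
-- parameter happens to be named u0041. It is not a character escape.
constant :: a -> Int
constant = \u0041 -> 42

-- Inside character and string literals, Haskell already provides numeric
-- escapes: decimal (\955), hexadecimal (\x3bb), and octal (\o1673).
lambdaChar :: Char
lambdaChar = '\x3bb'

main :: IO ()
main = do
  print (constant ())             -- the lambda ignores its argument
  print (ord lambdaChar)          -- code point of GREEK SMALL LETTER LAMDA
  print (lambdaChar == '\955')    -- decimal and hexadecimal escapes agree
  print (map isPunctuation "!+")  -- '!' is category Po, '+' is category Sm
```

So a \u escape could only be introduced outside literals by changing how
existing programs are lexed, which is the clash I had in mind.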
-- Gaby