[Haskell-cafe] Lexical Syntax and Unicode
Manlio Perillo
manlio_perillo at libero.it
Sat Nov 14 05:31:20 EST 2009
Hi.
Reading the Haskell 98 Report (section 9.2), I have found a possible
problem.
The lexical syntax supports Unicode, but the definition of newline does
not take Unicode into account:
newline -> return linefeed | return | linefeed | formfeed
The Unicode standard defines two additional line-breaking characters:
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
The Unicode Character Database also defines two general categories:
Zl = Separator, line
Zp = Separator, paragraph
The Zl category only contains the LINE SEPARATOR character and the Zp
category only contains the PARAGRAPH SEPARATOR character.
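As a rough illustration, the two categories can be tested with
Data.Char from base. The names isUniLine and isUniPara are only
hypothetical helpers chosen to match the grammar proposed in this
message; they are not part of the Report or of any library:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- | True for characters in general category Zl (Separator, line).
-- Hypothetical helper, named after the proposed production.
isUniLine :: Char -> Bool
isUniLine c = generalCategory c == LineSeparator

-- | True for characters in general category Zp (Separator, paragraph).
-- Hypothetical helper, named after the proposed production.
isUniPara :: Char -> Bool
isUniPara c = generalCategory c == ParagraphSeparator
```

With these definitions, isUniLine '\x2028' and isUniPara '\x2029'
should both hold, while an ordinary linefeed belongs to neither
category.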
So, IMHO, the lexical syntax should be changed to:
newline -> return linefeed | return | linefeed | formfeed
         | uniLine | uniPara
uniLine -> any Unicode character defined as line separator
uniPara -> any Unicode character defined as paragraph separator
or, alternatively:
uniLine -> LINE SEPARATOR
uniPara -> PARAGRAPH SEPARATOR
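A minimal sketch of how a lexer could recognise the extended newline
production, assuming the category-based variant. The functions
stripNewline and splitLines are hypothetical names used for
illustration only; this is not how any existing compiler implements
it:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- | If the input starts with a newline (as in the proposed production),
-- return the rest of the input after it; otherwise Nothing.
-- "return linefeed" is matched first so that it counts as one newline.
stripNewline :: String -> Maybe String
stripNewline ('\r' : '\n' : rest) = Just rest
stripNewline (c : rest)
  | c `elem` "\r\n\f" = Just rest -- return | linefeed | formfeed
  | generalCategory c `elem` [LineSeparator, ParagraphSeparator] =
      Just rest -- uniLine | uniPara
stripNewline _ = Nothing

-- | Split a source text into lines using the extended newline.
-- Input that ends with a newline yields a final empty line.
splitLines :: String -> [String]
splitLines = go []
  where
    go acc [] = [reverse acc]
    go acc s@(c : cs) = case stripNewline s of
      Just rest -> reverse acc : go [] rest
      Nothing -> go (c : acc) cs
```

For example, splitLines "a\r\nb\x2028c" evaluates to ["a","b","c"].
The alternative variant (exactly LINE SEPARATOR and PARAGRAPH
SEPARATOR) would replace the generalCategory guard with a comparison
against '\x2028' and '\x2029'.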
Manlio Perillo