[Haskell-cafe] Lexical Syntax and Unicode

Manlio Perillo manlio_perillo at libero.it
Sat Nov 14 05:31:20 EST 2009


Hi.

Reading the Haskell 98 Report (section 9.2), I have found a possible
problem.

The lexical syntax supports Unicode; however, this is not true for the
newline production:

newline -> return linefeed | return | linefeed | formfeed


The Unicode standard defines two additional line-breaking characters:

U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR

The Unicode Character Database also defines two general categories:
Zl = Separator, line
Zp = Separator, paragraph

The Zl category only contains the LINE SEPARATOR character and the Zp
category only contains the PARAGRAPH SEPARATOR character.
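
As a rough illustration (not part of the Report), these two categories can
be tested from Haskell with generalCategory from Data.Char. The helper names
isLineSep and isParaSep are hypothetical, chosen only for this sketch:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- | True for characters in general category Zl (Separator, line).
-- Hypothetical helper, not defined by the Report or by base.
isLineSep :: Char -> Bool
isLineSep c = generalCategory c == LineSeparator

-- | True for characters in general category Zp (Separator, paragraph).
-- Hypothetical helper, not defined by the Report or by base.
isParaSep :: Char -> Bool
isParaSep c = generalCategory c == ParagraphSeparator
```

With these definitions, isLineSep '\x2028' and isParaSep '\x2029' should both
hold, while an ordinary linefeed ('\n') falls in neither category.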


So, IMHO, the lexical syntax should be changed to:

newline -> return linefeed | return | linefeed | formfeed
           | uniLine | uniPara
uniLine -> any Unicode character defined as line separator
uniPara -> any Unicode character defined as paragraph separator

or, alternatively:

uniLine -> LINE SEPARATOR
uniPara -> PARAGRAPH SEPARATOR
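
To make the effect of the proposed production concrete, here is a minimal
sketch of a line splitter that follows it, using the alternative form with
the two explicit code points. The names isNewlineChar and splitLines are
hypothetical, and this is only an illustration of the grammar, not how any
existing Haskell implementation lexes source files:

```haskell
-- | Characters that can end a line under the proposed rule:
-- return, linefeed, formfeed, U+2028 and U+2029.
isNewlineChar :: Char -> Bool
isNewlineChar c = c `elem` "\r\n\f\x2028\x2029"

-- | Split text into lines according to the proposed newline production.
-- A return immediately followed by a linefeed counts as one newline.
-- A trailing newline yields a final empty line.
splitLines :: String -> [String]
splitLines s = case break isNewlineChar s of
  (line, [])                 -> [line]
  (line, '\r' : '\n' : rest) -> line : splitLines rest
  (line, _ : rest)           -> line : splitLines rest
```

For example, splitLines "a\r\nb\x2028c\x2029d" gives ["a","b","c","d"],
whereas under the current production U+2028 and U+2029 would not end a line.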



Manlio Perillo

