Haskell 98 Report possible errors, part one

Lars Henrik Mathiesen thorinn@diku.dk
24 Jul 2001 12:54:05 -0000


> From: Dylan Thurston <dpt@math.harvard.edu>
> Date: Mon, 23 Jul 2001 19:57:54 -0400
> 
> On Mon, Jul 23, 2001 at 06:30:30AM -0700, Simon Peyton-Jones wrote:
> > Someone else, quoted by Simon, attribution elided by Dylan, wrote:
> > | 2.2. Identifiers can use small and large Unicode letters. 
> > | What about caseless scripts where letters are neither small 
> > | nor large? The description of module Char says: "For the 
> > | purposes of Haskell, any alphabetic character which is not 
> > | lower case is treated as upper case (Unicode actually has 
> > | three cases: upper, lower and title)". This suggests that the 
> > | only anomaly is that titlecase letters are considered 
> > | uppercase. But what is actually specified is that caseless 
> > | scripts can be used to write constructor names, but not 
> > | variable names. I don't know how to solve this.
> > 
> > I am woefully ignorant of Unicode, and I have no idea what to do
> > about this one.  I therefore propose to do nothing on the grounds
> > that I might easily make matters worse.
> 
> In this case, what about requiring identifiers to start with an upper
> or lower case alphabetic character?

I'm not sure that makes things better. It just makes it impossible to
have identifiers in caseless scripts (some of which are alphabetic).

And whether you choose your upper or lower case alphabetic character
from Latin, Greek, Coptic, Cyrillic, Armenian, Georgian, or Deseret,
it will probably look silly in front of a variable name spelled in
Hangul.

What would make sense to me is to define that caseless letters
(Unicode class Lo) behave as lowercase, and to choose some easily
visible, culturally neutral symbol as the official 'conid marker'.
Since the problem only arises on Unicode-capable systems, there should
be plenty of such symbols to choose from, even outside Latin-1.
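
A minimal sketch of that rule, not a proposal for the actual lexer. It
assumes GHC's Data.Char.generalCategory as the source of Unicode
classes. The names IdentKind, startKind and identKind are made up for
illustration, and the marker is passed in as a parameter because no
particular symbol is being suggested here. It also assumes the marker
only matters in front of a caseless letter, and it ignores the
underscore and all non-initial characters.

```haskell
import Data.Char (GeneralCategory(..), generalCategory)

-- Hypothetical type: which lexical class an identifier falls into.
data IdentKind = VarId | ConId deriving (Eq, Show)

-- Classify a leading character: Lu and Lt start a constructor name,
-- Ll and caseless letters (Lo) start a variable name.
startKind :: Char -> Maybe IdentKind
startKind c = case generalCategory c of
  UppercaseLetter -> Just ConId
  TitlecaseLetter -> Just ConId
  LowercaseLetter -> Just VarId
  OtherLetter     -> Just VarId
  _               -> Nothing

-- Assumption: the 'conid marker' (first argument) directly followed by
-- a caseless letter makes the whole identifier a constructor name.
identKind :: Char -> String -> Maybe IdentKind
identKind marker (c : d : _)
  | c == marker && generalCategory d == OtherLetter = Just ConId
identKind _ (c : _) = startKind c
identKind _ []      = Nothing
```

With '^' standing in for the marker (purely a placeholder), a name
spelled in Hangul is a variable on its own and a constructor when
prefixed: identKind '^' "\xD55C\xAE00" gives Just VarId, and
identKind '^' "^\xD55C\xAE00" gives Just ConId.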

To fix Haskell 98, the least intrusive way might be to allow only
classes Ll, Lt, and Lu in identifiers, with Lt (titlecase) and Lu
counting as uppercase. It looks like that may actually have been
the intention. And then add a note explaining that caseless scripts
can't be used because they weren't considered initially.
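
As a rough illustration of this restricted reading, the predicates
could look like the sketch below. It assumes GHC's
Data.Char.generalCategory for the Unicode classes. The function names
are invented for the example and are not the Report's or the Char
module's definitions.

```haskell
import Data.Char (GeneralCategory(..), generalCategory)

-- Letters permitted in identifiers under this reading: Ll, Lt, Lu only.
-- Caseless letters (Lo) and modifier letters (Lm) are rejected.
isCasedLetter :: Char -> Bool
isCasedLetter c =
  generalCategory c `elem` [LowercaseLetter, TitlecaseLetter, UppercaseLetter]

-- Titlecase (Lt) is lumped together with uppercase (Lu).
countsAsUpper :: Char -> Bool
countsAsUpper c =
  generalCategory c `elem` [TitlecaseLetter, UppercaseLetter]

-- Only Ll counts as lowercase.
countsAsLower :: Char -> Bool
countsAsLower c = generalCategory c == LowercaseLetter
```

Under these definitions a Hangul syllable such as '\xD55C' fails
isCasedLetter, so it could not appear in an identifier at all, which is
exactly the limitation the explanatory note would have to admit.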

Lars Mathiesen (U of Copenhagen CS Dep) <thorinn@diku.dk> (Humour NOT marked)