On Unicode

Thu Apr 13 01:53:59 EDT 2006

I write to you because of this:

"Revert to US-ASCII, Latin-1 or implementation-defined
character sets." (from
http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource)

The benefits of being allowed to use Unicode in
source code include:

- Operators and identifiers can be defined to match
exactly the standard notation of a domain such as
mathematics. This has been proven over many years in
tools like Mathematica. Shortcuts and palettes for
such symbols and operators make entry easy in
whatever editor is used.

- Non-English-speaking students can focus on logic
rather than on translating concepts into English.
While students may have a rudimentary grasp of
English, special terminology may be unknown to them.

- Interoperation with Microsoft .Net and, by extension,
Novell Mono, where identifiers etc. are allowed to be
Unicode. (I'm not familiar with Sun Java but the same
seems to be the case there.)

- While using English can be of benefit for
international projects, there are many tasks where
this is not a factor, such as ad hoc research
calculations, teaching, and projects where none of
the programmers are native English speakers. Allowing
Unicode does not prevent specific projects from
choosing to restrict themselves to ASCII characters
only; the reverse is not true.

Typical arguments against Unicode are:

- English is 'better'. This seems to be propagated by
people who only understand English and who have never
been exposed to other cultures and thus feel
threatened when confronted with something they do not
understand. However, it is generally better to be
allowed to think and write in one's native tongue.

- It is not portable. This can be solved by
standardization as discussed elsewhere in the wiki
article.

- It may result in obfuscated code. True if misused;
however, using ASCII does not prevent anyone from
obfuscating by naming variables a1, a2, a3, ... or
whatever else. Common sense makes this claim null and
void. Naming a variable correctly in one's own
language is preferable to an incorrect attempt at
translating it into English.

- It takes up more space. True, but hardly relevant
except in degenerate cases, because of the
ever-increasing availability of memory. Even
resource-constrained embedded platforms such as
Windows CE support Unicode exclusively.

- It is not common practice. Practically all leading
user applications provide multi-language versions;
some Office products even have multi-language
scripting built in. I can recall when it was common
practice to use punch cards with their restriction of
80 characters per line. It is my understanding that
Haskell is leading edge rather than entrenched in
legacy.

It is sad to see that most Haskell compilers and
interpreters do not implement the Haskell language
with regard to Unicode (perhaps except JHC, but I
haven't succeeded in getting it to compile on Windows
yet - my own fault no doubt). It is even sadder to
see that this affects applications such as Pivotal,
which would be a truly wonderful teaching and research
aid with Unicode support and user-provided
translations into major languages.

My experience is from South-East Asia, where written
languages are markedly different from English and
other European languages. The alphabets are different.
In Khmer, for example, there is no space between
words; spaces are used somewhat like commas in
English, and even the symbol for the full stop "." is
different. The symbols for numbers are different
(though English numbers are widely understood), and in
Thailand the official calendar is Thai-Buddhist
(current year 2549), not Christian. These aspects are
supported in Microsoft .Net language applications, and
consequently Microsoft .Net is being widely taught
here now! I understand the situation to be much the
same in Arabic countries, and then there is China and
Japan...

Personally, I would like to see things taken one step
further by having compilers support translation tables
for keywords, system-defined function names, and even
external user-defined identifiers, so that automatic
translation of source code would be possible, but that
would probably be outside the scope of Haskell'.

As for myself, I will continue to use Mathematica,
where I can define my operators and symbols exactly as
they are in the literature, which I consider to be in
the spirit of the literate programming style.

I do understand that all this does not really matter
if Haskell is only a research language for new
programming language concepts. But I hope this is
enough of an argument to make it understandable that,
while it is easier to implement an ASCII-only
compiler, Unicode brings benefits to non-English
speakers in programming as well.

Best regards, freegoldbar
