[Haskell-i18n] Unicode in source

21 Aug 2002 12:53:44 +0200

"Simon Marlow" <simonmar@microsoft.com> writes:

>> for Unicode characters, with the first form being applicable to code
>> points below 64K, and the second to all of Unicode.

> There are several problems with using this kind of encoding in source
> files, as pointed out by Sven Moritz Hallberg (indentation, syntax
> ambiguities, etc.), so I'd prefer to stick to standard encodings such as
> UTF-8 for source files.  

So, in essence, we remove the \uHHHH paragraph from 2.1 in the report?

It might still be nice to have a way to specify Unicode characters in
identifiers, but if you propose to postpone that until (and unless) it
becomes a problem, I have no objection.

Note that editors will probably display unknown characters as \NNNN or
similar escape codes, which will break the (visible) layout anyway.

> (aside: aren't there problems with Unicode not being a fixed-width
> character set?  Some characters are expected to combine with others to
> form a glyph, there are multiple versions of some characters with
> different widths, there are several widths of space, etc.)

I'm not familiar with all the nooks and crannies of Unicode, but I
would have thought that the width of characters is a feature of the
*font*, not the character set.  So in a fixed-width font, each
character should have the same width, including things like the "ff"
ligature, "'n" and so on.  Without a fixed-width font, layout becomes
a bit meaningless.

IIUC, combining characters, where the code point doesn't represent a
printable glyph but a modification of the preceding one, will probably
make a mess of column-based layout.  Perhaps combining characters
should be disallowed?
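
To make the problem concrete, here is a minimal sketch of how a tool
could spot combining characters.  It assumes a Data.Char that exports
generalCategory (as current GHC base does); the names isCombining and
displayWidth are made up for illustration, and the width rule is a
deliberate simplification.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- True for code points that modify the preceding character instead of
-- occupying a column of their own (Unicode categories Mn, Mc and Me).
isCombining :: Char -> Bool
isCombining c =
  generalCategory c `elem`
    [NonSpacingMark, SpacingCombiningMark, EnclosingMark]

-- Naive column count: every code point except a combining mark takes
-- one column.  Assumption: this ignores wide East Asian characters,
-- zero-width spaces and anything else a real renderer treats specially.
displayWidth :: String -> Int
displayWidth = length . filter (not . isCombining)
```

For "e\x0301" (an "e" followed by U+0301, combining acute accent),
length gives 2 while displayWidth gives 1, so the layout column an
implementation counts and the column a reader sees differ by one.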

I still maintain that if you use layout, "do", "of", "where", etc.
should be followed by a line break.  This would, I think, solve most
layout problems, and even Ashley might be tempted to let go of braces
and semicolons. :-)
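
A small sketch of the difference, with made-up function names.  In
the first style the layout column is set by whatever precedes the
first binding on the keyword's line; in the second it is set only by
leading whitespace, so character widths further left cannot disturb
it.

```haskell
-- Same-line style: `lo` fixes the layout column, so `hi` has to line
-- up with it exactly.  That alignment depends on how wide everything
-- to the left of `lo` is rendered.
clampA :: Int -> Int
clampA n = max lo (min hi n)
  where lo = 0
        hi = 10

-- Line-break style: the block starts on its own line, so its column
-- depends only on the indentation of that line.
clampB :: Int -> Int
clampB n = max lo (min hi n)
  where
    lo = 0
    hi = 10

-- The same convention applied to "of".
sign :: Int -> String
sign n =
  case compare n 0 of
    LT -> "negative"
    EQ -> "zero"
    GT -> "positive"
```

Both clamp functions mean the same thing; only the second keeps
meaning the same thing if the keyword line is later edited or
rendered with different widths.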

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants