[Haskell-i18n] Unicode in source
Ketil Z. Malde
ketil@ii.uib.no
21 Aug 2002 12:53:44 +0200
"Simon Marlow" <simonmar@microsoft.com> writes:
>> for Unicode characters, with the first form being applicable to code
>> points below 64K, and the second to all of Unicode.
> There are several problems with using this kind of encoding in source
> files, as pointed out by Sven Moritz Hallberg (indentation, syntax
> ambiguities, etc.), so I'd prefer to stick to standard encodings such as
> UTF-8 for source files.
So, in essence, we remove the \uHHHH paragraph from section 2.1 of the
report? I still think it could be nice to have a way to specify
Unicode characters in identifiers, but if you propose to postpone that
until (and unless) it becomes a problem, I have no objection.
Note that editors will probably display unknown characters as \NNNN or
similar escape codes, so the (visible) layout will break anyway.
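
To make the indentation point concrete, here is a minimal sketch of
expanding escapes before layout is computed. It assumes the escape is
exactly a backslash, 'u' and four hex digits, as in the \uHHHH form
mentioned above; expandEscapes and columnOf are made-up names for
illustration, not anything from the report or from a compiler.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- Replace every \uHHHH escape (backslash, 'u', exactly four hex digits)
-- by the character it names; all other text is copied unchanged.
-- Hypothetical helper, not part of any existing library.
expandEscapes :: String -> String
expandEscapes ('\\' : 'u' : a : b : c : d : rest)
  | all isHexDigit [a, b, c, d] =
      chr (foldl (\acc h -> acc * 16 + digitToInt h) 0 [a, b, c, d])
        : expandEscapes rest
expandEscapes (x : xs) = x : expandEscapes xs
expandEscapes [] = []

-- 1-based column of the first occurrence of a character, if any.
-- Assumes one column per Char, which is itself a simplification.
columnOf :: Char -> String -> Maybe Int
columnOf ch s = lookup ch (zip s [1 ..])
```

With the line "\u00e9 = x" (written with the escape), columnOf '='
gives Just 8 on the raw text but Just 3 after expandEscapes, so
whatever is aligned under that '=' depends on which form you look at.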
> (aside: aren't there problems with Unicode not being a fixed-width
> character set? Some characters are expected to combine with others to
> form a glyph, there are multiple versions of some characters with
> different widths, there are several widths of space, etc.)
I'm not familiar with all the nooks and crannies of Unicode, but I
would have thought that the width of characters is a feature of the
*font*, not the character set. So in a fixed-width font, every
character should have the same width, including things like the "ff"
ligature, "'n" and so on. Without a fixed-width font, layout becomes
a bit meaningless.
IIUC, combining characters, where the code point doesn't represent a
printable glyph but a modification of the preceding one, will
probably make a mess of layout, since they take up no column of their
own. Perhaps combining characters should be disallowed?
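
A rough sketch of what such a check could look like, using isMark
from Data.Char (which is True for the Unicode mark categories Mn, Mc
and Me). The names displayColumns and hasCombining are hypothetical,
and the "one column per non-mark character" rule is an assumption
that ignores wide characters and the various spaces.

```haskell
import Data.Char (isMark)

-- Columns a line would occupy if every combining mark attaches to the
-- preceding character and takes no column of its own.
-- Assumption: every other character is exactly one column wide.
displayColumns :: String -> Int
displayColumns = length . filter (not . isMark)

-- True if the text contains any combining mark, i.e. the case a
-- compiler could reject if combining characters were disallowed.
hasCombining :: String -> Bool
hasCombining = any isMark
```

For "e\x0301" (an 'e' followed by a combining acute accent), length
is 2 but displayColumns is 1, which is exactly the mismatch that
would throw off a column count based on code points.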
I still maintain that if you use layout, "do", "of", "where", etc.
should be followed by a line break. This would, I think, solve most
layout problems, and even Ashley might be tempted to let go of braces
and semicolons. :-)
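
To illustrate the style I mean (a made-up example, the function
describe is purely hypothetical): each layout keyword ends its line,
so the block's indentation is set by the first token on the next
line, which is preceded only by spaces. Nothing needs to line up
under text that follows the keyword, so the displayed width of
whatever comes before it doesn't matter.

```haskell
-- Each layout keyword ("of", "where") is the last token on its line;
-- the following line alone fixes the indentation of the block.
describe :: Int -> String
describe n =
  case compare n 0 of
    LT -> "negative"
    EQ -> "zero"
    GT -> positive
  where
    positive = "positive"
```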
-kzm
--
If I haven't seen further, it is by standing in the footprints of giants