[Haskell-i18n] Unicode in source

Simon Marlow simonmar@microsoft.com
Wed, 21 Aug 2002 11:02:07 +0100


> Apparently, this isn't quite supported by GHC:
>
>         Prelude> map Char.ord "\74\749\7490"
>         [74,237,66]
>
> which is, of course, the values modulo 256.

I think you've found a bug.  It works with a single character:

  Prelude> Char.ord '\xffff'
  65535

but not with a string.  Thanks for the report :)
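A small sketch of the arithmetic behind the report, written as a GHCi-loadable module. The name `lowByte` is hypothetical. It only models the symptom (keeping the low 8 bits of each code point) and is not GHC's actual lexer code. For the record, 749 mod 256 = 237 and 7490 mod 256 = 66.

```haskell
import Data.Char (ord)

-- Hypothetical model of the symptom: keep only the low 8 bits
-- of each code point, as the buggy string literal appeared to do.
lowByte :: Char -> Int
lowByte c = ord c `mod` 256

-- The string from the report, written with decimal escapes.
sample :: String
sample = "\74\749\7490"

-- What the literal should denote: the full code points.
intended :: [Int]
intended = map ord sample      -- [74,749,7490]

-- What the buggy lexer produced, reproduced by truncation.
observed :: [Int]
observed = map lowByte sample  -- [74,237,66]
```

Loading this into GHCi and evaluating `intended` and `observed` shows the correct values next to the truncated ones.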

> Anyway, if the report is corrected to not limit us to 16 bits, this at
> least gives us enough mechanism to use Unicode in string and
> character constants.
>
> What about using it in identifiers?  I suggest the following formats:
>
>         #hhhh
> and     ##hhhhhhhh
>
> for Unicode characters, with the first form being applicable to code
> points below 64K, and the second to all of Unicode.
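A rough sketch of how the proposed identifier escapes might be decoded. The function names `decodeIdent` and `escape` are hypothetical, and the `#hhhh` / `##hhhhhhhh` syntax is only the proposal quoted above. Neither GHC nor the report accepts it. Because `#` is also a legal operator symbol in Haskell, an input such as `#abcd` could equally be read as the operator `#` followed by the identifier `abcd`, which is one of the syntax ambiguities raised in the reply.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- Hypothetical decoder for the proposed escapes:
--   #hhhh       exactly 4 hex digits (code points below 64K)
--   ##hhhhhhhh  exactly 8 hex digits (any Unicode code point)
-- Returns Nothing on a malformed or out-of-range escape.
decodeIdent :: String -> Maybe String
decodeIdent []             = Just []
decodeIdent ('#':'#':rest) = escape 8 rest
decodeIdent ('#':rest)     = escape 4 rest
decodeIdent (c:rest)       = (c :) <$> decodeIdent rest

-- Read exactly n hex digits, turn them into a character, and continue.
-- Guards are checked left to right, so cp is only computed once the
-- digits are known to be valid hex.
escape :: Int -> String -> Maybe String
escape n s
  | length digits == n, all isHexDigit digits, cp <= 0x10FFFF
      = (chr (fromInteger cp) :) <$> decodeIdent rest
  | otherwise = Nothing
  where
    (digits, rest) = splitAt n s
    cp :: Integer
    cp = foldl (\acc d -> acc * 16 + toInteger (digitToInt d)) 0 digits
```

For example, `decodeIdent "caf#00e9"` yields `Just "caf\233"`, while `decodeIdent "a#zz"` yields `Nothing`.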

There are several problems with using this kind of encoding in source
files, as pointed out by Sven Moritz Hallberg (indentation, syntax
ambiguities, etc.), so I'd prefer to stick to standard encodings such as
UTF-8 for source files.  At least that way you'll be able to get an
editor that will display the file as it is intended to be.
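For comparison, UTF-8 needs no escape syntax in the source at all: each code point is stored as one to four bytes. Here is a minimal hand-written encoder sketch following the RFC 3629 byte layout. The names `encodeUtf8Char` and `encodeUtf8` are hypothetical. Real code would rely on a library or on the compiler's own input decoding rather than this.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as UTF-8 bytes.
-- Surrogate code points are not rejected; this is only a sketch.
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80    = [byte n]
  | n < 0x800   = [byte (0xC0 .|. shiftR n 6), cont n]
  | n < 0x10000 = [byte (0xE0 .|. shiftR n 12), cont (shiftR n 6), cont n]
  | otherwise   = [ byte (0xF0 .|. shiftR n 18), cont (shiftR n 12)
                  , cont (shiftR n 6), cont n ]
  where
    n = ord c
    byte = fromIntegral :: Int -> Word8
    -- Continuation byte 10xxxxxx carrying the low 6 bits of x.
    cont x = byte (0x80 .|. (x .&. 0x3F))

-- Encode a whole string by concatenating the per-character bytes.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeUtf8Char
```

With this sketch, the string from the report, `"\74\749\7490"`, becomes the six bytes `4A CB AD E1 B5 82`. A code point above 64K such as U+10000 takes four bytes, `F0 90 80 80`.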

(aside: aren't there problems with Unicode not being a fixed-width
character set?  Some characters are expected to combine with others to
form a glyph, there are multiple versions of some characters with
different widths, there are several widths of space, etc.)
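A crude sketch of the combining-character problem. `"e\x301"` (e followed by U+0301 COMBINING ACUTE ACCENT) and `"\xe9"` (precomposed U+00E9) both display as a single é. Their lengths in code points are 2 and 1 respectively, so a column count based on characters disagrees with what an editor shows. The names `isCombining` and `approxColumns` are hypothetical. The estimate deliberately ignores double-width characters and the varying widths of space characters.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Code points that attach to the preceding character rather than
-- occupying a column of their own (a simplification).
isCombining :: Char -> Bool
isCombining c = generalCategory c `elem` [NonSpacingMark, EnclosingMark]

-- Crude display-width estimate: one column per non-combining code point.
-- Ignores double-width (e.g. CJK) characters and wide or narrow spaces
-- such as U+2003 EM SPACE or U+2009 THIN SPACE.
approxColumns :: String -> Int
approxColumns = length . filter (not . isCombining)

decomposed, precomposed :: String
decomposed  = "e\x301"  -- 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\xe9"    -- U+00E9 LATIN SMALL LETTER E WITH ACUTE
```

Here `length decomposed` is 2 but `approxColumns decomposed` is 1, matching the single glyph on screen.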

Cheers,
	Simon