Text in Haskell: A PROPOSAL

Ken Shan ken@digitas.harvard.edu
Thu, 8 Aug 2002 00:26:55 -0400



A mailing list (for a working group) would be great.  haskell-i18n
perhaps?  Or haskell-unicode?  Who creates lists on haskell.org?  Should
I get one made at eecs.harvard.edu?  Should I switch to haskell-cafe for
now?

The following can probably wait until the list is created, but: What do
people think of the Character type class?

On 2002-08-07T15:21:11-0700, Ashley Yakeley wrote:
> No, GHC uses Char to mean a Unicode codepoint. These are not 32-bit. It
> only allows the 17 pages i.e. values in the range '\x0' to '\x10FFFF'.
> This is the Right Thing as per Unicode 3.1 and later (current is 3.2.0).

Ah, neat!  I missed it before; thanks.

> >  (c) Convert between 3 and 4, according to the Unicode standard.
> You mean according to UTF-16.

Yes.
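
For concreteness, here is a minimal sketch of what "convert according to
UTF-16" could look like when Char is a code point in the range '\x0' to
'\x10FFFF'.  The module and function names (Utf16Sketch, encodeUtf16,
decodeUtf16) are made up for illustration and are not part of any
proposal or library; the sketch also assumes the input Char is not
itself a surrogate.

```haskell
module Utf16Sketch where

import Data.Bits (shiftL, shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)

-- | Encode one code point as one or two UTF-16 code units.
-- Assumption: the Char is not itself a surrogate (U+D800..U+DFFF).
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))   -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]   -- low surrogate
  where
    n = ord c
    m = n - 0x10000

-- | Decode UTF-16 code units back to code points.
-- Returns Nothing if a surrogate appears without its partner.
decodeUtf16 :: [Word16] -> Maybe String
decodeUtf16 [] = Just []
decodeUtf16 (hi : rest)
  | hi >= 0xD800 && hi <= 0xDBFF =
      case rest of
        (lo : rest')
          | lo >= 0xDC00 && lo <= 0xDFFF -> (pair hi lo :) <$> decodeUtf16 rest'
        _ -> Nothing
  | hi >= 0xDC00 && hi <= 0xDFFF = Nothing
  | otherwise = (chr (fromIntegral hi) :) <$> decodeUtf16 rest
  where
    -- Recombine a high/low surrogate pair into a single code point.
    pair h l = chr (0x10000 + ((fromIntegral h - 0xD800) `shiftL` 10)
                            + (fromIntegral l - 0xDC00))
```

The largest code point, '\x10FFFF', encodes to the largest surrogate
pair, which is why a Char limited to that range and UTF-16 cover exactly
the same set of characters.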

> >  (1) Represent char in C as Char, and zero-terminated strings (char*)
> >      in C as CString.
> We already have the CChar type that means that.

One thing I am concerned about is that getChar should be IO CChar, not
IO Char.  So CChar needs to be part of the language standard (in fact,
the Prelude).  It would require quite a bit of modification for old code
to work -- old code that all along has pretty much used Char as a
synonym for Word8.
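
To illustrate the gap between the two readings of Char, here is a rough
sketch.  castCharToCChar and castCCharToChar are the existing functions
from Foreign.C.String; the module name and the helpers charToByte,
byteToChar and charToCChar are hypothetical and only serve the example.

```haskell
module CCharSketch where

import Data.Char (chr, ord)
import Data.Word (Word8)
import Foreign.C.String (castCCharToChar, castCharToCChar)
import Foreign.C.Types (CChar)

-- | The "old code" view: a Char is really a byte.
-- This only works for '\x00'..'\xFF'; anything larger has no byte value.
charToByte :: Char -> Maybe Word8
charToByte c
  | ord c <= 0xFF = Just (fromIntegral (ord c))
  | otherwise     = Nothing

-- | Every byte maps to a Char in '\x00'..'\xFF'.
byteToChar :: Word8 -> Char
byteToChar = chr . fromIntegral

-- | Hypothetical checked conversion to CChar: refuse code points that
-- do not fit in a byte, rather than silently truncating them.
charToCChar :: Char -> Maybe CChar
charToCChar c
  | ord c <= 0xFF = Just (castCharToCChar c)
  | otherwise     = Nothing

-- | Does a Char survive the unchecked trip through CChar and back?
roundTrips :: Char -> Bool
roundTrips c = castCCharToChar (castCharToCChar c) == c
```

Code points above '\xFF' do not survive the unchecked cast, which is the
kind of silent loss that byte-oriented old code would run into once Char
means a full Unicode code point.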

-- 
Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig
http://www.ethnologue.com/
