[Haskell-i18n] Unicode in source

Sven Moritz Hallberg pesco@gmx.de
21 Aug 2002 12:42:06 +0200


On Wed, 2002-08-21 at 12:02, Simon Marlow wrote:
> 
> > Apparently, this isn't quite supported by GHC:
> > 
> >         Prelude> map Char.ord "\74\749\7490"
> >         [74,237,66]
> > 
> > which is, of course, the values modulo 256.
> 
> I think you've found a bug. [...]

Oh, oops. :)
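For reference, the Haskell 98 Report defines a backslash followed by decimal digits as a numeric escape for the character with that decimal code point. So the string in the quoted session should hold code points 74, 749 (U+02ED) and 7490 (U+1D42), not their values modulo 256. Below is a minimal sketch of the expected behaviour. It assumes a compiler whose `Char` covers the full Unicode range; the names `escaped` and `codePoints` are illustrative only.

```haskell
import Data.Char (ord)

-- The decimal escapes from the quoted GHCi session:
-- "\74" is 'J', "\749" is U+02ED and "\7490" is U+1D42.
escaped :: String
escaped = "\74\749\7490"

-- The escapes' values taken in full, as a conforming
-- implementation should report them.
codePoints :: [Int]
codePoints = map ord escaped

main :: IO ()
main = do
  -- Full code points on a Unicode-aware compiler.
  print codePoints                    -- [74,749,7490]
  -- The truncated values seen in the quoted session, for comparison.
  print (map (`mod` 256) codePoints)  -- [74,237,66]
```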


> (aside: aren't there problems with Unicode not being a fixed-width
> character set?  Some characters are expected to combine with others to
> form a glyph, there are multiple versions of some characters with
> different widths, there are several widths of space, etc.)

I think (...) these issues should not pose a problem.

variable-width characters:
Unicode specifically doesn't say anything about the glyph representation
of the characters, so it is reasonable to assume there will be
fixed-width Unicode fonts. Remember that even our Latin alphabet has
characters of different widths (i vs. w), which we somehow manage to
fit into glyphs of the same width. If one's editor really used a
variable-width font, one would already have the same problem with ASCII.

composition characters:
I think we should interpret each character in the source as exactly one
character and leave any possible composition to editing tools. The way
I imagine these composition characters being used is, for instance, as
keyboard input to an editor, which then composes them into a single
character before writing anything to a file. I'd say this issue belongs
to the domain of text processing.
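This sketch shows what "one source character is exactly one character" would mean in practice. It assumes the source text has already been decoded into Haskell `Char`s one code point at a time. The example strings and the `codePoints` helper are hypothetical. A precomposed 'é' and the decomposed sequence 'e' + U+0301 COMBINING ACUTE ACCENT may render identically in an editor, yet they are different strings of different lengths:

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- 'é' stored precomposed: a single code point, U+00E9.
precomposed :: String
precomposed = "\x00E9"

-- The same glyph stored decomposed: 'e' followed by
-- U+0301 COMBINING ACUTE ACCENT, i.e. two code points.
decomposed :: String
decomposed = "e\x0301"

-- Render each Char of a string in U+XXXX notation
-- (upper-case hex, zero-padded to at least four digits).
codePoints :: String -> [String]
codePoints = map (\c -> "U+" ++ pad (map toUpper (showHex (ord c) "")))
  where
    pad s = replicate (4 - length s) '0' ++ s

main :: IO ()
main = do
  -- Read one character per code point, the lengths differ.
  print (length precomposed, length decomposed)  -- (1,2)
  print (codePoints precomposed)                 -- ["U+00E9"]
  print (codePoints decomposed)                  -- ["U+0065","U+0301"]
  -- No composition happens at this level: the strings are unequal.
  print (precomposed == decomposed)              -- False
```

Under this reading, an identifier typed with the decomposed form would be distinct from one typed precomposed. That is why it seems sensible to leave composition to the editor rather than to the compiler.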


Regards,
Sven Moritz