[Haskell-i18n] unicode notation \uhhhh implementation

Simon Marlow simonmar@microsoft.com
Fri, 16 Aug 2002 10:01:04 +0100


> > I wasn't aware of that paragraph in the report until recently, and
> > as far as I know none of the current Haskell implementations
> > implement the '\uhhhh' escape sequences.
>
> HBC implemented Unicode years ago.
>
>  http://www.math.chalmers.se/~augustss/hbc/lexemes.html

No, HBC doesn't implement the paragraph of the report that we're talking
about.  HBC allows the '\uhhhh' escape sequence in character and string
literals, but not in identifiers and other parts of the source.

Also, it's not clear to me why you need the '\uhhhh' escape sequence in
character and string literals at all, since it appears to mean the same
thing as '\xhhhh' (the report doesn't say explicitly that '\xhhhh' means
a "unicode code point", but that seems to be the only reasonable
interpretation).
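
To make the comparison concrete, here is a minimal sketch of how the
existing numeric escapes behave, assuming an implementation whose Char
is a Unicode code point (as in current GHC).  The names lambdaHex and
lambdaDec are made up for the illustration.

```haskell
import Data.Char (chr, ord)

-- The same character written with a hexadecimal and a decimal escape.
lambdaHex, lambdaDec :: Char
lambdaHex = '\x3bb'
lambdaDec = '\955'

main :: IO ()
main = do
  -- Both escapes denote code point 0x3bb.
  print (lambdaHex == lambdaDec)
  print (ord lambdaHex == 0x3bb)
  print (lambdaHex == chr 0x3bb)
  -- Unlike a fixed-width \uhhhh, \x consumes every hex digit that
  -- follows, so "\x41b" is a single character; the empty escape \&
  -- ends the number early.
  print (length "\x41b" == 1)
  print ("\x41\&b" == "Ab")
```

Under that assumption the only visible difference in literals is the
width: '\xhhhh' is variable-length, whereas '\uhhhh' is always four
digits.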

> > One reason to use this approach would be if there already existed a
> > preprocessor to do the job - does anyone know of one?

> Can't be more than a few lines of Perl.  It's quite short in Haskell
> too:
>
>   convert :: String -> String
>   convert ('\\':'u':c1:c2:c3:c4:cs)
>     | isHex c1 && isHex c2 && isHex c3 && isHex c4
>     = chr (readHex [c1,c2,c3,c4]) : convert cs
>     | otherwise                              -- not clear if this is
>     = error "Malformed unicode sequence"     -- allowed by the spec
>   convert (c:cs) = c : convert cs
>   convert [] = []
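
The quoted convert does not compile as written: isHex is not a standard
function, and the library readHex returns a list of parses rather than
a number.  A runnable variant might look like the sketch below, which
assumes Data.Char.isHexDigit and Numeric.readHex; the helper hexValue is
a name introduced here for illustration.

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)

-- Replace each \uhhhh escape with the character it names.
convert :: String -> String
convert ('\\':'u':c1:c2:c3:c4:cs)
  | all isHexDigit hs = chr (hexValue hs) : convert cs
  | otherwise         = error "Malformed unicode sequence"
  where hs = [c1, c2, c3, c4]
convert (c:cs) = c : convert cs
convert []     = []

-- readHex yields a list of (value, remaining input) pairs; a string of
-- hex digits gives exactly one parse with nothing left over.
hexValue :: String -> Int
hexValue s = case readHex s of
  [(n, "")] -> n
  _         -> error "hexValue: not a hex string"

main :: IO ()
main = interact convert
```

Like the original, this passes a "\u" followed by fewer than four
characters through unchanged, and it does not treat an escaped backslash
before the 'u' specially.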

I meant a preprocessor to take source code in some random encoding and
convert it into ASCII with '\uhhhh' escape sequences.  If there were such
a thing, then we could all use it and save re-implementing N different
encodings in each compiler.
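
As a rough sketch of the escaping half of such a preprocessor, the
following assumes the input has already been decoded into Haskell Chars
(for example by the runtime's handle encoding), so the per-encoding work
is left to whatever does that decoding.  The names escapeChar and
escapeSource are hypothetical.

```haskell
import Numeric (showHex)

-- ASCII passes through; other code points up to U+FFFF become \uhhhh.
-- Larger code points do not fit in four hex digits, so this sketch
-- reports them instead of guessing at a representation.
escapeChar :: Char -> Either String String
escapeChar c
  | n < 0x80    = Right [c]
  | n <= 0xFFFF = Right ("\\u" ++ pad (showHex n ""))
  | otherwise   = Left ("no \\uhhhh form for U+" ++ showHex n "")
  where
    n = fromEnum c
    pad s = replicate (4 - length s) '0' ++ s

-- Escape a whole source text, stopping at the first unrepresentable
-- character.
escapeSource :: String -> Either String String
escapeSource = fmap concat . mapM escapeChar

main :: IO ()
main = interact (either error id . escapeSource)
```

For example, escapeSource applied to "a\955b" gives Right "a\\u03bbb".
A backslash-u sequence already present in the input is copied as is, so
the output cannot tell it apart from one the preprocessor produced.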

Cheers,
	Simon