[Haskell-i18n] unicode notation \uhhhh implementation
Simon Marlow
simonmar@microsoft.com
Fri, 16 Aug 2002 10:01:04 +0100
> > I wasn't aware of that paragraph in the report until recently, and
> > as far as I know none of the current Haskell implementations
> > implement the '\uhhhh' escape sequences.
>
> HBC implemented Unicode years ago.
>
> http://www.math.chalmers.se/~augustss/hbc/lexemes.html
No, HBC doesn't implement the paragraph of the report that we're
talking about. HBC allows the '\uhhhh' escape sequence in character and
string literals, but not in identifiers and other parts of the source.
Also, it's not clear to me why you need the '\uhhhh' escape sequence in
character and string literals at all, since it appears to mean the same
thing as '\xhhhh' (the report isn't clear that '\xhhhh' means a "unicode
code point", but that seems to be the only reasonable interpretation).
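To illustrate the "code point" reading of '\xhhhh': in a compiler that
treats Char as a Unicode code point, the hexadecimal and decimal escapes
below should denote the same character. This is only a small sketch; the
names lambdaHex and lambdaDec are made up for the example.

```haskell
import Data.Char (ord)

-- GREEK SMALL LETTER LAMDA (U+03BB), written with a hexadecimal escape.
lambdaHex :: Char
lambdaHex = '\x3bb'

-- The same code point written with a decimal escape (0x3bb is 955).
lambdaDec :: Char
lambdaDec = '\955'

-- True if both escapes name code point 0x3bb.
sameCodePoint :: Bool
sameCodePoint = lambdaHex == lambdaDec && ord lambdaHex == 0x3bb
```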
> One reason to use this approach would be if there already existed a
> preprocessor to do the job - does anyone know of one?
> Can't be more than a few lines of Perl. It's quite short in Haskell too:
>
> import Data.Char (chr, isHexDigit)
> import Numeric (readHex)
>
> convert :: String -> String
> convert ('\\':'u':c1:c2:c3:c4:cs)
>   | all isHexDigit [c1,c2,c3,c4]
>   = chr (fst (head (readHex [c1,c2,c3,c4]))) : convert cs
>   | otherwise                              -- not clear if this is
>   = error "Malformed unicode sequence"     -- allowed by the spec
> convert (c:cs) = c : convert cs
> convert [] = []
I meant a preprocessor to take source code in some random encoding and
convert it into ASCII with '\uhhhh' escape sequences. If there was such
a thing, then we could all use it and save re-implementing N different
encodings in each compiler.
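For concreteness, the escaping half of such a preprocessor might look
roughly like the sketch below. It assumes the input has already been
decoded from its original encoding into a String of Unicode code points
(the decoding is the hard, encoding-specific part and is not shown). The
names escapeNonAscii and escapeChar are hypothetical, and rejecting code
points above U+FFFF is an assumption made here because four hex digits
cannot represent them.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- Replace every non-ASCII character with a four-digit \uhhhh escape,
-- leaving ASCII characters untouched.
escapeNonAscii :: String -> String
escapeNonAscii = concatMap escapeChar

escapeChar :: Char -> String
escapeChar c
  | n < 0x80    = [c]                          -- plain ASCII: keep as is
  | n <= 0xFFFF = "\\u" ++ pad (showHex n "")  -- e.g. U+03BB becomes \u03bb
  | otherwise   = error "escapeChar: code point does not fit in four hex digits"
  where
    n = ord c
    -- Left-pad the hex digits with zeros to a width of four.
    pad s = replicate (4 - length s) '0' ++ s
```

With an implementation that decodes its input for you, something like
"interact escapeNonAscii" would be enough to use this as a filter.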
Cheers,
Simon