[Haskell-i18n] unicode notation \uhhhh implementation

Alastair Reid alastair@reid-consulting-uk.ltd.uk
15 Aug 2002 17:46:20 +0100


> I wasn't aware of that paragraph in the report until recently, and
> as far as I know none of the current Haskell implementations
> implement the '\uhhhh' escape sequences.

HBC implemented Unicode years ago.

  http://www.math.chalmers.se/~augustss/hbc/lexemes.html

> One reason to use this approach would be if there already existed a
> preprocessor to do the job - does anyone know of one? 

Can't be more than a few lines of Perl.  It's quite short in Haskell too:

  import Data.Char (chr, isHexDigit)
  import Numeric (readHex)

  convert :: String -> String
  convert ('\\':'u':c1:c2:c3:c4:cs)
    | all isHexDigit hs                      -- four hex digits: decode them
    = chr (fst (head (readHex hs))) : convert cs
    | otherwise                              -- not clear if this is
    = error "Malformed unicode sequence"     -- allowed by the spec
    where hs = [c1,c2,c3,c4]
  convert (c:cs) = c : convert cs
  convert [] = []
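
If aborting with error is too blunt for a preprocessor, the same idea can be written to report the problem instead. What follows is only a sketch under assumptions: the name convertSafe, the use of Either, and the choice to treat a truncated escape at the end of the input as malformed are all illustrative and are not taken from the report or from any existing implementation.

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Hypothetical variant of convert: returns Left with a message at the
-- first malformed \u escape instead of calling error.
convertSafe :: String -> Either String String
convertSafe ('\\':'u':cs) =
  case splitAt 4 cs of
    (hs, rest)
      | length hs == 4 && all isHexDigit hs   -- exactly four hex digits
      , [(n, "")] <- readHex hs               -- decode them to a code point
      -> (chr n :) <$> convertSafe rest
    _ -> Left ("Malformed unicode sequence near: \\u" ++ take 4 cs)
convertSafe (c:cs) = (c :) <$> convertSafe cs
convertSafe []     = Right []

-- Use it as a filter: read source text on stdin, write the converted
-- text to stdout, and fail with a message on a malformed escape.
main :: IO ()
main = do
  src <- getContents
  case convertSafe src of
    Left err  -> hPutStrLn stderr err >> exitFailure
    Right out -> putStr out
```

Because the result is an Either, nothing is written until the whole input has been checked, so this version gives up the lazy streaming behaviour of convert. How the converted characters are written out depends on the output encoding of the handle.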

> If not, I think the paragraph could be deleted in favour of using
> appropriate encodings for source files (I'd planned to implement at
> least UTF-8 in GHC at some point).

I think it's fine to support Unicode input files too, but I don't see
any reason not to implement the \uXXXX form as well.  Indeed, we know
that every machine that can support Haskell can handle ASCII, but I'll
bet there are plenty of systems where Unicode-format files are awkward
to manipulate.
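
To make the ASCII point concrete, the reverse direction is about as short. This is a rough sketch, not an existing tool: the name escapeNonAscii is made up, and leaving characters above U+FFFF untouched (they do not fit in four hex digits) is an assumption of the sketch.

```haskell
import Data.Char (isAscii, ord)
import Numeric (showHex)

-- Hypothetical helper: rewrite every non-ASCII character as a \uhhhh
-- escape so that the resulting text is plain ASCII.
escapeNonAscii :: String -> String
escapeNonAscii = concatMap esc
  where
    esc c
      | isAscii c       = [c]                           -- keep ASCII as-is
      | ord c <= 0xFFFF = "\\u" ++ pad (showHex (ord c) "")
      | otherwise       = [c]     -- assumption: too large for \uhhhh, left alone
    -- left-pad the hex digits with zeros to a width of four
    pad s = replicate (4 - length s) '0' ++ s

-- Filter from stdin to stdout; input decoding follows the handle's encoding.
main :: IO ()
main = interact escapeNonAscii
```

For instance, escapeNonAscii "caf\233" gives the ten-character string caf\u00e9, which the convert function above turns back into the original.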

--
Alastair Reid                 alastair@reid-consulting-uk.ltd.uk  
Reid Consulting (UK) Limited  http://www.reid-consulting-uk.ltd.uk/alastair/