GHC and UNICODE...

Fri Dec 19 12:17:42 EST 2003

On Fri, Dec 19, 2003 at 04:51:50PM +0000, MR K P SCHUPKE wrote:
> Whilst I appreciate the topic of show is not directly related to GHC,
> what I would like to know is how to handle UNICODE properly... If I assume
> I have a good unicode terminal, so stdin and stdout are in unicode format,
> and all my text files are in unicode, how do I deal with this properly in
> GHC... what is the current state of affairs?

I use unicode quite regularly with GHC. The only place where unicode
support is lacking is the default I/O implementation in the standard
libraries. Nothing about the compiler itself inherently limits unicode
support. All that is needed is to replace or augment the standard IO
code, a task which is often necessary for other reasons when working on
large projects anyway.
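
As a rough sketch of what augmenting the standard IO code can look like
on the input side, here is a small UTF-8 decoder plus a wrapper around
getContents. The names decodeUtf8 and getContentsUtf8 are made up for
this example and are not taken from any particular library. It assumes a
handle in binary mode hands back one Char per byte, and it does not
reject overlong encodings or surrogates, so treat it as an illustration
rather than a finished decoder.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO (hSetBinaryMode, stdin)

-- | Decode a list of UTF-8 bytes into a String.
-- Malformed input is replaced with U+FFFD instead of raising an error.
-- ASSUMPTION: overlong forms and surrogates are not rejected here.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (b .&. 0x1F)
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)
  | b >= 0xF0 && b < 0xF5 = multi 3 (b .&. 0x07)
  | otherwise             = '\xFFFD' : decodeUtf8 bs
  where
    -- Consume n continuation bytes after a lead byte carrying the bits hi.
    multi n hi
      | length cs == n && all isCont cs && n' <= 0x10FFFF
                  = chr n' : decodeUtf8 rest
      | otherwise = '\xFFFD' : decodeUtf8 bs
      where
        (cs, rest) = splitAt n bs
        n'         = foldl step (fromIntegral hi) cs
    -- Continuation bytes look like 10xxxxxx.
    isCont c   = c .&. 0xC0 == 0x80
    -- Shift in the six payload bits of one continuation byte.
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)

-- | Hypothetical replacement for getContents: read raw bytes from stdin
-- and decode them as UTF-8.
getContentsUtf8 :: IO String
getContentsUtf8 = do
  hSetBinaryMode stdin True          -- one Char per input byte
  raw <- getContents
  return (decodeUtf8 (map (fromIntegral . ord) raw))
```

The pure decoder is kept separate from the IO wrapper so the same
function can sit behind getLine, readFile and friends.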

Here are a number of things I have done to make unicode easier to deal
with:
1. Written the CWString library (now a part of the FFI), which lets you
call arbitrary C functions with all the proper character set conversion
done for you.
2. Used UTF8.hs to wrap the various routines in IO. This works great as
long as your system uses utf8 (which many do).
3. Modified Daan's PPrint to handle arbitrary character widths
independent of the number of characters. This is useful when encoding
things in a charset which doesn't have a 1-1 character to screen cell
guarantee (accents, CJK languages, etc.), and is also incidentally very
useful for doing things like embedding arbitrary escape sequences
(colors) into pretty printed layout without affecting the PP algorithm.
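
To make the third point concrete, here is a minimal sketch of measuring
a string in screen cells rather than in characters. It is not the code
from the modified PPrint; charWidth and displayWidth are names invented
for this example, the table of wide ranges is a rough hand-picked
approximation rather than the full Unicode East Asian Width data, and
only ANSI escape sequences of the form ESC [ ... are recognized.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- | Approximate number of screen cells one character occupies.
-- ASSUMPTION: wideRanges is a rough subset of the wide CJK blocks.
charWidth :: Char -> Int
charWidth c
  | isCombining = 0   -- accents that attach to the previous character
  | isWide      = 2   -- CJK and other double-width characters
  | otherwise   = 1
  where
    n           = ord c
    isCombining = generalCategory c `elem` [NonSpacingMark, EnclosingMark]
    isWide      = c /= '\x303F'
                  && any (\(lo, hi) -> n >= lo && n <= hi) wideRanges
    wideRanges  =
      [ (0x1100, 0x115F), (0x2E80, 0xA4CF), (0xAC00, 0xD7A3)
      , (0xF900, 0xFAFF), (0xFE30, 0xFE4F), (0xFF00, 0xFF60)
      , (0xFFE0, 0xFFE6) ]

-- | Width of a string in screen cells. Escape sequences starting with
-- ESC [ are skipped up to and including their final byte, so they
-- contribute nothing to the width.
displayWidth :: String -> Int
displayWidth ('\ESC' : '[' : rest) =
    displayWidth (drop 1 (dropWhile (not . isFinal) rest))
  where
    isFinal ch = ch >= '@' && ch <= '~'
displayWidth (c : cs) = charWidth c + displayWidth cs
displayWidth []       = 0
```

A pretty printer that asks displayWidth for the width of each text
fragment, instead of calling length, lays out colored or accented text
the same way as its plain equivalent.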

Something I have wanted to do is modify Alex so that ∀ turns into the
regular expression 0xe2 0x88 0x80 (and so forth), so that ghc (whose
lexer is generated from Alex) can simply accept utf8 input.
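
The translation itself is just UTF-8 encoding applied one character at a
time. A small sketch follows; encodeChar and byteRegex are hypothetical
names for this example and have nothing to do with Alex's actual
internals or input syntax.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- | Encode one character as its UTF-8 byte sequence.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6,  cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6,  cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Lead byte: length tag plus the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s     = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- | Render a character as a space-separated sequence of byte literals,
-- in the style used in the text above.
byteRegex :: Char -> String
byteRegex = unwords . map (\b -> "0x" ++ showHex b "") . encodeChar
```

For example, byteRegex '\x2200' (the ∀ character) evaluates to
"0xe2 0x88 0x80".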
        John

-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john at foo.net
---------------------------------------------------------------------------