[Haskell-cafe] Unicode: Hugs vs GHC (again) was: Re: Some random newbie questions

Dimitry Golubovsky dimitry at golubovsky.org
Fri Jan 7 08:01:17 EST 2005


Hi,

Lennart Augustsson wrote:
> Simon Marlow wrote:
> 
>> Here's a summary of the state of Unicode support in GHC and other
>> compilers.  There are several aspects:
>>
>>  - Can the Char type hold the full range of Unicode characters?
>>    This has been true in GHC for some time, and is now true in Hugs.
>>    I don't think it's true in nhc98 (please correct me if I'm wrong).

As I remember, that was already true in GHC. But any attempt to output
Unicode characters using the standard I/O functions always ended up
writing only the low 8 bits of each character. Has anything changed
since then?
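
To make the problem concrete, here is a minimal sketch of the usual
workaround: encode the string as UTF-8 by hand, one Char per byte, so
that an I/O layer which keeps only the low 8 bits still writes the right
bytes. The names encodeChar and encodeString are hypothetical, not part
of any compiler's library.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical helper: encode one character as UTF-8, returning a
-- String in which every Char is below 256 and stands for one byte.
encodeChar :: Char -> String
encodeChar c
  | n < 0x80    = [chr n]
  | n < 0x800   = map chr [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = map chr [ 0xE0 .|. (n `shiftR` 12)
                          , cont (n `shiftR` 6)
                          , cont n ]
  | otherwise   = map chr [ 0xF0 .|. (n `shiftR` 18)
                          , cont (n `shiftR` 12)
                          , cont (n `shiftR` 6)
                          , cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Hypothetical helper: encode a whole string.
encodeString :: String -> String
encodeString = concatMap encodeChar
```

With this, putStr (encodeString s) would emit UTF-8 on an implementation
that truncates each character to 8 bits. On an implementation that does
its own encoding on output, the same call would encode the text twice.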

>>
>>  - Do the character class functions (isUpper, isAlpha etc.) work
>>    correctly on the full range of Unicode characters?  This is true in
>>    Hugs.  It's true with GHC on some systems (basically we were lazy
>>    and used the underlying C library's support here, which is patchy).

Which basically means that anyone on an older or underconfigured system,
without the permissions or the technical means to configure locales in
the C library properly, is out of luck...
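
A quick way to see what a given installation does is to probe a few
characters outside Latin-1. This is only a sketch; probe, samples and
report are hypothetical names, and the sample characters are my own
choice.

```haskell
import Data.Char (isAlpha, isLower, isUpper)

-- Hypothetical probe: how does the running implementation classify c?
-- The triple is (isAlpha, isUpper, isLower).
probe :: Char -> (Bool, Bool, Bool)
probe c = (isAlpha c, isUpper c, isLower c)

-- Sample characters outside Latin-1:
-- U+0414 CYRILLIC CAPITAL LETTER DE,
-- U+0434 CYRILLIC SMALL LETTER DE,
-- U+03B1 GREEK SMALL LETTER ALPHA.
samples :: [Char]
samples = "\x0414\x0434\x03B1"

-- Pair every sample with its classification.
report :: [(Char, (Bool, Bool, Bool))]
report = [(c, probe c) | c <- samples]
```

An implementation with correct tables should classify all three as
alphabetic, the first as upper case and the other two as lower case.
Where the answer depends on the C library, the result may change with
the locale settings.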

>>
>>  - Can you use (some encoding of) Unicode for your Haskell source files?
>>    I don't think this is true in any Haskell compiler right now.

Well, Hugs from CVS accepts source code in UTF-8 (I am not sure about
locale-based conversion), at least on my computer. One caveat: string
literals may be in UTF-8 encoding, but Hugs does not accept Unicode in
function or type identifiers (i.e. one cannot name a type or a function
in Russian, for instance; such names must be ASCII).

I put an example of such a file in UTF-8 on my web-server:

http://www.golubovsky.org/software/hugs-patch/testutf.hs

> 
> Well, even if hbc is mostly dead I must point out that it has supported
> this since Unicode was first added to Haskell.  As well as the point
> above, of course.
> If the GHC implementors feel lazy they can always borrow the Unicode
> (plane 0) description table from HBC.  It is a 64k file.

Or, in Hugs, there is a shell script (really awk, just wrapped in a
shell script) which parses the Unicode data file and produces a C file
(also about 64k), plus a compact set of primitive functions independent
of the C library: src/unix/mkunitable and part of src/char.c in the Hugs
source tree, respectively.
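
For illustration, the parsing half of that approach might look as
follows in Haskell. This is a sketch and not a translation of
mkunitable; it assumes the input is the standard UnicodeData.txt, whose
lines are semicolon-separated with the code point (hex) in the first
field and the general category in the third. splitFields and parseEntry
are hypothetical names.

```haskell
import Numeric (readHex)

-- Split a line on ';', keeping empty fields.
splitFields :: String -> [String]
splitFields s = case break (== ';') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitFields rest

-- Hypothetical helper: extract (code point, general category) from one
-- line of UnicodeData.txt; Nothing if the line is malformed.
parseEntry :: String -> Maybe (Int, String)
parseEntry line = case splitFields line of
  (code : _name : category : _) ->
    case readHex code of
      [(n, "")] -> Just (n, category)
      _         -> Nothing
  _ -> Nothing
```

Given a line that starts with "0414;CYRILLIC CAPITAL LETTER DE;Lu;",
parseEntry yields the code point 0x414 paired with the category "Lu". A
table generator would then map each category to the class functions it
implies (Lu to isUpper, and so on).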

The reason I asked this question is that I am trying to understand where
internationalization of Haskell compilers sits on their developers' list
of priorities, and how much demand there is from users for at least
basic internationalization.

Dimitry Golubovsky
Middletown, CT




