[Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbiequestions

Aaron Denney wnoise at ofb.net
Sat Jan 8 03:08:38 EST 2005


On 2005-01-07, Simon Marlow <simonmar at microsoft.com> wrote:
>  - Can you use (some encoding of) Unicode for your Haskell source files?
>    I don't think this is true in any Haskell compiler right now.

I assume this won't be done until the next one is done...

>  - Can you do String I/O in some encoding of Unicode?  No Haskell
>    compiler has support for this yet, and there are design decisions
>    to be made.  Some progress has been made on an experimental prototype
>    (see recent discussion on this list).

Many of the easy ways to do this that I've heard proposed make the
current hacks for binary IO fail.  IMHO, we really, really need a
standard, supported way to do binary IO.  If I can read in and output
octets, then I can implement Unicode handling on top of that.  In fact,
it would let a bunch of the proposed ideas for Unicode support be
implemented in pure Haskell, and have their API details hashed out and
polished.
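
To make the "Unicode on top of octets" idea concrete, here is a minimal
sketch of a UTF-8 decoder written in pure Haskell over a list of octets.
The name decodeUtf8 and the choice to substitute U+FFFD for malformed
input are assumptions made for illustration, not a proposed API; how the
octets get read from a handle in the first place is left out.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Decode a list of octets as UTF-8 (hypothetical helper).
-- Each malformed lead byte is replaced by U+FFFD and decoding resumes
-- at the following octet.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b < 0xF5 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise             = replacement : decodeUtf8 bs
  where
    -- n continuation octets follow a lead byte carrying the bits 'lead';
    -- minCode is the smallest code point allowed at this length, which
    -- rejects overlong encodings.
    multi n lead minCode =
      let (conts, rest) = splitAt n bs
          code = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                       lead conts
          ok = length conts == n
               && all isCont conts
               && code >= minCode
               && code <= 0x10FFFF
               && not (code >= 0xD800 && code <= 0xDFFF)  -- no surrogates
      in if ok
           then chr code : decodeUtf8 rest
           else replacement : decodeUtf8 bs
    isCont c = c .&. 0xC0 == 0x80
    replacement = '\xFFFD'
```

Loaded into an interpreter, decodeUtf8 [0xE2, 0x82, 0xAC] yields the
one-character string containing U+20AC.  Other encodings could be added
as further pure functions of the same type, which is the point: nothing
here needs compiler support beyond getting at the octets.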

For unix, there are a couple of different tacks one could take.  The locale
system is standard, and does work, but is ugly and a pain to work with.
In particular, it's another (set of) global variables.  And what do you
do with a character not expressible in the current locale?

I'd like the possibility of different character sets for different files,
for example.

I suppose I wouldn't be too upset at using the locale information, but
defaulting to UTF-8, rather than ASCII for unset character set
information.
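
As a rough sketch of that policy: take the first non-empty value of
LC_ALL, LC_CTYPE and LANG (the usual POSIX precedence), pull the codeset
out of a locale name of the form language_TERRITORY.codeset@modifier,
and fall back to UTF-8 when no codeset is given.  The names codesetOf
and charsetFor are made up for illustration, and treating a bare "C"
locale as UTF-8 is exactly the assumption being suggested here, not what
the standard says.

```haskell
import Data.Maybe (fromMaybe, listToMaybe)

-- | Extract the codeset from a locale name such as
-- "en_US.ISO-8859-1@euro" (hypothetical helper).
codesetOf :: String -> Maybe String
codesetOf name =
  case dropWhile (/= '.') name of
    '.' : rest | not (null cs) -> Just cs
      where cs = takeWhile (/= '@') rest
    _ -> Nothing

-- | Pick a character set from an environment given as key/value pairs.
-- The first non-empty variable among LC_ALL, LC_CTYPE and LANG decides;
-- if it names no codeset (or none is set), default to UTF-8.
charsetFor :: [(String, String)] -> String
charsetFor env = fromMaybe "UTF-8" (locale >>= codesetOf)
  where
    locale = listToMaybe
      [ v | k <- ["LC_ALL", "LC_CTYPE", "LANG"]
          , Just v <- [lookup k env]
          , not (null v) ]
```

Taking the environment as an argument keeps the decision pure; a program
would feed it the result of System.Environment.getEnvironment once at
startup, and could still override the answer per file.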

For win32, I really don't know the options.

>  - What about Unicode FilePaths?  This was discussed a few months ago
>    on the haskell(-cafe) list, no support yet in any compiler.

This is tricky, because most systems don't have a terribly standard
notion of such a thing.

For win32, it is standardized and should be wrappable fairly easily, but
I don't know that I'd want to base my model on that.

For unix, again, there is the locale system, with, again, the problem
of unrepresentable characters.  Traditionally, systems have essentially
said "file names are zero-terminated strings of bytes that may not
contain character 47, which is used to separate directory names", and
the interpretation in terms of _names_ and _characters_ was left
entirely to the terminals (or graphical programs, eventually) for
display and to programs for manipulation.
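
That traditional model is easy to write down if file names are kept as
octets rather than as String.  The following is only a sketch, with
made-up names (isValidComponent, splitPath): a name component is any
non-empty run of bytes containing neither 0 nor 47, and no character
set enters into it.

```haskell
import Data.Word (Word8)

-- | A single name component is valid if it is non-empty and contains
-- neither the terminator (0) nor the directory separator (47, i.e. '/').
-- Any other byte is allowed, whether or not it is valid in some encoding.
isValidComponent :: [Word8] -> Bool
isValidComponent bs = not (null bs) && all (\b -> b /= 0 && b /= 47) bs

-- | Split a raw path into its components at byte 47, dropping the empty
-- pieces produced by leading, trailing or doubled separators.
splitPath :: [Word8] -> [[Word8]]
splitPath path = filter (not . null) (go path)
  where
    go bs = case break (== 47) bs of
      (c, [])       -> [c]
      (c, _ : rest) -> c : go rest
```

Decoding such a name into characters is then purely a display concern,
and a name that fails to decode in the current locale is still a
perfectly good file name.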

-- 
Aaron Denney
-><-


