[Haskell] non-ASCII characters in Haddock documentation

Mon Feb 16 15:21:11 EST 2004

At 11:29 16/02/04 +0000, Ross Paterson wrote:
>On Mon, Feb 16, 2004 at 10:20:30AM -0000, Simon Marlow wrote:
>...
> > It shouldn't be too hard to fix this, at least for Latin-1 (full
> > Unicode would be somewhat harder).  I'll add it to the TODO list.
>
>While Haskell's source charset is specified as Unicode, Haskell source
>files don't specify the byte encoding they use, so any source file using
>non-ASCII characters isn't portable.  Entrenching Latin-1 would make the
>move to Unicode more difficult.

Ah, yes.  I was going to suggest that for generating XHTML it should be 
easy enough to generate &#xxxx; expansions, but that doesn't address the 
problem of not knowing the input encoding.  Perhaps the XML conventions for 
encoding designation (UTF-8, UTF-16 big-endian, UTF-16 little-endian) could 
be applied here?
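Once the input has been decoded into Haskell `Char`s, the XHTML escaping step itself is small. What follows is a minimal sketch that assumes the source text is already available as a `String` of Unicode code points. The name `escapeNonAscii` is hypothetical and not part of Haddock:

```haskell
import Data.Char (ord)

-- Hypothetical helper, not part of Haddock: replace every non-ASCII
-- character with an XML numeric character reference such as "&#233;",
-- so the generated XHTML contains only ASCII bytes.
escapeNonAscii :: String -> String
escapeNonAscii = concatMap escape
  where
    -- Markup characters such as '<' and '&' are assumed to be escaped
    -- by a separate pass; only non-ASCII characters are handled here.
    escape c
      | ord c < 128 = [c]                          -- ASCII passes through
      | otherwise   = "&#" ++ show (ord c) ++ ";"  -- decimal code point
```

For example, `escapeNonAscii "caf\233"` gives `"caf&#233;"`. Decimal references are used here, but hexadecimal ones (`&#xE9;`) would be equally valid XHTML. As noted above, this only helps once the input encoding is known.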

Also:
> > Defaulting to Latin-1 may be sensible, though?
>
>It may seem so to western europeans, but others may differ.
>A case could be made for UTF-8.

I tend to agree.  Further, defaulting to Latin-1 seems a strange choice 
when much of the rest of the world (well, the networking world) is moving 
towards more universal character set encodings.  For example, URIs and XML 
are both adopting UTF-8, and UTF-8 has been the IETF "preferred option" 
since early 1998:

[[
     Protocols MUST be able to use the UTF-8 charset, which consists of
     the ISO 10646 coded character set combined with the UTF-8
     character encoding scheme, as defined in [10646] Annex R
     (published in Amendment 2), for all text.

     Protocols MAY specify, in addition, how to use other charsets or
     other character encoding schemes for ISO 10646, such as UTF-16,
     but lack of an ability to use UTF-8 is a violation of this policy;
     such a violation would need a variance procedure ([BCP9] section
     9) with clear and solid justification in the protocol
     specification document before being entered into or advanced upon
     the standards track.

     For existing protocols or protocols that move data from existing
     datastores, support of other charsets, or even using a default
     other than UTF-8, may be a requirement. This is acceptable, but
     UTF-8 support MUST be possible.

     When using other charsets than UTF-8, these MUST be registered in
     the IANA charset registry, if necessary by registering them when
     the protocol is published.
]]
-- http://www.ietf.org/rfc/rfc2277.txt
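The XML-style encoding designation suggested earlier could be combined with a UTF-8 default. Under that approach, a tool would look for a byte order mark before decoding and fall back to UTF-8 when none is present. What follows is a minimal sketch that assumes the file is read as raw bytes. The `Encoding` type and `detectEncoding` function are hypothetical. Only the three cases named earlier are covered, although XML 1.0 Appendix F also describes UCS-4 and BOM-less detection:

```haskell
import Data.Word (Word8)

-- Hypothetical encoding tags covering only the three cases discussed;
-- XML 1.0 Appendix F lists further ones (UCS-4, EBCDIC, no-BOM UTF-16).
data Encoding = UTF8 | UTF16BE | UTF16LE
  deriving (Eq, Show)

-- Inspect the leading bytes for a byte order mark and return the chosen
-- encoding together with the remaining bytes (BOM stripped).
-- With no recognised BOM, fall back to UTF-8 rather than Latin-1.
detectEncoding :: [Word8] -> (Encoding, [Word8])
detectEncoding bytes = case bytes of
  (0xEF : 0xBB : 0xBF : rest) -> (UTF8, rest)
  (0xFE : 0xFF : rest)        -> (UTF16BE, rest)
  (0xFF : 0xFE : rest)        -> (UTF16LE, rest)
  _                           -> (UTF8, bytes)
```

Because ASCII is a subset of UTF-8, a UTF-8 default leaves existing plain-ASCII sources, the common case, unaffected.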

#g

------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact