[Haskell] non-ASCII characters in Haddock documentation
Graham Klyne
GK at ninebynine.org
Mon Feb 16 15:21:11 EST 2004
At 11:29 16/02/04 +0000, Ross Paterson wrote:
>On Mon, Feb 16, 2004 at 10:20:30AM -0000, Simon Marlow wrote:
>...
> > It shouldn't be too hard to fix this, at least for Latin-1 (full
> > Unicode would be somewhat harder). I'll add it to the TODO list.
>
>While Haskell's source charset is specified as Unicode, Haskell source
>files don't specify the byte encoding they use, so any source file using
>non-ASCII characters isn't portable. Entrenching Latin-1 would make the
>move to Unicode more difficult.
Ah, yes. I was going to suggest that for generating XHTML, it should be
easy enough to generate &#xxxx; expansions, but that doesn't take account
of not knowing the input encoding. Maybe the XML conventions for encoding
designation (UTF-8, UTF-16 big-endian, UTF-16 little-endian) might be
applicable?
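As a rough sketch of both ideas (the names Encoding, sniffEncoding and
escapeXhtml are invented here for illustration; none of this is existing
Haddock code): the first function guesses the input encoding from a leading
byte-order mark, roughly in the spirit of the XML autodetection conventions,
and the second emits numeric character references for anything outside
ASCII. It assumes the text has already been decoded to Unicode code points
somewhere in between, which is exactly the step that depends on knowing the
input encoding.

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Encodings distinguished by this sketch (hypothetical type).
data Encoding = UTF8 | UTF16BE | UTF16LE
  deriving (Eq, Show)

-- Guess the encoding from a leading byte-order mark and drop the mark.
-- Assumption: with no BOM we fall back to UTF-8, as XML does when there
-- is no encoding declaration. UTF-32 and other cases are ignored here.
sniffEncoding :: [Word8] -> (Encoding, [Word8])
sniffEncoding (0xEF : 0xBB : 0xBF : rest) = (UTF8, rest)
sniffEncoding (0xFE : 0xFF : rest)        = (UTF16BE, rest)
sniffEncoding (0xFF : 0xFE : rest)        = (UTF16LE, rest)
sniffEncoding bytes                       = (UTF8, bytes)

-- Escape one already-decoded character for XHTML output.
escapeChar :: Char -> String
escapeChar c
  | c == '&'    = "&amp;"
  | c == '<'    = "&lt;"
  | c == '>'    = "&gt;"
  | c == '"'    = "&quot;"
  | ord c < 128 = [c]                            -- plain ASCII passes through
  | otherwise   = "&#x" ++ showHex (ord c) ";"   -- hex character reference

-- Escape a whole string.
escapeXhtml :: String -> String
escapeXhtml = concatMap escapeChar
```

For example, in GHCi, escapeXhtml "caf\233" gives "caf&#xe9;", so the
generated page stays pure ASCII whatever encoding the browser assumes.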
Also:
> > Defaulting to Latin-1 may be sensible, though?
>
>It may seem so to western europeans, but others may differ.
>A case could be made for UTF-8.
I tend to agree. Further, the choice of defaulting to Latin-1 seems a
strange one when much of the rest of the world (well, the networking world)
seems to be moving towards more universal character set encodings. URIs
and XML are examples, and UTF-8 has been the IETF "preferred option" since
early 1998:
[[
Protocols MUST be able to use the UTF-8 charset, which consists of
the ISO 10646 coded character set combined with the UTF-8
character encoding scheme, as defined in [10646] Annex R
(published in Amendment 2), for all text.

Protocols MAY specify, in addition, how to use other charsets or
other character encoding schemes for ISO 10646, such as UTF-16,
but lack of an ability to use UTF-8 is a violation of this policy;
such a violation would need a variance procedure ([BCP9] section
9) with clear and solid justification in the protocol
specification document before being entered into or advanced upon
the standards track.

For existing protocols or protocols that move data from existing
datastores, support of other charsets, or even using a default
other than UTF-8, may be a requirement. This is acceptable, but
UTF-8 support MUST be possible.

When using other charsets than UTF-8, these MUST be registered in
the IANA charset registry, if necessary by registering them when
the protocol is published.
]]
-- http://www.ietf.org/rfc/rfc2277.txt
#g
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact