[Haskell-cafe] How to reverse ghc encoding of commandline arguments

Brandon Allbery allbery.b at gmail.com
Wed Nov 19 11:49:01 UTC 2014


On Wed, Nov 19, 2014 at 7:56 AM, Donn Cave <donn at avvanta.com> wrote:

> quoth Donn Cave <donn at avvanta.com>
> ...
> > Umlaut u turns up as 0xFC for UTF-8 users;  0xDCFC, for Latin-1 users.
> > This is an ordinary hello world type program, can't think of any
> > unique environmental issues.
>
> Well, I mischaracterized that problem, so to speak.
>
> I find that GHC is not picking up on my "current locale" encoding,
> and instead seems to be hard-wired to UTF-8.  On MacOS X, I can
> select an encoding in Terminal Preferences, open a new window, and
> for all intents and purposes it's an ISO8859-1 world, including
> LANG=en_US.ISO8859-1, but GHC isn't going along with it.
>
> So the ISO8859-1 umlaut u is undecodable if GHC is stuck in UTF-8,
> which seems to explain what I'm seeing.  If I understand this right,
> the 0xDC00 high byte is recognized in some circumstances, and the
> value is spared from UTF-8 encoding and instead simply copied.
>

ISO8859-1 is a single-byte encoding, not multibyte, so it has no "high
byte". And your earlier description is incorrect, in a way that shows a
common confusion about the relationship between Unicode, UTF8 and ISO8859-1.

U+00FC is the Unicode codepoint for u-umlaut. This is, by design, the same
as the single byte sequence for u-umlaut (0xFC) in ISO8859-1. It is *not*
the UTF8 representation of u-umlaut; that is 0xC3 0xBC.
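
To make the distinction concrete, here is a minimal sketch that prints the
codepoint, the ISO8859-1 byte and the UTF8 bytes for u-umlaut. The helper
names (utf8, latin1, hex) are made up for illustration, and the UTF8
encoder is hand-rolled rather than taken from any library.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical helper: encode one codepoint as UTF8 bytes.
utf8 :: Char -> [Word8]
utf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- leading bits that go into the first byte
    hi s   = fromIntegral (n `shiftR` s)
    -- continuation byte: 10xxxxxx carrying six payload bits
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: ISO8859-1 is the first 256 codepoints, one byte each.
latin1 :: Char -> Maybe Word8
latin1 c
  | ord c < 0x100 = Just (fromIntegral (ord c))
  | otherwise     = Nothing

hex :: Word8 -> String
hex b = showHex b ""

main :: IO ()
main = do
  let u = '\xFC'  -- u-umlaut, U+00FC
  putStrLn ("codepoint: " ++ showHex (ord u) "")           -- fc
  putStrLn ("ISO8859-1: " ++ maybe "n/a" hex (latin1 u))   -- fc
  putStrLn ("UTF8:      " ++ unwords (map hex (utf8 u)))   -- c3 bc
```

The codepoint and the ISO8859-1 byte coincide only because Unicode copied
its first 256 codepoints from ISO8859-1; the UTF8 form is two bytes.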

The 0xDC prefix is, as I said earlier, a hack used by ghc. Internally it
works with Unicode codepoints; so a byte which cannot be decoded but which
it needs to roundtrip from its external representation, which per POSIX has
no encoding / is an octet string, to its internal representation is stored
as a codepoint with a 0xDC prefix (stolen; that range belongs to the UTF-16
low surrogates, which are never valid on their own) and then turned back
into the original byte on output by stripping the prefix. But this means
that you will find yourself working with a "strange" Unicode codepoint:
here U+DCFC rather than U+00FC.
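
The roundtrip can be modelled roughly as below. This is only a sketch of
the scheme as described above, not ghc's actual code; escapeByte and
unescapeChar are hypothetical names, and the assumption that only bytes
0x80 and up get the prefix follows the usual PEP 383 style convention.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical model: an undecodable byte (assumed to be >= 0x80) is
-- carried around as the codepoint 0xDC00 + byte; ASCII passes through.
escapeByte :: Word8 -> Char
escapeByte b
  | b >= 0x80 = chr (0xDC00 + fromIntegral b)
  | otherwise = chr (fromIntegral b)

-- Reverse direction: strip the 0xDC prefix to recover the original byte.
-- Anything outside U+DC80..U+DCFF was not produced by the escape.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | 0xDC80 <= n && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
  | otherwise                  = Nothing
  where
    n = ord c

main :: IO ()
main = do
  let c = escapeByte 0xFC          -- the ISO8859-1 byte for u-umlaut
  print (ord c == 0xDCFC)          -- True
  print (unescapeChar c)           -- Just 252, i.e. 0xFC again
  print (unescapeChar '\xFC')      -- Nothing: a real U+00FC is not an escape
```

So a program that sees U+DCFC in an argument can recover the raw byte 0xFC
and decode it itself, for example as ISO8859-1.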

-- 
brandon s allbery kf8nh                               sine nomine associates
allbery.b at gmail.com                                  ballbery at sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net