[Haskell-cafe] How to reverse ghc encoding of command line arguments
Ben Franksen
ben.franksen at online.de
Mon Nov 17 23:41:57 UTC 2014
Donn Cave wrote:
> [... I said earlier ...]
>> I may be confused here - trying this out, I seem to be getting
>> garbage I don't understand from System.Environment getArgs.
>
> So I returned to this out of curiosity, and specifically,
> System.Environment getArgs converts common accented characters
> in ISO-8859-1 command line arguments, into values in the
> high 0xDC00's. Lower case umlaut u, for example, is 0xDCFC.
> These values, fed into Data.Text pack and encodeUtf8, seem
> to be garbage ... I get 3-byte UTF-8 that I highly doubt
> has anything to do with accented latin characters, actually
> the same "\239\191\189" even for different chars.
>
> But the lower bytes looked like Unicode values, and if the
> upper 0xDC00 is cleared, Data.Text pack and encodeUtf8 works.
>
> I'm no Unicode whiz, maybe this all makes sense? I'm not
> inconvenienced by this myself, my interest is only academic,
> just wondering what the extra 0xDC00 bits are for. And I
> should note that as far as I can make out, this doesn't match
> the remark at the beginning of this thread: "... does *not*
> contain the Unicode code points of the characters the user has
> entered. Instead the input bytes are mapped one-to-one to Char."
> I have GHC 7.8.3.
Hi Donn
I am sorry, I should have replied earlier to say that I was *wrong*:
GHC/base does not do what I claimed by default, as I learned later and as
you now confirm. It maps input bytes one-to-one to Char only if the program
expressly demands it by specifying a so-called "char8" encoding, which means
initializing the global variable localeEncoding before the base library does
it for you. With this you can override the user's locale as seen by
GHC/base. I was working on Darcs, and this is what Darcs does. But I was not
aware of this hack, and I was used to local reasoning in Haskell (doesn't
Haskell claim to be a purely functional language?).
Sorry for the confusion. And thanks for confirming that GHC and the base
library do the right thing (if we let them).
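
As for what the extra 0xDC00 bits are for: as far as I understand it, the
default decoding escapes each input byte it cannot decode in the current
locale as the lone surrogate 0xDC00 + byte, so that the original bytes can be
restored later. Data.Text replaces surrogate code points with U+FFFD, which
would explain the identical "\239\191\189" for different characters. Below is
a minimal sketch of undoing the escape by hand, assuming the argument bytes
were ISO-8859-1. The names isEscapedByte, unescapeByte and fromLatin1Escapes
are made up for illustration and are not part of base.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | True for the lone low surrogates that (by assumption) stand for
-- undecodable input bytes 0x80..0xFF.
isEscapedByte :: Char -> Bool
isEscapedByte c = c >= '\xDC80' && c <= '\xDCFF'

-- | Recover the original byte from an escape character, if it is one.
unescapeByte :: Char -> Maybe Word8
unescapeByte c
  | isEscapedByte c = Just (fromIntegral (ord c - 0xDC00))
  | otherwise       = Nothing

-- | Reinterpret every escaped byte as an ISO-8859-1 character; all other
-- characters pass through unchanged. This works because ISO-8859-1 byte
-- values coincide with the first 256 Unicode code points.
fromLatin1Escapes :: String -> String
fromLatin1Escapes = map fix
  where
    fix c = maybe c (chr . fromIntegral) (unescapeByte c)
```

With this, fromLatin1Escapes applied to an argument containing 0xDCFC should
give a string containing 0xFC (lower case umlaut u), which Data.Text pack and
encodeUtf8 can then handle. For any other input encoding the recovered bytes
would have to be decoded properly instead of being used as code points.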
Cheers
Ben
--
"Make it so they have to reboot after every typo." -- Scott Adams