[Haskell-cafe] How to reverse ghc encoding of command line arguments
Ben Franksen
ben.franksen at online.de
Sun Nov 16 13:42:28 UTC 2014
I have a question about how to reverse the text encoding as done by ghc and
the base library for stuff that comes from the command line or the
environment.
Assume the user's environment specifies a non-Unicode locale, e.g. some
Latin encoding such as ISO-8859-1. In this case, the String we get from e.g.
System.Environment.getArgs does *not* contain the Unicode code points of the
characters the user has entered. Instead the input bytes are mapped one-to-
one to Char. This has probably been done for compatibility reasons, and I do
not want to discuss this choice here. Rather, I want to find out how I can
convert such a string back to some proper Unicode representation, for
instance in order to store the value in a file with a defined encoding such
as UTF-8.
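For the last step (storing a proper Unicode String in a file with a defined encoding), a minimal sketch using only System.IO from base might look as follows. The name writeUtf8 is made up for illustration; it assumes the String already holds real Unicode code points.

```haskell
import System.IO

-- | Write a String to a file as UTF-8, regardless of the user's locale.
-- (writeUtf8 is a hypothetical helper name, not a library function.)
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path text =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8   -- override the locale-derived default encoding
    hPutStr h text
```

Setting the encoding on the handle explicitly keeps the output independent of whatever the locale happens to be.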
This should be done in a generic way, i.e. without making ad-hoc assumptions
about what the user's encoding might be.
There is the iconv package. However, it takes ByteString as input and output
and it also requires that I give it the encoding as input. How do I find out
which encoding this is? On the command line I could simply do
ben@sarun[1]: ~ > locale charmap
ISO-8859-1
Is there a Haskell function that does the equivalent or do I have to use
getEnv "LC_CTYPE", then parse the result?
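One possible answer, sketched under the assumption that a reasonably recent base is available: GHC.IO.Encoding exports getLocaleEncoding and getFileSystemEncoding, and the Show instance of TextEncoding yields the encoding's name. The helper names localeCharmap and argsCharmap are made up here, and the exact spelling of the returned name may differ from what `locale charmap` prints.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding, getLocaleEncoding)

-- | Name of the encoding GHC derived from the user's locale.
-- (hypothetical helper name)
localeCharmap :: IO String
localeCharmap = fmap show getLocaleEncoding

-- | Name of the encoding base uses for command line arguments,
-- environment variables and file paths. (hypothetical helper name)
argsCharmap :: IO String
argsCharmap = fmap show getFileSystemEncoding
```

This avoids parsing LC_CTYPE by hand, since the runtime has already resolved LC_ALL, LC_CTYPE and LANG in the usual order.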
Let's assume I get this to work, so now I have a String that represents the
user's encoding, such as "ISO-8859-1". Now, in order to use iconv, I have to
convert the string I got via getArgs into a ByteString. But to do that
properly, I would have to decode it according to the user's current locale,
which is exactly what I want to achieve in the first place.
How do I break this cycle?
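A sketch of one way out, taking the premise above at face value (each Char holds exactly one raw input byte): getting the bytes back then needs no knowledge of the encoding at all, and only the decoding step uses the locale. All function names below are made up for illustration; GHC.Foreign.peekCString and getLocaleEncoding are from base. If the installed base already decodes arguments according to the locale, this step should not be applied.

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, getLocaleEncoding)

-- | Treat every Char as one raw byte. Only meaningful if all code points
-- are below 256 (the assumption stated above); larger ones get truncated.
charsToBytes :: String -> [Word8]
charsToBytes = map (fromIntegral . ord)

-- | Decode raw bytes with the given encoding by going through a
-- temporary NUL-terminated C string.
decodeBytes :: TextEncoding -> [Word8] -> IO String
decodeBytes enc bytes =
  withArray0 0 bytes $ \p -> GHC.peekCString enc (castPtr p)

-- | Reinterpret a "bytes as Chars" argument according to the user's locale.
reinterpretArg :: String -> IO String
reinterpretArg arg = do
  enc <- getLocaleEncoding
  decodeBytes enc (charsToBytes arg)
```

Because the decoder comes from getLocaleEncoding, no encoding name has to be passed around and iconv is not needed for this step.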
Perhaps it is simpler to write our own getArgs/getEnv functions and directly
convert the data we get from the system to a proper (Unicode) String?
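A rough sketch of that idea, assuming a POSIX system and the unix and bytestring packages: System.Posix.Env.ByteString.getArgs returns the arguments as raw bytes, which can then be decoded with the locale encoding. The names decodeRaw and getArgsUnicode are hypothetical.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, getLocaleEncoding)
import qualified System.Posix.Env.ByteString as Posix

-- | Decode one raw argument with the given encoding.
-- Decoding may throw if the bytes are invalid in that encoding.
decodeRaw :: TextEncoding -> B.ByteString -> IO String
decodeRaw enc raw = B.useAsCString raw (GHC.peekCString enc)

-- | Fetch the raw argument bytes and decode them per the user's locale.
getArgsUnicode :: IO [String]
getArgsUnicode = do
  enc <- getLocaleEncoding
  raw <- Posix.getArgs
  mapM (decodeRaw enc) raw
```

The same pattern should carry over to environment variables via getEnv from the same module.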
Any suggestions would be highly appreciated.
Cheers
Ben
--
"Make it so they have to reboot after every typo." -- Scott Adams