[Haskell-cafe] How to reverse ghc encoding of command line arguments

Ben Franksen ben.franksen at online.de
Sun Nov 16 13:42:28 UTC 2014


I have a question about how to reverse the text encoding that GHC and the 
base library apply to strings that come from the command line or the 
environment.

Assume the user's environment specifies a non-Unicode locale, e.g. some 
Latin encoding. In this case, the String we get from e.g. 
System.Environment.getArgs does *not* contain the Unicode code points of the 
characters the user has entered. Instead, the input bytes are mapped 
one-to-one to Char. This was probably done for compatibility reasons, and I 
do not want to discuss this choice here. Rather, I want to find out how I 
can convert such a String back to a proper Unicode representation, for 
instance in order to store the value in a file with a defined encoding such 
as UTF-8.
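
One way to check what getArgs actually delivers under a given locale is to 
print the numeric value of every Char in each argument. This is only a 
minimal sketch; the helper name codePoints is made up for illustration.

```haskell
import Data.Char (ord)
import System.Environment (getArgs)

-- Numeric value of every Char in a String (hypothetical helper).
codePoints :: String -> [Int]
codePoints = map ord

main :: IO ()
main = do
  args <- getArgs
  -- One list per argument, so the values can be compared with the bytes
  -- the terminal sent and with the expected Unicode code points.
  mapM_ (print . codePoints) args
```

Running the program with the same argument under different locale settings 
shows whether the values are raw bytes or decoded code points.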

This should be done in a generic way, i.e. without making ad-hoc assumptions 
about what the user's encoding might be.

There is the iconv package. However, it takes ByteString as input and output, 
and it also requires that I tell it the source encoding. How do I find out 
which encoding that is? On the command line I could simply do

ben@sarun[1]: ~ > locale charmap
ISO-8859-1

Is there a Haskell function that does the equivalent or do I have to use 
getEnv "LC_CTYPE", then parse the result?
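
Both routes can be sketched as follows. The parser charsetOf is a 
hypothetical helper that assumes the POSIX locale name layout 
language[_territory][.codeset][@modifier], and the lookup order LC_ALL, 
LC_CTYPE, LANG is likewise an assumption about the platform. 
getLocaleEncoding comes from GHC.IO.Encoding in base; it reports the 
encoding the runtime itself picked up, which may be named differently from 
what `locale charmap` prints.

```haskell
import Data.Maybe (catMaybes, listToMaybe)
import GHC.IO.Encoding (getLocaleEncoding)
import System.Environment (lookupEnv)

-- Extract the codeset part of a locale name such as
-- "de_DE.ISO-8859-1@euro". Returns Nothing when no codeset is given.
charsetOf :: String -> Maybe String
charsetOf name =
  case break (== '.') name of
    (_, '.' : rest) ->
      case takeWhile (/= '@') rest of
        "" -> Nothing
        cs -> Just cs
    _ -> Nothing

main :: IO ()
main = do
  -- Assumed precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
  vals <- mapM lookupEnv ["LC_ALL", "LC_CTYPE", "LANG"]
  let firstSet = listToMaybe (filter (not . null) (catMaybes vals))
  putStrLn ("from environment: " ++ show (firstSet >>= charsetOf))
  -- What the GHC runtime itself determined as the locale encoding.
  enc <- getLocaleEncoding
  putStrLn ("from base:        " ++ show enc)
```

Parsing the environment is the less reliable of the two, because a locale 
name without an explicit codeset (for example "de_DE") still has a default 
one that the variable does not reveal.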

Let's assume I get this to work, so now I have a String that names the 
user's encoding, such as "ISO-8859-1". Now, in order to use iconv, I have to 
convert the String I got via getArgs into a ByteString. But to do that 
properly, I would have to encode it according to the user's current locale, 
which is exactly what I want to achieve in the first place.

How do I break this cycle?

Perhaps it is simpler to write our own getArgs/getEnv functions and directly 
convert the data we get from the system to a proper (Unicode) String?
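
A possible alternative that stays within base is to turn the String back 
into bytes and decode those bytes again with an encoding of one's own 
choosing. The sketch below uses withCStringLen and peekCStringLen from 
GHC.Foreign for this; the function name reinterpret is made up, and the 
sketch assumes a POSIX system where getArgs decodes argv with the encoding 
returned by getFileSystemEncoding.

```haskell
import qualified GHC.Foreign as F
import GHC.IO.Encoding (getFileSystemEncoding, getLocaleEncoding)
import System.Environment (getArgs)
import System.IO (TextEncoding, hSetEncoding, latin1, stdout, utf8)

-- Encode a String to bytes with one encoding, then decode those same
-- bytes with another (hypothetical helper).
reinterpret :: TextEncoding -> TextEncoding -> String -> IO String
reinterpret encodeWith decodeWith s =
  F.withCStringLen encodeWith s (F.peekCStringLen decodeWith)

main :: IO ()
main = do
  -- "\195\169" is the UTF-8 byte pair for U+00E9, one byte per Char.
  print =<< reinterpret latin1 utf8 "\195\169"   -- prints "\233"

  fsEnc  <- getFileSystemEncoding  -- assumed to be what getArgs used
  locEnc <- getLocaleEncoding
  args   <- getArgs
  -- Recover the original bytes, then decode them with the locale encoding.
  decoded <- mapM (reinterpret fsEnc locEnc) args
  hSetEncoding stdout utf8
  mapM_ putStrLn decoded
```

The second step raises an exception if an argument contains bytes that are 
not valid in the locale encoding, so real code would need to handle that 
case.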

Any suggestions would be highly appreciated.

Cheers
Ben
-- 
"Make it so they have to reboot after every typo." -- Scott Adams



