[Haskell-cafe] How to reverse ghc encoding of command line arguments

Ben Franksen ben.franksen at online.de
Sun Nov 16 19:01:24 UTC 2014


Carl Howells wrote:
> If the input bytes are mapped 1-1 to Char values without conversion,
> you can just use Data.ByteString.Char8.pack to convert to a
> ByteString, which you can then convert to Unicode however you like.
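A minimal sketch of Carl's idea using only base. It assumes, as Carl does, that every Char holds exactly one input byte. Instead of Data.ByteString.Char8.pack it uses the `char8` TextEncoding from System.IO, which likewise keeps the low 8 bits of each Char. The resulting bytes are then decoded with GHC.Foreign.peekCStringLen. `recode` and `bytesAsUtf8` are hypothetical names, and UTF-8 only stands in for whatever the user's real encoding is, which is exactly what Ben says is not known in advance.

```haskell
import GHC.Foreign (peekCStringLen, withCStringLen)
import System.IO (TextEncoding, char8, utf8)

-- | Reinterpret a String under a different encoding: encode it to bytes
-- with @from@, then decode those same bytes with @to@.
recode :: TextEncoding -> TextEncoding -> String -> IO String
recode from to s = withCStringLen from s (peekCStringLen to)

-- | Carl's suggestion, assuming each Char holds exactly one input byte:
-- 'char8' writes the low 8 bits of every Char as one byte (much like
-- Data.ByteString.Char8.pack), and those bytes are then decoded as UTF-8.
-- UTF-8 here is only an example; decoding fails on bytes that are not
-- valid UTF-8.
bytesAsUtf8 :: String -> IO String
bytesAsUtf8 = recode char8 utf8
```

For example, `bytesAsUtf8 "\195\169"` treats the two Chars as the bytes 0xC3 0xA9 and yields `"\233"` (é).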

Yes, but I cannot be sure this is the case; it depends on the user's locale
encoding.

Cheers
Ben

> On Sun, Nov 16, 2014 at 5:42 AM, Ben Franksen <ben.franksen at online.de>
> wrote:
>> I have a question about how to reverse the text encoding as done by ghc
>> and the base library for stuff that comes from the command line or the
>> environment.
>>
>> Assume the user's environment specifies a non-Unicode locale, e.g. some
>> Latin encoding. In this case, the String we get from e.g.
>> System.Environment.getArgs does *not* contain the Unicode code points of
>> the characters the user has entered. Instead the input bytes are mapped
>> one-to-one to Char. This has probably been done for compatibility
>> reasons, and I do not want to discuss this choice here. Rather, I want to
>> find out how I can convert such a string back to some proper Unicode
>> representation, for instance in order to store the value in a file with a
>> defined encoding such as utf-8.
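For the storage half of the question, base alone should be enough once the String holds real Unicode code points: set the Handle's encoding explicitly, so the file is written as UTF-8 whatever the locale says. This is a sketch. `writeFileUtf8` is a hypothetical helper, and it only produces the intended UTF-8 if the String was already decoded correctly, which is the open problem in this thread.

```haskell
import System.IO

-- | Write a String to a file as UTF-8, independent of the user's locale.
-- The Handle's encoding is set explicitly, so the locale does not affect
-- how the characters are stored, only how the String was obtained.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s
```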
>>
>> This should be done in a generic way, i.e. without making ad-hoc
>> assumptions about what the user's encoding might be.
>>
>> There is the iconv package. However, it takes ByteString as input and
>> output and it also requires that I give it the encoding as input. How do
>> I find out which encoding this is? On the command line I could simply do
>>
>> ben at sarun[1]: ~ > locale charmap
>> ISO-8859-1
>>
>> Is there a Haskell function that does the equivalent, or do I have to use
>> getEnv "LC_CTYPE" and then parse the result?
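A possible base-only answer to the `locale charmap` question, using GHC's GHC.IO.Encoding module: `getLocaleEncoding` returns the TextEncoding the runtime selected from the locale, and its `textEncodingName` field gives a name such as "UTF-8" or "ISO-8859-1". My assumption is that on POSIX systems this name comes from the same C library locale query that `locale charmap` reports, so the two should normally agree. Parsing LC_CTYPE by hand would also miss the POSIX precedence rules (LC_ALL overrides LC_CTYPE, which overrides LANG). `localeCharmap` and `fileSystemCharmap` are hypothetical names.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding, getLocaleEncoding, textEncodingName)

-- | Rough analogue of `locale charmap`: the name of the encoding that
-- GHC's runtime chose from the user's locale.
localeCharmap :: IO String
localeCharmap = textEncodingName <$> getLocaleEncoding

-- | The encoding GHC uses for file names (and, assumption: for getArgs and
-- getEnv in recent GHCs). Its name typically carries a "//ROUNDTRIP"
-- suffix on top of the locale encoding's name.
fileSystemCharmap :: IO String
fileSystemCharmap = textEncodingName <$> getFileSystemEncoding
```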
>>
>> Let's assume I get this to work, so now I have a String that represents
>> the user's encoding, such as "ISO-8859-1". Now, in order to use iconv, I
>> have to convert the string I got via getArgs into a ByteString. But to do
>> that properly, I would have to decode it according to the user's current
>> locale, which is exactly what I want to achieve in the first place.
>>
>> How do I break this cycle?
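One way to break the cycle, sketched under the assumption of a POSIX system and a GHC (7.4 or later, as far as I know) in which getArgs and getEnv decode with `getFileSystemEncoding`: encode the String again with that same encoding. Because that encoding uses GHC's //ROUNDTRIP failure mode, bytes that could not be decoded are escaped as special Chars and mapped back on encoding, so the original bytes should come out unchanged. Those bytes can then be decoded with any named encoding via `mkTextEncoding`, which on POSIX, as far as I know, falls back to iconv for names GHC does not implement itself. `argBytes` and `decodeBytesAs` are hypothetical names.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import GHC.Foreign (peekCStringLen, withCStringLen)
import GHC.IO.Encoding (getFileSystemEncoding, mkTextEncoding)

-- | Recover the raw bytes behind a String obtained from getArgs/getEnv by
-- encoding it again with the encoding the runtime (assumed) used to
-- decode it: getFileSystemEncoding.
argBytes :: String -> IO [Word8]
argBytes s = do
  enc <- getFileSystemEncoding
  withCStringLen enc s $ \(p, n) -> peekArray n (castPtr p)

-- | Decode raw bytes with an encoding given by name, e.g. "ISO-8859-1"
-- or "UTF-8" (the kind of name `locale charmap` prints).
decodeBytesAs :: String -> [Word8] -> IO String
decodeBytesAs name bytes = do
  enc <- mkTextEncoding name
  withArrayLen bytes $ \n p -> peekCStringLen enc (castPtr p, n)
```

A call such as `getArgs >>= mapM argBytes` would then give the bytes to pass on, for instance to iconv after packing them into a ByteString.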
>>
>> Perhaps it is simpler to write our own getArgs/getEnv functions and
>> directly convert the data we get from the system to a proper (Unicode)
>> String?
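Writing your own getArgs/getEnv could look roughly like this on POSIX, assuming the unix and bytestring packages (both ship with GHC but are separate from base): System.Posix.Env.ByteString returns the arguments and environment as raw ByteStrings, which are then decoded with a TextEncoding of your choice. `rawArgs`, `getEnvWith`, `decodeWith` and `getEnvUtf8` are hypothetical helpers, and the sketch assumes variable names are ASCII.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import GHC.Foreign (peekCStringLen)
import System.IO (TextEncoding, utf8)
import qualified System.Posix.Env.ByteString as PosixEnv

-- | Command-line arguments exactly as the OS passed them (POSIX only),
-- with no decoding by GHC.
rawArgs :: IO [B.ByteString]
rawArgs = PosixEnv.getArgs

-- | Decode a ByteString with the given TextEncoding.
decodeWith :: TextEncoding -> B.ByteString -> IO String
decodeWith enc bs = B.useAsCStringLen bs (peekCStringLen enc)

-- | Look up an environment variable as raw bytes and decode its value with
-- an explicitly chosen encoding. The variable name is packed with
-- BC.pack, so it is assumed to be ASCII.
getEnvWith :: TextEncoding -> String -> IO (Maybe String)
getEnvWith enc name = do
  mval <- PosixEnv.getEnv (BC.pack name)
  traverse (decodeWith enc) mval

-- | Example: read an environment variable as UTF-8, whatever the locale says.
getEnvUtf8 :: String -> IO (Maybe String)
getEnvUtf8 = getEnvWith utf8
```

The same `decodeWith` could be mapped over `rawArgs`, with the encoding obtained from the locale or chosen by the program.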
>>
>> Any suggestions would be highly appreciated.
>>
>> Cheers
>> Ben
>> --
>> "Make it so they have to reboot after every typo." -- Scott Adams
>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe



