[Haskell-cafe] How to reverse ghc encoding of command line arguments

Carl Howells chowells79 at gmail.com
Sun Nov 16 18:38:00 UTC 2014


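If the input bytes are mapped 1-1 to Char values without conversion
(so every Char is below '\256'), you can just use
Data.ByteString.Char8.pack to convert to a ByteString. pack keeps only
the low 8 bits of each Char, so under that assumption it recovers the
original bytes exactly. You can then convert the ByteString to Unicode
however you like.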
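A minimal sketch of that suggestion follows. It assumes the bytestring and text packages and, purely for illustration, an ISO-8859-1 locale. With Latin-1 the decode step is trivial. For any other locale, a real converter working on the same bytes (for example the iconv package) would replace decodeLatin1. The names argToBytes, argToUtf8, saveArgsUtf8 and the output path are hypothetical.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text.Encoding as TE
import System.Environment (getArgs)

-- Hypothetical helper: recover the raw bytes of an argument, assuming
-- each Char holds exactly one input byte (code point below 256).
-- BC.pack keeps only the low 8 bits of each Char, so it is lossless
-- only under that assumption.
argToBytes :: String -> B.ByteString
argToBytes = BC.pack

-- Hypothetical helper: re-encode an argument as UTF-8, assuming the
-- locale is ISO-8859-1. For another encoding, swap decodeLatin1 for a
-- real converter (e.g. the iconv package) applied to the same bytes.
argToUtf8 :: String -> B.ByteString
argToUtf8 = TE.encodeUtf8 . TE.decodeLatin1 . argToBytes

-- Write every command line argument to a file as UTF-8, one per line.
-- "args-utf8.txt" is a hypothetical output path.
saveArgsUtf8 :: IO ()
saveArgsUtf8 = do
  args <- getArgs
  B.writeFile "args-utf8.txt" (BC.unlines (map argToUtf8 args))
```

For example, B.unpack (argToUtf8 "caf\233") should give [99,97,102,195,169], which is the UTF-8 encoding of "café".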

On Sun, Nov 16, 2014 at 5:42 AM, Ben Franksen <ben.franksen at online.de> wrote:
> I have a question about how to reverse the text encoding as done by ghc and
> the base library for stuff that comes from the command line or the
> environment.
>
> Assume the user's environment specifies a non-Unicode locale, e.g. some
> Latin encoding. In this case, the String we get from e.g.
> System.Environment.getArgs does *not* contain the Unicode code points of the
> characters the user has entered. Instead the input bytes are mapped
> one-to-one to Char. This has probably been done for compatibility reasons, and I do
> not want to discuss this choice here. Rather, I want to find out how I can
> convert such a string back to some proper Unicode representation, for
> instance in order to store the value in a file with a defined encoding such
> as utf-8.
>
> This should be done in a generic way, i.e. without making ad-hoc assumptions
> about what the user's encoding might be.
>
> There is the iconv package. However, it takes ByteString as input and output,
> and it also requires that I give it the encoding as input. How do I find out
> which encoding this is? On the command line I could simply do
>
> ben at sarun[1]: ~ > locale charmap
> ISO-8859-1
>
> Is there a Haskell function that does the equivalent or do I have to use
> getEnv "LC_CTYPE", then parse the result?
>
> Let's assume I get this to work, so now I have a String that represents the
> user's encoding, such as "ISO-8859-1". Now, in order to use iconv, I have to
> convert the string I got via getArgs into a ByteString. But to do that
> properly, I would have to decode it according to the user's current locale,
> which is exactly what I want to achieve in the first place.
>
> How do I break this cycle?
>
> Perhaps it is simpler to write our own getArgs/getEnv functions and directly
> convert the data we get from the system to a proper (Unicode) String?
>
> Any suggestions would be highly appreciated.
>
> Cheers
> Ben
> --
> "Make it so they have to reboot after every typo." -- Scott Adams

