Output character encoding for ghc on OpenBSD
judah.jacobson at gmail.com
Sun Apr 18 13:53:22 EDT 2010
On Sun, Apr 18, 2010 at 7:01 AM, Matthias Kilian <kili at outback.escape.de> wrote:
> As some of you may know, I'm working on an update of OpenBSD's ghc
> port to 6.12.2, currently chasing down the last remaining testsuite
> failures. Yesterday, I ran into a problem which I have a fix for,
> but only a really ugly fix, and I need some opinions on what users
> would prefer.
> The problem is that Haskell uses unicode characters internally (ghc
> itself uses UTF-32 internally, where the endianness depends on the
> architecture it's running on), and that any Haskell program (including
> ghc and ghci) has to convert between the internal representation
> and the actual locale settings of the system it's running on.
> Unfortunately, OpenBSD is really bad when it comes to locale support;
> the only supported locales are the C and the POSIX locales, so even
> if you set LC_ALL or LC_CTYPE to something like, for example,
> de_DE.iso88591, this would have no effect on OpenBSD.
> Anyway, the short story is that I have to either hard-code the
> character set to something like utf-8, or ghc will start to behave
> really strangely (for example, ghci would terminate immediately if
> you just *type* a non-ASCII character).
That sounds like it might be something to do with the haskeline
package, which ghci uses for user interaction. Haskeline makes its
own FFI calls to translate raw input bytes into Unicode Chars. Can
you elaborate further on what exactly the issue is with OpenBSD's
locale support? In particular, haskeline performs several steps:
- call setlocale(LC_CTYPE, "")
- call nl_langinfo(CODESET)
- pass the resulting string (the codeset name, which should follow
  from, e.g., $LANG) to iconv_open
- call iconv on user input (which may be malformed)
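
As a rough illustration of that sequence, here is a minimal C sketch. It
is not haskeline's actual code: the helper names locale_codeset and
decode_input are hypothetical, and it assumes the iconv implementation
accepts "UTF-32LE"/"UTF-32BE" as a target and uses the glibc-style
prototype (some BSDs declare the input buffer as const char **). Bytes
that cannot be decoded are replaced with '?'.

```c
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper covering the first two steps: adopt the locale
   from the environment and ask for its codeset name. */
static const char *locale_codeset(void)
{
    setlocale(LC_CTYPE, "");      /* honours LC_ALL, LC_CTYPE, LANG */
    return nl_langinfo(CODESET);  /* e.g. "UTF-8"; system dependent */
}

/* Hypothetical helper covering the last two steps: decode `len` bytes
   of `in` from `codeset` into at most `cap` code points in `out`.
   Each undecodable byte becomes '?'. Returns the number of code points
   written, or -1 if iconv_open rejects the codeset name. */
static long decode_input(const char *codeset, const char *in, size_t len,
                         uint32_t *out, size_t cap)
{
    /* Pick the host byte order so `out` can be read as plain uint32_t. */
    const uint16_t probe = 1;
    const char *target =
        *(const unsigned char *)&probe ? "UTF-32LE" : "UTF-32BE";

    iconv_t cd = iconv_open(target, codeset);
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;       /* iconv does not modify the input */
    size_t src_left = len;
    char *dst = (char *)out;
    size_t dst_left = cap * sizeof *out;

    while (src_left > 0) {
        size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        if (rc != (size_t)-1)
            break;                /* all remaining input was converted */
        if (errno == E2BIG || dst_left < sizeof *out)
            break;                /* output buffer is full */

        /* EILSEQ (invalid) or EINVAL (truncated): emit '?' and skip
           one input byte, then try again. */
        uint32_t q = '?';
        memcpy(dst, &q, sizeof q);
        dst += sizeof q;
        dst_left -= sizeof q;
        src += 1;
        src_left -= 1;
    }

    iconv_close(cd);
    return (long)((cap * sizeof *out - dst_left) / sizeof *out);
}
```

A caller would pass locale_codeset() as the first argument of
decode_input. If nl_langinfo returns a name that iconv_open does not
recognise, decode_input returns -1, which is the failure case the
question below is asking about.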
Is the problem that setting $LC_ALL or $LANG has no effect on the
string returned by nl_langinfo, so the translation fails? If so,
haskeline is supposed to output "?"s in that case, so there might be a
bug in the package.
Finally, when you say you have to "hard-code the character set", are
you talking about ghc, haskeline, the base library, or somewhere else?
More information about the Glasgow-haskell-users mailing list