Output character encoding for ghc on OpenBSD
Matthias Kilian
kili at outback.escape.de
Sun Apr 18 14:22:52 EDT 2010
Hi,
On Sun, Apr 18, 2010 at 10:53:22AM -0700, Judah Jacobson wrote:
> > Anyway, the short story is that I have to either hard-code the
> > character set to something like utf-8, or ghc will start to behave
> > really strange (for example, ghci would terminate immediately if
> > you just *type* a non-ASCII character).
>
> That sounds like it might be something to do with the haskeline
> package, which ghci uses for user interaction. Haskeline makes its
> own FFI calls to translate raw input bytes into Unicode Chars.
Oh, this may indeed be a second problem. However, the encoding
problem itself also manifests in the `openTempFile001' test of the
testsuite. For example, with an unpatched ghc-6.12, the test fails
with the following output:
=====> openTempFile001(normal) 1048 of 2375 [0, 38, 0]
cd ./lib/IO && '/usr/obj/ports/ghc-6.12.2/ghc-6.12.2/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -no-user-package-conf -dno-debug-output -o openTempFile001 openTempFile001.hs >openTempFile001.comp.stderr 2>&1
cd ./lib/IO && ./openTempFile001 </dev/null >openTempFile001.run.stdout 2>openTempFile001.run.stderr
Wrong exit code (expected 0 , actual 1 )
Stdout:
Stderr:
openTempFile001: ./test22236.txt: hClose: invalid argument (Illegal byte sequence)
*** unexpected failure for openTempFile001(normal)
> Can
> you elaborate further on what exactly the issue is with OpenBSD's
> locale support? In particular, there's several components used by
> Haskeline:
> - call set_locale(LC_CTYPE)
Problem number 1: setlocale(LC_CTYPE, ...) fails (i.e. returns NULL)
for any locale except `C' or `POSIX'. Did I mention that OpenBSD
is really bad with locales? ;-)
> - call nl_langinfo(CODESET)
Always returns `646' (ASCII). Duh.
> - pass the resulting string (which should be, e.g., $LANG) to iconv_open
iconv_open appears to need the *codeset* name, not a complete locale.
Note that OpenBSD uses GNU libiconv-1.13, which AFAIK differs from
the one included in glibc. Even worse, I have to pass something
like "UTF-8", whereas "UTF8" doesn't work.
> - call iconv on user input (which may be malformed)
I wrote a little C program that does the following (some error
checks omitted here):
char *inp, *outp;
size_t insz, outsz;
iconv_t ic;
/* U+00A9 (copyright sign) encoded as UTF-32LE */
unsigned char in[] = {0xa9, 0, 0, 0};
char out[512];
inp = (char *)in;
outp = out;
insz = sizeof(in);
outsz = sizeof(out) - 1;
setlocale(LC_CTYPE, "");
/* "" as the target asks libiconv for the locale's own encoding */
ic = iconv_open("", "UTF-32LE");
if (iconv(ic, &inp, &insz, &outp, &outsz) == (size_t)-1) {
    ... bail out (perror() etc.) ...
}
iconv_close(ic);
*outp = 0;
puts(out);
And it just doesn't work, regardless of what I set LC_CTYPE to. The
only way to get it to print the copyright symbol is to explicitly
use "UTF-8" (or "ISO-8859-1" or something else that knows about
that symbol) as the first argument to iconv_open().
> Is the problem that setting $LC_ALL or $LANG has no effect on the
> string returned by nl_langinfo, so the translation fails?
Yes, see above.
> If so,
> haskeline is supposed to output "?"s in that case, so there might be a
> bug in the package.
It fails (or rather: ghci fails, since I didn't yet do any separate
haskeline tests) with the same error as the test mentioned above,
with the difference that it fails on hPutChar instead of hClose for
obvious reasons.
> Finally, when you say you have to "hard-code the character set", are
> you talking about ghc, haskeline, the base library, or somewhere else?
I'm talking about libraries/base/GHC/IO/Encoding/Iconv.hs
See? There just is no non-hackerish way to fix this (except of
course improving locale support on OpenBSD, but that's beyond my
scope currently).
Ciao,
Kili