DBCS encoding support on Windows

Max Bolingbroke batterseapower at hotmail.com
Tue Apr 23 22:29:29 CEST 2013


Hi GHCers,

I've implemented support in GHC for extra Windows code pages on the branch
"dbcs" of the base library.

The problem this solves is that currently users of Haskell on a Windows
machine running in a locale which uses a double-byte code page such as
CP936 (GBK) or CP950 (Big5) cannot properly interact with the Windows
console in their native language. Unfortunately, code page support is a
prerequisite for getting this to work correctly: for all Microsoft's
fine talk about Unicode being the future, the Windows console does not seem
to support it properly, so code pages are the only way to go for console
input and output.

Because they are the standard Windows locale encodings in many regions,
these code pages are also the predominant encoding for text files in many
countries, so they are useful outside the console too.

The solution is along the lines suggested in
http://hackage.haskell.org/trac/ghc/ticket/3977, i.e. we create an
iconv-like interface to the Windows MultiByteToWideChar and
WideCharToMultiByte APIs by the judicious use of binary search. In my
branch, these APIs will be used whenever we don't have a built-in native
Haskell TextEncoding for the code page (we used to fall back on using
latin1 for such code pages).
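
To illustrate where binary search can come in, here is a minimal pure
sketch. It is not the code on the branch. It assumes the difficulty is that
a whole-buffer converter such as MultiByteToWideChar reports how many
characters it produced but not how many input bytes correspond to a given
number of output characters, which an iconv-like interface needs to know
when the output buffer fills up. All names below (toyCharCount,
largestPrefix, bytesConsumed, sample) are hypothetical, and the toy decoder
merely stands in for the real Windows call.

```haskell
module PrefixSearch where

import Data.Word (Word8)

-- | Toy stand-in for a whole-buffer decoder: the number of characters
-- produced from a byte string in a simplified double-byte encoding.
-- Bytes >= 0x81 are lead bytes that swallow one trail byte; a dangling
-- lead byte at the end still counts as one (replacement) character.
toyCharCount :: [Word8] -> Int
toyCharCount [] = 0
toyCharCount (b : rest)
  | b >= 0x81 = 1 + toyCharCount (drop 1 rest)
  | otherwise = 1 + toyCharCount rest

-- | Largest k in [0, total] with count k <= cap. Requires that count is
-- non-decreasing, count 0 == 0, and cap >= 0.
largestPrefix :: (Int -> Int) -> Int -> Int -> Int
largestPrefix count total cap = go 0 total
  where
    -- Invariant: count lo <= cap, and the answer lies in [lo, hi].
    go lo hi
      | lo >= hi         = lo
      | count mid <= cap = go mid hi
      | otherwise        = go lo (mid - 1)
      where
        mid = (lo + hi + 1) `div` 2  -- round up so that mid > lo

-- | How many input bytes can be consumed if the output buffer has room
-- for at most cap characters.
bytesConsumed :: [Word8] -> Int -> Int
bytesConsumed input cap =
  largestPrefix (\k -> toyCharCount (take k input)) (length input) cap

-- | "A", a two-byte character, "B", a two-byte character (GBK-style bytes).
sample :: [Word8]
sample = [0x41, 0xD6, 0xD0, 0x42, 0xCE, 0xC4]
```

Loaded into GHCi, bytesConsumed sample 2 evaluates to 3: two characters of
output room cover "A" plus the first two-byte character, and the search
never stops in the middle of that character because the prefix that
includes the trail byte produces the same character count as the one that
ends on the lead byte. Each probe re-decodes a prefix, so the cost is a
logarithmic number of conversions rather than one per byte.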

Unless there are any objections, I'll merge this into the base library main
branch next week.

Cheers,
Max