DBCS encoding support on Windows

Max Bolingbroke batterseapower at hotmail.com
Wed Apr 24 22:03:57 CEST 2013


The algorithm in the new module (GHC.IO.Encoding.CodePage.API) is rather
intricate, so I've commented it quite thoroughly. The changes to other
modules are minimal: when GHC doesn't have the code page built in, we now
use a real code page encoding instead of incorrectly falling back to latin1,
so there isn't much of a change to document.

Max


On 24 April 2013 08:12, Simon Peyton-Jones <simonpj at microsoft.com> wrote:

> Great stuff.
>
> One thing: have you left enough documentation in the code that, when
> someone comes along in 3 years' time, they can understand the problem and
> how you have dealt with it?  Lots of “Note [Blah]” stuff?  Or something.
>
> Thanks
>
> Simon
>
>
> From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org]
> On Behalf Of Max Bolingbroke
> Sent: 23 April 2013 21:29
> To: ghc-devs at haskell.org
> Subject: DBCS encoding support on Windows
>
> Hi GHCers,
>
> I've implemented support in GHC for extra Windows code pages on the branch
> "dbcs" of the base library.
>
> The problem this solves is that currently users of Haskell on a Windows
> machine running in a locale which uses a double-byte code page such as
> CP936 (GBK) or CP950 (Big5) cannot properly interact with the Windows
> console in their native language. Unfortunately, code page support is a
> prerequisite for getting this to work correctly because, for all Microsoft's
> fine talk about Unicode being the future, the Windows console does not seem
> to support it properly: code pages are the only way to go for console
> input and output.
>
> As the standard Windows locale encodings in many regions, these code pages
> are also the predominant method of encoding text files in many countries,
> so they are useful outside the console.
>
> The solution is along the lines suggested in
> http://hackage.haskell.org/trac/ghc/ticket/3977, i.e. we create an
> iconv-like interface to Windows' MultiByteToWideChar and
> WideCharToMultiByte APIs by the judicious use of binary search. In my
> branch, these APIs will be used whenever we don't have a built-in native
> Haskell TextEncoding for the code page (we used to fall back on using
> latin1 for such code pages).
>
> Unless there are any objections, I'll merge this into the base library main
> branch next week.
>
> Cheers,
>
> Max
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>
>
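
To illustrate the binary search idea mentioned in the quoted message, here
is a minimal sketch. It is not the code from the dbcs branch. It assumes a
conversion routine that is only available as a black box and that reports
how many output units a given input prefix needs, and it searches for the
longest input prefix that still fits a fixed-size output buffer. The names
largestFitting, toyDecodedLength and bytesThatFit are hypothetical, and the
toy double-byte encoding (bytes below 0x80 are single-byte characters,
anything else is a lead byte followed by one trail byte) merely stands in
for a real call to MultiByteToWideChar.

```haskell
import Data.Word (Word8)

-- | Largest n in [0, total] with @cost n <= capacity@.
-- Assumes (not checked) that @cost@ is non-decreasing and that
-- @cost 0 <= capacity@, which is what makes binary search valid.
largestFitting :: (Int -> Int) -> Int -> Int -> Int
largestFitting cost total capacity = go 0 total
  where
    -- Invariant: cost lo <= capacity, and the answer lies in [lo, hi].
    go lo hi
      | lo >= hi             = lo
      | cost mid <= capacity = go mid hi        -- prefix fits: try longer
      | otherwise            = go lo (mid - 1)  -- too big: try shorter
      where
        mid = (lo + hi + 1) `div` 2  -- round up so the range always shrinks

-- | Hypothetical stand-in for the opaque conversion call: the number of
-- characters obtained by decoding a byte string in a toy double-byte
-- encoding. A lead byte with no trail byte still counts as one character.
toyDecodedLength :: [Word8] -> Int
toyDecodedLength [] = 0
toyDecodedLength (b : rest)
  | b < 0x80  = 1 + toyDecodedLength rest           -- single-byte character
  | otherwise = 1 + toyDecodedLength (drop 1 rest)  -- lead byte + trail byte

-- | Number of leading input bytes whose decoding fits into an output
-- buffer with room for @capacity@ characters.
bytesThatFit :: Int -> [Word8] -> Int
bytesThatFit capacity input =
  largestFitting (\n -> toyDecodedLength (take n input)) (length input) capacity
```

For the six-byte input [0x41, 0xB0, 0xA1, 0x42, 0xC4, 0xE3] and room for two
characters, bytesThatFit 2 yields 3: the single-byte character plus one
complete double-byte character. Each probe re-converts a prefix from the
start, which is the price of treating the converter as opaque, but only a
logarithmic number of probes is needed.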