Unicode windows console output.

David Sankel camior at gmail.com
Thu Nov 4 14:47:53 EDT 2010


On Thu, Nov 4, 2010 at 6:09 AM, Simon Marlow <marlowsd at gmail.com> wrote:

> On 04/11/2010 02:35, David Sankel wrote:
>
>> On Wed, Nov 3, 2010 at 9:00 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>>
>>    On 03/11/2010 10:36, Bulat Ziganshin wrote:
>>
>>        Hello Max,
>>
>>        Wednesday, November 3, 2010, 1:26:50 PM, you wrote:
>>
>>            1. You need to use "chcp 65001" to set the console code page
>>            to UTF8
>>            2. It is very likely that your Windows console won't have the
>>            fonts required to actually make sense of the output. Pipe the
>>            output to foo.txt. If you open this file in notepad you will
>>            see the correct characters show up.
>>
>>
>>        it will work even without chcp. afaik neither ghc nor windows
>>        adjusts text being output to the current console codepage
>>
>>
>>    GHC certainly does.  We use GetConsoleCP() when deciding what code
>>    page to use by default - see
>> libraries/base/GHC/IO/Encoding/CodePage.hs.
>>
>>
>>
>> This can actually be quite helpful. I've discovered that if you have a
>> console set to code page 65001 (UTF-8) and use WriteConsoleA (the
>> non-wide version) with UTF-8 encoded strings, the console displays the
>> text properly!
>>
>> So the solution seems to be, when outputting to a utf8 console use
>> WriteConsoleA.
>>
>
> We need someone to rewrite the IO library backend for Win32.  Currently it
> is going via the msvcrt POSIX emulation layer, i.e. using write() and
> pseudo-file-descriptors.  More than a few problems have been caused by this,
> and it's totally unnecessary except that we get to share some code between
> the POSIX and Windows backends.  We ought to be using the native Win32 APIs
> and HANDLE directly, then we could use WriteConsoleA.
>
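
For reference, the WriteConsoleA approach quoted above can be sketched roughly as follows. This is only an illustration, not GHC code: the helper name write_utf8_console is hypothetical, and the sketch assumes the console is already set to code page 65001 (for example via "chcp 65001"). On non-Windows systems it simply falls back to fputs.

```c
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: write a NUL-terminated UTF-8 string to the console.
   Returns 0 on success, -1 on failure. */
int write_utf8_console(const char *utf8)
{
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written = 0;

    if (out == NULL || out == INVALID_HANDLE_VALUE)
        return -1;
    /* Assumption: only proceed when the console output code page is
       already UTF-8 (65001), as described in the quoted message. */
    if (GetConsoleOutputCP() != CP_UTF8)
        return -1;
    /* The non-wide call; it fails if stdout is redirected to a file or pipe. */
    if (!WriteConsoleA(out, utf8, (DWORD)strlen(utf8), &written, NULL))
        return -1;
    return 0;
#else
    /* Elsewhere the terminal is assumed to accept UTF-8 bytes directly. */
    return fputs(utf8, stdout) == EOF ? -1 : 0;
#endif
}
```

A call such as write_utf8_console("\xce\xbb\n") passes the two UTF-8 bytes for a Greek lambda; whether the glyph actually shows up still depends on the console font.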

It looks like replacing the POSIX layer isn't necessary to fix the Unicode
console output bug. I've made a ticket and in a comment I illustrate the
_setmode call that magically makes everything work:

http://hackage.haskell.org/trac/ghc/ticket/4471
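
A minimal sketch of what such a _setmode call might look like in C is below. The mode flag is an assumption on my part for illustration (_O_U16TEXT is one of the Unicode translation modes the Microsoft C runtime provides); see the ticket for the exact call. The function name enable_unicode_stdout is hypothetical, and on non-Windows systems the function does nothing.

```c
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>  /* _O_U16TEXT */
#include <io.h>     /* _setmode */
#endif

/* Hypothetical helper: switch stdout to a Unicode translation mode.
   Returns 0 on success, -1 on failure. No-op outside Windows. */
int enable_unicode_stdout(void)
{
#ifdef _WIN32
    /* Flush anything buffered under the old mode before switching. */
    fflush(stdout);
    /* _setmode returns the previous mode on success and -1 on failure.
       _O_U16TEXT is an assumed choice of flag, not taken from the ticket. */
    return _setmode(_fileno(stdout), _O_U16TEXT) == -1 ? -1 : 0;
#else
    return 0;
#endif
}
```

Once the stream is in this mode, output should go through the wide-character functions (fputws, wprintf and so on) rather than the narrow ones.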

I could attempt a ghc patch for this, but I don't have any experience with
the ghc code. Perhaps someone could add this _setmode call with relative
ease?

David

-- 
David Sankel
Sankel Software
www.sankelsoftware.com
585 617 4748 (Office)