[Haskell-cafe] Ready for testing: Unicode support for Handle I/O

Tue Feb 3 18:39:43 EST 2009

On Tue, Feb 03, 2009 at 10:56:13PM +0000, Duncan Coutts wrote:
> > > Thanks to suggestions from Duncan Coutts, it's possible to call
> > > hSetEncoding even on buffered read Handles, and the right thing
> > > happens.  So we can read from text streams that include multiple
> > > encodings, such as an HTTP response or email message, without having
> > > to turn buffering off (though there is a penalty for switching
> > > encodings on a buffered Handle, as the IO system has to do some
> > > re-decoding to figure out where it should start reading from again).
> > 
> > Sounds useful, but is this the bit that causes the 30% performance hit?
> 
> No. You only pay that penalty if you switch encoding. The standard case
> has no extra cost.

I'm confused.  I thought the standard case was conversion to the
system's local encoding?  How is that different from selecting the
same encoding manually?

There always has to be *some* conversion from a 32-bit Char to
whatever encoding the system has selected, right?

What exactly do we have to do to avoid the penalty?
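
To make the question concrete, here is a minimal sketch of the two
cases as I understand them.  It assumes that hSetEncoding and the utf8
and latin1 encodings are exported from System.IO, as in the proposed
patch; the file name is made up for illustration.

```haskell
import System.IO

main :: IO ()
main = do
  let path = "encoding-example.txt"  -- hypothetical file name

  -- Write two ASCII lines so the sketch is self-contained.
  out <- openFile path WriteMode
  hSetEncoding out utf8
  hPutStrLn out "header"
  hPutStrLn out "body"
  hClose out

  h <- openFile path ReadMode

  -- Case 1: pick the encoding once, before anything has been read.
  hSetEncoding h utf8
  header <- hGetLine h

  -- Case 2: switch encoding on a buffered read Handle after data
  -- has already been read into the buffer.
  hSetEncoding h latin1
  body <- hGetLine h

  hClose h
  putStrLn header
  putStrLn body
```

If Duncan's description holds, only the second hSetEncoding call,
which happens after data has already been buffered, should need any
re-decoding.  The first call just picks an encoding up front, which is
why I don't see how it differs from the default case.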

> No, I think that's 30% for latin1. The cost is not really the character
> conversion but the copying from a byte buffer via iconv to a char
> buffer.

Don't we already have to copy between a byte buffer and a char buffer,
since read() and write() use a byte buffer?

> > 30% slower is a big deal, especially since we're not all that speedy now.
> 
> Bear in mind that's talking about the [Char] interface, and nobody using
> that is expecting great performance. We already have an API for getting

Yes, I know, but it's still the most convenient interface, and making
it suck more isn't cool -- though there are certainly big wins here.

-- John