[Haskell-cafe] Ready for testing: Unicode support for Handle I/O
John Goerzen
jgoerzen at complete.org
Tue Feb 3 18:39:43 EST 2009
On Tue, Feb 03, 2009 at 10:56:13PM +0000, Duncan Coutts wrote:
> > > Thanks to suggestions from Duncan Coutts, it's possible to call
> > > hSetEncoding even on buffered read Handles, and the right thing
> > > happens. So we can read from text streams that include multiple
> > > encodings, such as an HTTP response or email message, without having
> > > to turn buffering off (though there is a penalty for switching
> > > encodings on a buffered Handle, as the IO system has to do some
> > > re-decoding to figure out where it should start reading from again).
> >
> > Sounds useful, but is this the bit that causes the 30% performance hit?
>
> No. You only pay that penalty if you switch encoding. The standard case
> has no extra cost.

I'm confused. I thought the standard case was conversion to the
system's local encoding? How is that different from selecting the
same encoding manually?

There always has to be *some* conversion from a 32-bit Char to the
system's selection, right?

What exactly do we have to do to avoid the penalty?

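To make the question concrete, here is a minimal sketch of the two
cases as I understand them: setting the encoding once before any
reading, versus changing it on a Handle that already has buffered
data. The names writeDemoFile, readFixed and readSwitching and the
scratch file name are made up for illustration. Only hSetEncoding,
hSetBinaryMode, latin1 and utf8 come from System.IO, and I am assuming
they behave as described earlier in this thread.

```haskell
import System.IO

-- Write two lines of raw bytes: an ASCII header, then a body line
-- containing the UTF-8 byte pair 0xC3 0xA9 (e with acute accent).
writeDemoFile :: FilePath -> IO ()
writeDemoFile path =
  withFile path WriteMode $ \h -> do
    hSetBinaryMode h True          -- each Char below is written as one byte
    hPutStr h "charset: utf-8\n"
    hPutStr h "caf\195\169\n"

-- Standard case: the encoding is chosen once, before any data is read.
readFixed :: TextEncoding -> FilePath -> IO [String]
readFixed enc path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    header <- hGetLine h
    body   <- hGetLine h
    return [header, body]

-- Switching case: the header is decoded as latin1, then the encoding is
-- changed while the Handle may already hold decoded data for the body.
readSwitching :: FilePath -> IO [String]
readSwitching path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h latin1
    header <- hGetLine h
    hSetEncoding h utf8            -- the mid-stream switch under discussion
    body   <- hGetLine h
    return [header, body]

main :: IO ()
main = do
  let path = "encoding-demo.txt"   -- hypothetical scratch file
  writeDemoFile path
  fixed    <- readFixed latin1 path
  switched <- readSwitching path
  -- Print lengths only, so the output does not depend on the terminal.
  print (map length fixed)
  print (map length switched)
```

If I follow Duncan correctly, only readSwitching should trigger the
re-decoding step, and readFixed should cost the same whether the
encoding is the default or one selected by hand before the first read.
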
> No, I think that's 30% for latin1. The cost is not really the character
> conversion but the copying from a byte buffer via iconv to a char
> buffer.

Don't we already have to copy between a byte buffer and a char buffer,
since read() and write() use a byte buffer?

> > 30% slower is a big deal, especially since we're not all that speedy now.
>
> Bear in mind that's talking about the [Char] interface, and nobody using
> that is expecting great performance. We already have an API for getting

Yes, I know, but it's still the most convenient interface, and making
it suck more isn't cool -- though there are certainly big wins here.

-- John