[Haskell-cafe] Ready for testing: Unicode support for Handle I/O

Tue Feb 3 22:49:10 EST 2009

Duncan Coutts wrote:
> Sorry, I think we've been talking at cross purposes.

I think so.

>> There always has to be *some* conversion from a 32-bit Char to the
>> system's selection, right?
> 
> Yes. In text mode there is always some conversion going on. Internally
> there is a byte buffer and a char buffer (ie UTF32).
> 
>> What exactly do we have to do to avoid the penalty?
> 
> The penalty we're talking about here is not the cost of converting bytes
> to characters, it's in switching which encoding the Handle is using. For
> example you might read some HTTP headers in ASCII and then switch the
> Handle encoding to UTF8 to read some XML.

Simon referenced a 30% penalty.  Are you saying that if we read from a
Handle using the same encoding it had when we first opened it, we won't
see any slowdown compared to the I/O system in 6.10?

> Switching the Handle encoding has a penalty. We have to discard the
> characters that we pre-decoded and re-decode the byte buffer in the new
> encoding. It's actually slightly more complicated because we do not

Got it.  That makes sense, as does the decision to optimize for the more
common (not switching the encoding) case.
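
For anyone following along, here is a minimal sketch of the switching
scenario Duncan describes: read some headers in one encoding, then
change the Handle's encoding before reading the body.  It assumes the
System.IO names hSetEncoding, latin1 and utf8 (as they appear in later
releases of base); readHeaderLines and readHeadersThenBody are
hypothetical helpers invented for this illustration, and latin1 stands
in for ASCII.

```haskell
import System.IO

-- Hypothetical helper: collect lines until the first blank line (or EOF).
readHeaderLines :: Handle -> IO [String]
readHeaderLines h = do
  eof <- hIsEOF h
  if eof
    then return []
    else do
      line <- hGetLine h
      if null line
        then return []
        else fmap (line :) (readHeaderLines h)

-- Hypothetical helper: headers decoded as Latin-1, body decoded as UTF-8.
readHeadersThenBody :: FilePath -> IO ([String], String)
readHeadersThenBody path = do
  h <- openFile path ReadMode
  hSetEncoding h latin1          -- headers: one byte per Char
  headers <- readHeaderLines h
  -- This is the switch that carries the penalty: characters that were
  -- already decoded ahead of the read position have to be thrown away
  -- and the bytes decoded again in the new encoding.
  hSetEncoding h utf8
  body <- hGetContents h
  length body `seq` hClose h     -- force the lazy read before closing
  return (headers, body)
```

A program that sets the encoding once, right after opening the Handle,
and never changes it stays on the common path and never pays for that
discard-and-re-decode step.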

-- John