[Haskell-cafe] Ready for testing: Unicode support for Handle I/O

Ross Paterson ross at soi.city.ac.uk
Wed Feb 4 08:52:36 EST 2009


On Wed, Feb 04, 2009 at 01:46:16PM +0000, Duncan Coutts wrote:
> On Wed, 2009-02-04 at 13:31 +0000, Simon Marlow wrote:
> > Yes; the utf16 and utf32 encodings accept a BOM (and generate a BOM in 
> > write mode).  This caused interesting bugs when doing re-decoding after 
> > switching encodings, because the BOM constitutes state in the decoder, 
> > which means that decoding is not necessarily repeatable unless you save the 
> > state (which iconv doesn't provide a way to do).
> > 
> > Are there other encodings that have this kind of state?  If so, then they 
> > might be restricted to NoBuffering at least when switching encodings.
> 
> Yes, I believe there are some Asian encodings that are stateful.

Yes, ISO-2022 encodings, like ISO-2022-JP, have a number of states and
switch between them with escape sequences.  They're a nightmare.
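
To make the problem concrete, here is a rough sketch of a toy two-mode
decoder in the style of ISO-2022-JP.  It is not how iconv or the new Handle
code works: the names (Mode, Unit, decodeChunk, sample) are made up for
illustration, only the two escape sequences ESC ( B and ESC $ B are handled,
and the mapping from JIS X 0208 byte pairs to characters is left out.  The
point is only that the meaning of a byte depends on the mode left behind by
earlier bytes, so re-decoding from the middle of a buffer needs that mode
saved.

```haskell
module StatefulDecode where

import Data.Word (Word8)

-- | Character set the (hypothetical) decoder is currently in.
data Mode = Ascii | Jis0208
  deriving (Eq, Show)

-- | Decoded units: a one-byte ASCII code or a two-byte JIS X 0208 code.
-- Translating a pair to a Unicode code point is omitted from this sketch.
data Unit = One Word8 | Two Word8 Word8
  deriving (Eq, Show)

-- | Decode a chunk starting in the given mode.  Returns the decoded units
-- and the mode the decoder ends in, which the caller must keep in order to
-- decode the next chunk correctly.  An escape sequence split across two
-- chunks is not handled here; its bytes are treated as data.
decodeChunk :: Mode -> [Word8] -> ([Unit], Mode)
decodeChunk mode [] = ([], mode)
decodeChunk _ (0x1B : 0x28 : 0x42 : rest) = decodeChunk Ascii rest    -- ESC ( B
decodeChunk _ (0x1B : 0x24 : 0x42 : rest) = decodeChunk Jis0208 rest  -- ESC $ B
decodeChunk Ascii (b : rest) = emit (One b) Ascii rest
decodeChunk Jis0208 (b1 : b2 : rest) = emit (Two b1 b2) Jis0208 rest
decodeChunk Jis0208 [_] = ([], Jis0208)  -- truncated pair: dropped in this sketch

-- | Prepend one decoded unit to the result of decoding the rest.
emit :: Unit -> Mode -> [Word8] -> ([Unit], Mode)
emit u mode rest =
  let (us, mode') = decodeChunk mode rest
  in (u : us, mode')

-- | ESC $ B, the pair 0x24 0x22, then ESC ( B and the ASCII byte 0x41.
sample :: [Word8]
sample = [0x1B, 0x24, 0x42, 0x24, 0x22, 0x1B, 0x28, 0x42, 0x41]
```

Decoding `drop 3 sample` with the saved mode (Jis0208) yields the two-byte
unit followed by the ASCII byte, whereas decoding the same bytes from the
initial mode (Ascii) yields three one-byte units.  That is the
non-repeatability Simon describes, just with a mode switch instead of a BOM.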

