[Haskell-cafe] Ready for testing: Unicode support for Handle I/O
Duncan Coutts
duncan.coutts at worc.ox.ac.uk
Wed Feb 4 08:46:16 EST 2009
On Wed, 2009-02-04 at 13:31 +0000, Simon Marlow wrote:
> Duncan Coutts wrote:
> > On Tue, 2009-02-03 at 11:03 -0600, John Goerzen wrote:
> >
> >> Will there also be something to handle the UTF-16 BOM marker? I'm not
> >> sure what the best API for that is, since it may or may not be present,
> >> but it should be considered -- and could perhaps help autodetect encoding.
> >
> > I think someone else mentioned this already, but utf16 (as opposed to
> > utf16be/le) will use the BOM if it's present.
> >
> > I'm not quite sure what happens when you switch encoding; presumably
> > it'll accept and consider a BOM at that point.
>
> Yes; the utf16 and utf32 encodings accept a BOM (and generate a BOM in
> write mode). This caused interesting bugs when doing re-decoding after
> switching encodings, because the BOM constitutes state in the decoder,
> which means that decoding is not necessarily repeatable unless you save the
> state (which iconv doesn't provide a way to do).
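The point about the BOM being decoder state can be made concrete with a toy UTF-16 decoder. This is a sketch, not GHC's or iconv's code. The names decodeChunk, Endian and DecState are made up for illustration. It handles only BMP characters (no surrogate pairs), assumes even-length chunks, and falls back to big-endian when no BOM is present. The state records whether a BOM has been seen and which byte order it picked. As a result, re-decoding a later chunk from a fresh state can give a different answer than decoding it with the saved state.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Toy model only: not GHC's or iconv's actual decoder.
data Endian = BE | LE deriving (Eq, Show)

-- Decoder state: Nothing = no BOM decision yet; Just e = byte order fixed.
type DecState = Maybe Endian

-- Decode one chunk given the current state; return the text and the new state.
-- Assumptions: BMP only (no surrogates), even-length chunks,
-- and big-endian when the stream has no BOM.
decodeChunk :: DecState -> [Word8] -> (String, DecState)
decodeChunk Nothing [] = ("", Nothing)                            -- nothing seen yet
decodeChunk Nothing (0xFE : 0xFF : rest) = decodeChunk (Just BE) rest
decodeChunk Nothing (0xFF : 0xFE : rest) = decodeChunk (Just LE) rest
decodeChunk Nothing bytes = decodeChunk (Just BE) bytes           -- no BOM: assume BE
decodeChunk (Just e) bytes = (go bytes, Just e)
  where
    go (a : b : more) = unit a b : go more
    go _ = []
    -- Combine two bytes into one code unit according to the chosen byte order.
    unit a b = chr $ case e of
      BE -> (fromIntegral a `shiftL` 8) .|. fromIntegral b
      LE -> (fromIntegral b `shiftL` 8) .|. fromIntegral a

main :: IO ()
main = do
  let (s1, st) = decodeChunk Nothing [0xFF, 0xFE, 0x41, 0x00] -- LE BOM, then 'A'
      (s2, _) = decodeChunk st [0x42, 0x00]                   -- 'B', using the saved state
      (s3, _) = decodeChunk Nothing [0x42, 0x00]              -- 'B' re-decoded from scratch
  print (s1 ++ s2) -- "AB"
  print s3         -- "\16896" (U+4200), not "B"
```

Without the saved state, the bytes 0x42 0x00 are read big-endian as U+4200 instead of 'B'. That is the repeatability problem that arises when the decoder state cannot be saved and restored.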
>
> Are there other encodings that have this kind of state? If so, then they
> might be restricted to NoBuffering at least when switching encodings.
Yes, I believe there are some Asian encodings that are stateful, such as
ISO-2022-JP, which switches character sets using escape sequences.
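ISO-2022-JP is one such encoding. Escape sequences such as ESC ( B (ASCII) and ESC $ B (JIS X 0208) select the active character set, and that choice persists until the next escape. The toy sketch below models only those two designations. Mode, modeAfter and units are hypothetical names, and escape sequences split across chunk boundaries are ignored. It shows the same bytes being split differently depending on the mode carried over from an earlier chunk.

```haskell
import Data.Word (Word8)

-- Toy model: only two of ISO-2022-JP's designations are represented.
data Mode = Ascii | Jis0208 deriving (Eq, Show)

-- Walk a chunk, updating the shift state on each recognised escape sequence,
-- and return the mode in effect at the end of the chunk.
-- (Escape sequences split across chunk boundaries are not handled.)
modeAfter :: Mode -> [Word8] -> Mode
modeAfter _ (0x1B : 0x28 : 0x42 : rest) = modeAfter Ascii rest   -- ESC ( B
modeAfter _ (0x1B : 0x24 : 0x42 : rest) = modeAfter Jis0208 rest -- ESC $ B
modeAfter m (_ : rest) = modeAfter m rest
modeAfter m [] = m

-- Split an escape-free chunk into character units for the given mode:
-- one byte per character in ASCII, two bytes per character in JIS X 0208.
units :: Mode -> [Word8] -> [[Word8]]
units Ascii bs = map (: []) bs
units Jis0208 (a : b : rest) = [a, b] : units Jis0208 rest
units Jis0208 _ = []

main :: IO ()
main = do
  let chunk1 = [0x1B, 0x24, 0x42] -- ESC $ B: switch to JIS X 0208
      chunk2 = [0x30, 0x21]
      st = modeAfter Ascii chunk1
  print st                   -- Jis0208
  print (units st chunk2)    -- [[48,33]]: one two-byte character
  print (units Ascii chunk2) -- [[48],[33]]: "0!" if the state is lost
```

With the carried-over state, the second chunk is a single two-byte character. Decoded from a fresh ASCII state, it becomes the two characters "0!". This is the same kind of hidden state as the UTF-16 BOM.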
Duncan