[Haskell-cafe] Ready for testing: Unicode support for Handle I/O
Simon Marlow
marlowsd at gmail.com
Wed Feb 4 08:31:20 EST 2009
Duncan Coutts wrote:
> On Tue, 2009-02-03 at 11:03 -0600, John Goerzen wrote:
>
>> Will there also be something to handle the UTF-16 BOM marker? I'm not
>> sure what the best API for that is, since it may or may not be present,
>> but it should be considered -- and could perhaps help autodetect encoding.
>
> I think someone else mentioned this already, but utf16 (as opposed to
> utf16be/le) will use the BOM if it's present.
>
> I'm not quite sure what happens when you switch encoding, presumably
> it'll accept and consider a BOM at that point.
Yes; the utf16 and utf32 encodings accept a BOM (and generate a BOM in
write mode). This caused some interesting bugs when re-decoding after
switching encodings: the BOM constitutes state in the decoder, so
decoding is not necessarily repeatable unless you save that state (and
iconv provides no way to do that).
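A minimal sketch of the BOM behaviour from the user's side. It assumes GHC 6.12 or later, where `System.IO` exports `hSetEncoding` and the `utf16`/`utf16le`/`utf32`/`utf32le` encodings, plus the `directory` package (shipped with GHC) for `removeFile`. The helpers `writeEncoded`, `encodedBytes` and `roundTrip` are hypothetical names written for this illustration. The first two show that `utf16` puts a BOM in front of its output while `utf16le` does not. `roundTrip` shows that the `utf16` decoder consumes the BOM it reads instead of returning it as a character.

```haskell
import System.Directory (removeFile)
import System.IO

-- | Write s to a fresh temporary file using the given encoding and
-- return the file's path. The caller is responsible for removing it.
writeEncoded :: TextEncoding -> String -> IO FilePath
writeEncoded enc s = do
  (path, h) <- openTempFile "." "bom-demo.txt"
  hSetEncoding h enc
  hPutStr h s
  hClose h
  return path

-- | The raw bytes produced by encoding s with enc.
encodedBytes :: TextEncoding -> String -> IO [Int]
encodedBytes enc s = do
  path  <- writeEncoded enc s
  bytes <- withBinaryFile path ReadMode $ \h -> do
    str <- hGetContents h            -- binary mode: one Char per byte
    let bs = map fromEnum str
    length bs `seq` return bs        -- force the lazy read before h closes
  removeFile path
  return bytes

-- | Encode s with enc, then decode the file again with the same encoding.
roundTrip :: TextEncoding -> String -> IO String
roundTrip enc s = do
  path <- writeEncoded enc s
  back <- withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    str <- hGetContents h
    length str `seq` return str      -- force the lazy read before h closes
  removeFile path
  return back
```

In GHCi, `encodedBytes utf16le "hi"` gives `[104,0,105,0]`. `encodedBytes utf16 "hi"` should be two bytes longer, with the BOM at the front, and `roundTrip utf16 "hi"` should give back `"hi"` without a stray U+FEFF.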
Are there other encodings that have this kind of state? If so, they
might have to be restricted to NoBuffering, at least when switching encodings.
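To see why a BOM makes re-decoding unrepeatable, here is a toy UTF-16 decoder in which the byte order is the only piece of state. This is a hypothetical model, not GHC's or iconv's decoder. `decodeChunk`, `DecoderState` and the chunk values are invented for illustration. The model handles BMP characters only, assumes each chunk holds whole 16-bit code units, and falls back to big-endian when no BOM is present. Decoding the second chunk with the state carried over from the first gives the right answer. Decoding the same bytes again from a fresh state, which is effectively what happens when the saved state is unavailable, does not.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Toy model: the only decoder state is the byte order, which stays
-- unknown until the first two bytes of the stream have been examined.
data ByteOrder = BigEndian | LittleEndian deriving (Eq, Show)

type DecoderState = Maybe ByteOrder  -- Nothing: no BOM examined yet

-- | Decode one chunk, returning the characters and the state to carry
-- into the next chunk.
decodeChunk :: DecoderState -> [Word8] -> (String, DecoderState)
decodeChunk Nothing []                   = ("", Nothing)
decodeChunk Nothing (0xFE : 0xFF : rest) = decodeChunk (Just BigEndian) rest
decodeChunk Nothing (0xFF : 0xFE : rest) = decodeChunk (Just LittleEndian) rest
decodeChunk Nothing bytes                = decodeChunk (Just BigEndian) bytes  -- no BOM: assume BE
decodeChunk st@(Just order) bytes        = (go bytes, st)
  where
    go (a : b : more) = unit a b : go more
    go _              = []                 -- a trailing odd byte is dropped
    unit a b = case order of
      BigEndian    -> toChar a b
      LittleEndian -> toChar b a
    toChar hi lo = chr (fromIntegral hi `shiftL` 8 .|. fromIntegral lo)

-- "hi" in UTF-16LE with a BOM, delivered in two chunks.
chunk1, chunk2 :: [Word8]
chunk1 = [0xFF, 0xFE, 0x68, 0x00]  -- BOM (little-endian), then 'h'
chunk2 = [0x69, 0x00]              -- 'i'

-- Carrying the state from chunk1 into chunk2 decodes correctly...
continued :: String
continued = let (s1, st) = decodeChunk Nothing chunk1
                (s2, _)  = decodeChunk st chunk2
            in s1 ++ s2

-- ...but re-decoding chunk2 from a fresh state misreads the byte order.
restarted :: String
restarted = fst (decodeChunk Nothing chunk2)
```

Here `continued` is `"hi"`, while `restarted` is `"\26880"` (U+6900): once the little-endian BOM has been forgotten, the bytes `69 00` are read as big-endian. Any encoding whose decoder carries state across chunks like this would have the same problem.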
Cheers,
Simon