[Haskell-cafe] Writing binary files?

Glynn Clements glynn.clements at virgin.net
Sun Sep 12 15:58:06 EDT 2004


Marcin 'Qrczak' Kowalczyk wrote:

> But the default encoding should
> come from the locale instead of being ISO-8859-1.

The problem with that is that, if the locale's encoding is UTF-8, a
lot of existing data will fail to decode (i.e. anything in ISO-8859-*
which isn't limited to the 7-bit ASCII subset).

The advantage of assuming ISO-8859-* is that the decoder can't fail;
every possible stream of bytes is valid. This isn't the case for
UTF-8. The advantage of ISO-8859-1 in particular is that it's trivial
to convert the string back into the bytes which were actually read.
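
To illustrate the round-trip property, here is a minimal sketch using
only base. The names decodeLatin1 and encodeLatin1 are made up for this
example and are not an existing library API; bytes are modelled as a
plain [Word8] list for simplicity.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Decode ISO-8859-1: each byte maps to the code point with the same
-- numeric value, so decoding is total and cannot fail.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Encode back to bytes. This yields Nothing only if the string holds a
-- character above U+00FF, which decodeLatin1 never produces.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = traverse toByte
  where
    toByte c
      | ord c <= 0xFF = Just (fromIntegral (ord c))
      | otherwise     = Nothing
```

For any byte list bs, encodeLatin1 (decodeLatin1 bs) gives back Just bs,
so the bytes that were actually read are always recoverable.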

The key problem with using the locale is that you frequently encounter
files which aren't in the locale's encoding, and for which the
encoding can't easily be deduced.

If you assume ISO-8859-*, you can at least read them in, manipulate
the contents (in any way that doesn't require interpreting any
non-ASCII characters), and write out the results. OTOH, if you assume
UTF-8 (e.g. because that happens to be the locale's encoding), the
decoder is likely to abort shortly after the first non-ASCII character
it finds (either that, or it will just silently drop characters).
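
The failure mode can be sketched with a small strict UTF-8 decoder
written against base only. decodeUtf8Strict is a hypothetical helper
for illustration, not the decoder any Haskell implementation actually
uses; it reports the offset of the first malformed sequence instead of
aborting or dropping characters.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Strict UTF-8 decoding: Left carries the byte offset at which the
-- first invalid sequence starts.
decodeUtf8Strict :: [Word8] -> Either Int String
decodeUtf8Strict = go 0
  where
    go _ [] = Right []
    go i (b:bs)
      | b < 0x80               = (chr (fromIntegral b) :) <$> go (i + 1) bs
      | b >= 0xC2 && b <= 0xDF = multi i 1 0x80    (fromIntegral b .&. 0x1F) bs
      | b >= 0xE0 && b <= 0xEF = multi i 2 0x800   (fromIntegral b .&. 0x0F) bs
      | b >= 0xF0 && b <= 0xF4 = multi i 3 0x10000 (fromIntegral b .&. 0x07) bs
      | otherwise              = Left i  -- stray continuation or invalid lead byte

    -- Consume n continuation bytes; reject truncated, overlong,
    -- surrogate and out-of-range sequences.
    multi i n lo acc bs =
      let (conts, rest) = splitAt n bs
          cp = foldl (\a c -> (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) acc conts
          ok = length conts == n
               && all (\c -> c .&. 0xC0 == 0x80) conts
               && cp >= lo && cp <= 0x10FFFF
               && not (cp >= 0xD800 && cp <= 0xDFFF)
      in if ok then (chr cp :) <$> go (i + 1 + n) rest else Left i
```

With the ISO-8859-1 bytes for "café", [0x63, 0x61, 0x66, 0xE9], this
returns Left 3: the byte 0xE9 looks like the start of a three-byte
sequence that never arrives. The same word encoded as UTF-8,
[0x63, 0x61, 0x66, 0xC3, 0xA9], decodes without error.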

-- 
Glynn Clements <glynn.clements at virgin.net>

