System.IO.latin1 docs

Mon Dec 27 18:04:41 CET 2010

On Dec 25, 2010, at 7:34 AM, Wolfgang Jeltsch wrote:
> The documentation of hSetBinaryMode says:
> 
>    This has the same effect as calling hSetEncoding with latin1,
>    together with hSetNewlineMode with noNewlineTranslation.
> 
> It seems that this sentence is wrong.

It seems wrong to me in intent. When a handle is in "binary" mode, it shouldn't have any encoding at all. If we were designing this from scratch, I'd propose that String I/O on such handles should fail, and that you should only be able to use ByteString with them. But I suppose that isn't viable...

Of course, the first 256 Unicode code points do encode in ISO-8859-1 (latin1) to the numerically equivalent bytes. However, I'd strongly support changing the documentation to not reference "latin1". There is great confusion about the latin1 character encoding on the Internet, to the degree that HTML5 will mandate that it be "misinterpreted for compatibility" as Windows-1252[1]. I'm glad we don't make such mistakes, and that's why I don't think we should be "repurposing" latin1 for binary use.
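To make the latin1 point concrete, here is a minimal pure sketch, assuming a strict reading of ISO-8859-1. It is not GHC's implementation, and the names latin1Encode and latin1Decode are hypothetical. Code points U+0000 to U+00FF map to the byte with the same value, and anything above that simply has no latin1 encoding. For example, U+20AC (the euro sign) has no latin1 byte at all, even though Windows-1252 maps it to 0x80.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical strict latin1 encoder: each Char in U+0000..U+00FF becomes
-- the byte with the same numeric value. A Char above U+00FF has no latin1
-- encoding, so the first such character is reported instead of guessing.
latin1Encode :: String -> Either Char [Word8]
latin1Encode = traverse encodeChar
  where
    encodeChar c
      | ord c <= 0xFF = Right (fromIntegral (ord c))
      | otherwise     = Left c

-- Decoding is the inverse: every byte maps back to the code point with the
-- same value, so all 256 byte values survive a round trip through String.
latin1Decode :: [Word8] -> String
latin1Decode = map (chr . fromIntegral)
```

In GHCi, `latin1Encode "A\xE9"` gives `Right [65,233]`, while `latin1Encode "\x20AC"` gives `Left '\8364'`.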

Perhaps the right approach is an encoding called "binary" (or "octet"?). Calling hSetBinaryMode with True would set the handle's encoding to binary, and vice versa. This encoding could then be defined to have the interesting behavior observed: writing each code point's value mod 256.
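A rough sketch of what such a "binary" encoding could mean, as a pure function over String. This is an assumption about the proposal, not an existing System.IO encoding, and binaryEncode and binaryDecode are hypothetical names. Unlike a strict latin1 encoder it never fails: it keeps only the low 8 bits of each code point.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical "binary" (octet) encoding: never fails, just writes each
-- code point's value mod 256 (i.e. keeps its low 8 bits).
binaryEncode :: String -> [Word8]
binaryEncode = map (fromIntegral . (`mod` 256) . ord)

-- Decoding a byte yields the code point with the same value (U+0000..U+00FF).
binaryDecode :: [Word8] -> String
binaryDecode = map (chr . fromIntegral)
```

The mapping is lossy above U+00FF: `binaryEncode "\x20AC"` and `binaryEncode "\xAC"` both give `[172]`, so `binaryDecode . binaryEncode` is the identity only on strings whose code points are all below 256.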

 - Mark

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#character-encodings-0

Mark Lentczner
http://www.ozonehouse.com/mark/
IRC: mtnviewmark