Unicode + Re: Reading/Writing Binary Data in Haskell
George Russell
ger@tzi.de
Mon, 14 Jul 2003 11:53:28 +0200
Martin quoted Glynn:
> OTOH, existing implementations (at least GHC and Hugs) currently read
> and write "8-bit binary", i.e. characters 0-255 get read and written
> "as-is" and anything else breaks, and changing that would probably
> break a fair amount of existing code.
The binary library I posted to the libraries list:
http://haskell.org/pipermail/libraries/2003-June/001227.html
which is for GHC, does this properly. Each character is written as
an unsigned integer in a standard variable-length encoding, which uses
the bottom 7 bits of each byte as data, and the top bit to signal
that the encoding is not yet complete. Characters 0-127 (which
include the standard ASCII ones) get encoded as themselves.
This is probably not nearly as efficient as writing each character
as a single raw byte, but it's nice to be Unicode-proof ...
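For illustration, here is a minimal sketch of the kind of 7-bit encoding described above. It is not the code from the posted library: the names encodeUnsigned, decodeUnsigned, encodeChar and decodeChar are hypothetical, and emitting the least significant 7-bit group first is an assumption, since the post does not say which order the groups go in.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode a non-negative integer as a list of bytes, 7 data bits per byte.
-- The top bit of a byte is set when more bytes follow.
-- Assumption: the least significant 7-bit group is written first.
encodeUnsigned :: Int -> [Word8]
encodeUnsigned n
  | n < 0     = error "encodeUnsigned: negative input"
  | n < 0x80  = [fromIntegral n]
  | otherwise = fromIntegral ((n .&. 0x7F) .|. 0x80) : encodeUnsigned (n `shiftR` 7)

-- Decode one integer from the front of a byte list, returning the value
-- and the unconsumed bytes. Returns Nothing if the input ends while the
-- continuation bit is still set.
decodeUnsigned :: [Word8] -> Maybe (Int, [Word8])
decodeUnsigned = go 0 0
  where
    go _ _ [] = Nothing
    go shift acc (b : bs)
      | testBit b 7 = go (shift + 7) acc' bs
      | otherwise   = Just (acc', bs)
      where
        acc' = acc .|. (fromIntegral (b .&. 0x7F) `shiftL` shift)

-- A character is encoded as its code point.
encodeChar :: Char -> [Word8]
encodeChar = encodeUnsigned . ord

-- Decode one character, rejecting values beyond the largest code point.
decodeChar :: [Word8] -> Maybe (Char, [Word8])
decodeChar bytes = do
  (n, rest) <- decodeUnsigned bytes
  if n <= ord maxBound then Just (chr n, rest) else Nothing
```

Under these assumptions, encodeChar 'A' gives [65], so ASCII is unchanged, while a character such as '\955' (Greek small lambda) takes two bytes, [187, 7]. Any Char fits in at most three bytes, since code points need no more than 21 bits.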