Unicode + Re: Reading/Writing Binary Data in Haskell

George Russell ger@tzi.de
Mon, 14 Jul 2003 11:53:28 +0200


Martin quoted Glynn:
 > OTOH, existing implementations (at least GHC and Hugs) currently read
 > and write "8-bit binary", i.e. characters 0-255 get read and written
 > "as-is" and anything else breaks, and changing that would probably
 > break a fair amount of existing code.

The binary library I posted to the libraries list:

    http://haskell.org/pipermail/libraries/2003-June/001227.html

which is for GHC, does this properly.  All characters are encoded
using a standard variable-length encoding for unsigned integers,
which uses the bottom 7 bits of each byte as data, and the top bit
to signal that the encoding is not yet complete (that is, that more
bytes follow).  Characters 0-127 (which include the standard ASCII
ones) therefore get encoded as themselves, in a single byte.

This is probably not nearly as efficient as writing each character
out as a single raw byte, but it's nice to be Unicode-proof ...