[Haskell-cafe] How does GHC read UNICODE.
Ketil Malde
ketil at malde.org
Tue May 20 03:30:57 EDT 2008
Don Stewart <dons at galois.com> writes:
> You can use either bytestrings, which will ignore any encoding,
Uh, I am hesitant to voice my protest here, but I think this bears
some elaboration:
Bytestrings are exactly that, strings of bytes.
There are basically two interfaces, one (Data.ByteString[.Lazy]),
which operates on raw bytes (and gives you Word8s), and another
(Data.ByteString[.Lazy].Char8), which treats the contents as Chars.
The latter only deals with Unicode code points 0..255 (i.e.
ISO-8859-1, Latin-1) -- and silently truncates higher code points to
fit this range.
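To illustrate (my own example, not from Don's mail) -- here is the raw
Word8 interface next to the Char8 one, where a character that doesn't
fit in a byte gets its high bits thrown away on pack:

    import qualified Data.ByteString as B
    import qualified Data.ByteString.Char8 as C

    main :: IO ()
    main = do
      -- Raw interface: just bytes, no notion of characters.
      print (B.unpack (B.pack [104, 105]))   -- [104,105]
      -- Char8 interface: '\955' (lambda) is code point 955, which does
      -- not fit in a Word8, so pack keeps only 955 mod 256 = 187.
      print (C.unpack (C.pack "h\955llo"))   -- "h\187llo"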
Basically, bytestrings are the wrong tool for the job if you need more
than 8 bits per character.  I think the predecessor of bytestring
(FPS?) had support for other fixed-size encodings, that is, two-byte
and four-byte characters.  Perhaps writing a Data.Word16String
bytestring-alike using UCS-2 would be an option?
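Just to make the idea concrete -- purely a sketch with made-up names,
not an existing module -- such a thing could wrap a ByteString and
pack/unpack pairs of bytes as UCS-2 code units (BMP only, no
surrogate pairs):

    module Data.Word16String (Word16String, pack, unpack) where

    import qualified Data.ByteString as B
    import Data.Bits (shiftL, shiftR, (.&.), (.|.))
    import Data.Char (chr, ord)
    import Data.Word (Word16)

    newtype Word16String = Word16String B.ByteString

    -- Pack a String as big-endian 16-bit code units.  Code points
    -- above 0xFFFF would need surrogates; this sketch just masks them.
    pack :: String -> Word16String
    pack = Word16String . B.pack . concatMap toBytes
      where
        toBytes c =
          let w = fromIntegral (ord c) .&. 0xFFFF :: Word16
          in [ fromIntegral (w `shiftR` 8), fromIntegral (w .&. 0xFF) ]

    -- Decode byte pairs back into Chars; a trailing odd byte is dropped.
    unpack :: Word16String -> String
    unpack (Word16String bs) = go (B.unpack bs)
      where
        go (hi:lo:rest) =
          chr ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo) : go rest
        go _ = []

It would still only cover the BMP, of course, but at least the size
per character is fixed, which is what makes the bytestring tricks work.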
-k
--
If I haven't seen further, it is by standing in the footprints of giants