[Haskell-cafe] Has character changed in GHC 6.8?
ketil+haskell at ii.uib.no
Wed Jan 23 05:58:54 EST 2008
Peter Verswyvelen <bf3 at telenet.be> writes:
> No I just used wrong terminology. When I said unicode, I actually meant UCS-x,
You might as well say UCS-4, nobody uses UCS-2 anymore. It's been
replaced by UTF-16, which gives you the complexity of UTF-8 without
being compact (for 99% of existing data), endianness-indifferent, or backwards
compatible with ASCII.
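To make the UTF-16 point concrete, here is a minimal sketch of how a single code point maps to UTF-16 code units. The function name utf16Units is made up for illustration and is not from any library; it uses only base. Code points below U+10000 take one 16-bit unit, and everything above takes two (a surrogate pair), so UTF-16 is variable-length just like UTF-8.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode one code point as UTF-16 code units (one or two).
-- Hypothetical helper for illustration only.
utf16Units :: Char -> [Word16]
utf16Units c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]  -- low surrogate
  where
    n = ord c
    m = n - 0x10000

main :: IO ()
main = do
  print (length (utf16Units 'a'))        -- one unit
  print (length (utf16Units '\x1D11E'))  -- two units: a surrogate pair
```

Because the unit count depends on the character, indexing to the n-th character in UTF-16 needs a scan from the start, the same as in UTF-8.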
> and with multi-byte-string-thing I meant VARIABLE-length, sorry about that. I
> find variable length chars so much harder to use and reason about than the
> fixed length characters. UTF-x is a form of compression, which is
> understandable, but it is IMHO a burden (since it does not allow random access
> to the n-th character)
Do you really need that, though? Most formats I know with enough structure
that you can pick up records by offset either encode the offsets
somewhere, or are restricted to ASCII, or both.
> Now I'm getting a bit confused here. To summarize, what encoding does GHC 6.8.2
> use for [Char]? UCS-32?
Internally, Haskell Chars are Unicode: a Char stores a code point as a
32-bit (well, actually 21-bit or something) value. One Char, one code
point.
ByteString stores 8-bit "char"s, and the Char8 interface chops off the
top bits, essentially projecting code points down to the ISO-8859-1
range.
Externally, it depends on what IO library you use.
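A small sketch of the internal side, assuming the bytestring package that ships with GHC. The helper name roundTrip8 is made up for illustration. It shows that a Char holds a full code point, while packing through Data.ByteString.Char8 keeps only the low 8 bits of each one.

```haskell
import qualified Data.ByteString.Char8 as B8
import Data.Char (ord)

-- | Push a String through the Char8 interface and report the code
-- points that come back. Hypothetical helper for illustration only.
roundTrip8 :: String -> [Int]
roundTrip8 = map ord . B8.unpack . B8.pack

main :: IO ()
main = do
  let s = ['\955', 'x']            -- U+03BB (Greek lambda), then 'x'
  print (ord (maxBound :: Char))   -- largest code point a Char can hold
  print (map ord s)                -- one Char per code point: [955,120]
  print (roundTrip8 s)             -- top bits dropped: [187,120]
```

U+03BB is 0x3BB, so truncating to 8 bits leaves 0xBB (187), which is a different character altogether. Anything within ISO-8859-1 survives the round trip unchanged.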
As for the command line, Ian's post links to:
If I haven't seen further, it is by standing in the footprints of giants