[Haskell-cafe] How to use Unicode strings?
wren ng thornton
wren at freegeek.org
Sun Nov 23 20:32:28 EST 2008
Alexey Khudyakov wrote:
> But this brings up the question of what "the right thing" is. If the locale
> is UTF-8 or the system supports Unicode some other way - no problem, just
> encode the string properly. The problem is how to deal with untranslatable
> characters. Skip? Replace with question marks? Something else? Probably we
> need to look at how this is solved in other languages. (Or not solved)
Regarding untranslatable characters, I think the only correct thing to
do is consider it exceptional behavior and have the conversion function
accept a handler function which takes the character as input and
produces a string for it. That way programs can define their own
behavior, since this is something that doesn't have a "right" way to
recover in all cases. Canonical handlers which skip, replace with
question marks (or some other arbitrary character), throw actual
exceptions, etc. could be provided for convenience.
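
To make the idea concrete, here is a rough sketch of what such an
interface might look like. Everything in it is hypothetical: the names
(encodeCharLatin1, encodeWith, Handler, and the canned handlers) are
made up for illustration, and Latin-1 stands in for whatever the target
encoding really is.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical basic converter: Latin-1 can only represent code
-- points 0..255, so everything else is "untranslatable".
encodeCharLatin1 :: Char -> Maybe Word8
encodeCharLatin1 c
  | ord c <= 0xFF = Just (fromIntegral (ord c))
  | otherwise     = Nothing

-- A handler receives the untranslatable character and produces the
-- string to emit in its place.
type Handler = Char -> String

-- Conversion parameterised over the handler (assumed interface).
encodeWith :: Handler -> String -> [Word8]
encodeWith handler = concatMap encodeOne
  where
    encodeOne c = case encodeCharLatin1 c of
      Just w  -> [w]
      Nothing -> concatMap fallback (handler c)
    -- The replacement is encoded too; anything in it that still cannot
    -- be encoded is dropped, so a bad handler cannot cause a loop.
    fallback r = maybe [] (: []) (encodeCharLatin1 r)

-- Canned handlers provided for convenience.
skipHandler :: Handler
skipHandler _ = ""

replaceWith :: Char -> Handler
replaceWith r _ = [r]

-- Uses 'error' for brevity; the exception fires when the offending
-- part of the (lazy) output is forced.
throwHandler :: Handler
throwHandler c = error ("cannot encode character: " ++ show c)
```

With these definitions, encodeWith (replaceWith '?') "na\239ve \955"
gives [110,97,239,118,101,32,63], and a program that wants something
else (say, a hex escape) just passes its own function.
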
For stream-based "strings" à la ByteString, dealing with this sort of
handler in an efficient manner is fairly straightforward (though some
CPS tricks may be needed to get rid of the Maybe in the result of the
basic converter). For [Char] strings efficiency is harder, but the
implementation should still be easy (given the basic converter).
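
One possible reading of the CPS trick, again only as a sketch with
made-up names (encodeCharK, encodeWithK) and Latin-1 as the stand-in
encoding: the basic converter takes a success and a failure
continuation instead of returning a Maybe, so no intermediate Maybe
value is built per character.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical CPS-style basic converter: call 'ok' with the encoded
-- byte, or 'bad' with the character that could not be encoded.
encodeCharK :: Char -> (Word8 -> r) -> (Char -> r) -> r
encodeCharK c ok bad
  | n <= 0xFF = ok (fromIntegral n)
  | otherwise = bad c
  where
    n = ord c

-- Handler-driven conversion built on the CPS converter.
encodeWithK :: (Char -> String) -> String -> [Word8]
encodeWithK handler = foldr step []
  where
    -- Encodable characters are consed directly onto the rest of the
    -- output; the others are replaced by the handler's string.
    step c rest = encodeCharK c (: rest) (\u -> foldr sub rest (handler u))
    -- Replacement characters that still cannot be encoded are dropped.
    sub r rest = encodeCharK r (: rest) (const rest)
```

Here encodeWithK (const "?") "a\955b" gives [97,63,98]. For a real
ByteString the same shape would write into a buffer rather than build a
list, but that part is not shown.
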
Most extant languages I've seen tend to pick a single solution for all
cases, but I don't think we should follow along that path.