[Haskell-cafe] How to use Unicode strings?

wren ng thornton wren at freegeek.org
Sun Nov 23 20:32:28 EST 2008


Alexey Khudyakov wrote:
> But this raises the question of what "the right thing" is. If the locale is
> UTF-8 or the system supports Unicode some other way, no problem: just encode
> the string properly. The problem is how to deal with untranslatable
> characters. Skip them? Replace them with question marks? Something else?
> We should probably look at how this is solved in other languages (or not solved).

Regarding untranslatable characters, I think the only correct thing to 
do is consider it exceptional behavior and have the conversion function 
accept a handler function which takes the character as input and 
produces a string for it. That way programs can define their own 
behavior, since this is something that doesn't have a "right" way to 
recover in all cases. Canonical handlers which skip the character, 
replace it with a question mark (or another arbitrary character), throw 
an actual exception, etc. could be provided for convenience.
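
A minimal sketch of that interface, assuming Latin-1 as the target 
encoding and plain lists for the output. All of the names here 
(encodeLatin1Char, Handler, encodeWith, skipHandler, replaceWith, 
throwHandler) are made up for illustration and do not come from any 
existing library:

```haskell
import Control.Exception (Exception, throw)
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical basic converter: one character to one Latin-1 byte,
-- or Nothing when the target encoding cannot represent it.
encodeLatin1Char :: Char -> Maybe Word8
encodeLatin1Char c
  | ord c <= 0xFF = Just (fromIntegral (ord c))
  | otherwise     = Nothing

-- A handler takes the untranslatable character and produces a
-- replacement string for it.
type Handler = Char -> String

-- Conversion function parameterised by the handler.
encodeWith :: Handler -> String -> [Word8]
encodeWith handler = concatMap encodeChar
  where
    encodeChar c = case encodeLatin1Char c of
      Just w  -> [w]
      Nothing -> concatMap fallback (handler c)
    -- Assumption: the handler's output is encoded without calling the
    -- handler again; anything still untranslatable is silently dropped.
    fallback r = maybe [] (: []) (encodeLatin1Char r)

-- Canonical handlers provided for convenience.
skipHandler :: Handler
skipHandler _ = ""

replaceWith :: Char -> Handler
replaceWith r _ = [r]

newtype Untranslatable = Untranslatable Char
  deriving Show

instance Exception Untranslatable

-- Raises an exception when the offending part of the result is forced.
throwHandler :: Handler
throwHandler c = throw (Untranslatable c)
```

For instance, encodeWith (replaceWith '?') gives the question-mark 
behavior and encodeWith skipHandler drops the character, while a program 
with other needs can pass its own function.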

For stream-based "strings" à la ByteString, dealing with this sort of 
handler in an efficient manner is fairly straightforward (though some 
CPS tricks may be needed to get rid of the Maybe in the result of the 
basic converter). For [Char] strings efficiency is harder, but the 
implementation should still be easy (given the basic converter).
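
One way the CPS trick might look, again only as a sketch: the basic 
converter takes a success continuation and a failure continuation 
instead of returning a Maybe. The names encodeCharK and encodeK are 
hypothetical, Latin-1 is an assumed target encoding, and to keep it 
short the handler here returns already-encoded bytes rather than a 
string. Plain lists stand in for a real ByteString builder:

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical CPS form of the basic converter: no Maybe is built,
-- the caller supplies what to do on success and on failure.
encodeCharK :: Char -> (Word8 -> r) -> (Char -> r) -> r
encodeCharK c onByte onFail
  | n <= 0xFF = onByte (fromIntegral n)
  | otherwise = onFail c
  where
    n = ord c

-- [Char] version built on the basic converter. Each byte is consed
-- straight onto the rest of the output; the handler's bytes are
-- spliced in where a character could not be translated.
encodeK :: (Char -> [Word8]) -> String -> [Word8]
encodeK handler = foldr step []
  where
    step c rest = encodeCharK c (: rest) (\bad -> handler bad ++ rest)
```

With this shape, encodeK (const [63]) replaces each untranslatable 
character with the byte for '?', and encodeK (const []) skips it.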

Most extant languages I've seen tend to pick a single solution for all 
cases, but I don't think we should follow along that path.

-- 
Live well,
~wren

