that jefu guy
26 Jul 2002 02:58:25 -0700

> As you've probably found out, these are very hard problems.

> Glossing over that concern, current implementations don't support the
> relevant UnicodePrims fully, so to do it properly you'll probably need
> to parse the case folding files yourself.  See:
> Vowels are even harder because I don't think the Unicode standard even
> defines what a "vowel" is.  Removing vowel _marks_ should be
> straightforward once you expand combining characters, but that doesn't
> help with the general case.  Frankly, I don't like your chances.

Shouldn't the solution also take care of languages that have no upper
case? Clearly the translation problem is easy enough with such languages
("id" will work just fine), but determining (from context?) that the
string is in such a language is more than a bit difficult (especially
given that a numeric code can correspond to almost anything).
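
For what it's worth, if the string is already a sequence of Unicode code
points, the "id" case can be handled per character rather than per
language. A minimal sketch, assuming a current GHC where Data.Char.toUpper
applies only the simple one-to-one mappings (so this is not full case
folding, and the name "upcase" is just for illustration):

```haskell
import Data.Char (toUpper)

-- Simple one-to-one upper-casing. Characters that have no upper-case
-- counterpart (digits, CJK ideographs, Arabic letters, ...) are returned
-- unchanged by toUpper, so for them this behaves exactly like "id".
-- Assumption: simple mappings only; one-to-many cases are not expanded.
upcase :: String -> String
upcase = map toUpper
```

Under that assumption, upcase "abc" gives "ABC" while upcase "中文" gives
the string back untouched, without anyone having to guess the language.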

Vowels are much more difficult - even given that the language is
recognizable, what would happen with languages such as Chinese or Arabic,
which (I believe) have nothing that even resembles a vowel?
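
The one part that does look mechanical is the vowel-_mark_ case from the
quoted text. A rough sketch, assuming the input has already been
decomposed (no normalization is done here) and accepting that it removes
every non-spacing combining mark, not only the ones that happen to denote
vowels; "stripMarks" is a made-up name:

```haskell
import Data.Char (GeneralCategory (NonSpacingMark), generalCategory)

-- Drop every character whose general category is Mn (non-spacing mark).
-- Assumption: the string is already in decomposed form, so an accented
-- letter is a base character followed by its combining mark(s).
-- Precomposed characters are left exactly as they are.
stripMarks :: String -> String
stripMarks = filter ((/= NonSpacingMark) . generalCategory)
```

This says nothing about which base letters are vowels, so it does not
help with the general case.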

Of course, Chinese is a whole problem by itself. 

