converting capital letters into small letters
that jefu guy
jefu.jefu@verizon.net
26 Jul 2002 02:58:25 -0700
On Thu, 2002-07-25 at 19:07, Andrew J Bromage wrote:
> G'day all.
>
> On Fri, Jul 26, 2002 at 01:27:48AM +0000, Karen Y wrote:
>
> > 1. How would I convert capital letters into small letters?
> > 2. How would I remove vowels from a string?
>
> As you've probably found out, these are very hard problems.
> Glossing over that concern, current implementations don't support the
> relevant UnicodePrims fully, so to do it properly you'll probably need
> to parse the case folding files yourself. See:
>
> http://www.unicode.org/unicode/reports/tr21/
>
> Vowels are even harder because I don't think the Unicode standard even
> defines what a "vowel" is. Removing vowel _marks_ should be
> straightforward once you expand combining characters, but that doesn't
> help with the general case. Frankly, I don't like your chances.
Shouldn't the solution also take care of languages that have no upper
case? Clearly the translation problem is easy enough for such languages
("id" will work just fine), but determining (from context?) that the
string is in such a language is more than a bit difficult (especially
given that numeric codes can correspond to almost anything).
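
For the easy part, here is a minimal sketch, assuming an implementation
whose Data.Char.toLower covers the characters you care about. The name
lowerString is made up for illustration. toLower only does a one-to-one
character mapping, so this is not full Unicode case folding, and it makes
no attempt to guess the language of the string.

```haskell
import Data.Char (toLower)

-- Lower-case every character that has a lower-case form.
-- Characters without case (digits, punctuation, caseless scripts)
-- are returned unchanged, so on those toLower behaves like id.
-- Assumption: a one-to-one mapping is good enough; mappings that
-- expand to several characters are not handled.
lowerString :: String -> String
lowerString = map toLower
```

Since caseless characters pass through untouched, nothing has to detect
the language first in order to avoid damaging such a string.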
Vowels are much more difficult: even if the language can be recognized,
what would happen with languages such as Chinese or Arabic, which
(I believe) have nothing that even resembles a vowel?
Of course, Chinese is a whole problem by itself.
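
If the question was really only about plain English text, a rough sketch
is possible. It assumes that "vowel" means exactly the ten ASCII letters
listed below, which is my assumption and not anything the Unicode
standard defines. The names asciiVowels and removeVowels are made up for
illustration.

```haskell
-- Assumption: a "vowel" is one of these ten ASCII letters and nothing
-- else. Accented letters, "y", and vowel marks in other scripts are
-- deliberately left alone.
asciiVowels :: String
asciiVowels = "aeiouAEIOU"

-- Keep only the characters that are not in asciiVowels.
removeVowels :: String -> String
removeVowels = filter (`notElem` asciiVowels)
```

Anything outside that fixed list passes through untouched, so this does
nothing for the harder cases above.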
--
jeff putnam -- jefu.jefu@verizon.net -- http://home1.get.net/res0tm0p