Rewrite of Data.Char library?

Thu Oct 22 22:56:56 EDT 2009

Ahn, Ki Yung wrote:
> In the #haskell IRC channel, we just had a discussion on Data.Char
> predicates such as isAlpha, isUpper, isLower.  The implementation of
> Data.Char is not Haskell 98 since Char specification in Haskell 98 only
> covers Latin-1.  However, the current predicates are confusing, and
> intuitive properties do not hold.  One example is this:
> 
> [17:53:32] <newsham> > let cs = [minBound..maxBound]; us = filter
> isUpper cs; ls = filter isLower cs in take 5 $ (map toUpper ls) \\ us
> [17:53:33] <lambdabot>   "\170\186\223I\312"
> 
> isLower '\170' == True, but you can't turn that into an uppercase
> letter: toUpper '\170' == '\170'.
> 
> I know that the GHC team is working on a rewrite of the IO library for
> better Unicode support (which I hope also includes better locale and
> charset support).  Along with the new IO library work, it would also be
> good to have some cleanup in Data.Char.
> 
> Thanks,
> 
> Ahn, Ki Yung

Just a follow-up with my suggestions.  The lowercase/uppercase problem
does not seem to be solvable, since some letters, such as the German sz
(ß, '\223'), have no well-defined uppercase counterpart.  So the example
in my previous posting is not really a big problem.
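
To make the German sz case concrete, here is a small sketch that can be
loaded into GHCi.  The names lowersWithoutUpper, eszett and
fixedByToUpper are made up for illustration; only isLower, isUpper and
toUpper come from Data.Char.  The exact contents of the list depend on
the Unicode tables shipped with your version of base, so I don't show
any output for it.

```haskell
import Data.Char (isLower, isUpper, toUpper)

-- Hypothetical helper: lowercase characters whose toUpper image is not
-- classified as uppercase by isUpper.
lowersWithoutUpper :: [Char]
lowersWithoutUpper =
  [ c | c <- [minBound .. maxBound], isLower c, not (isUpper (toUpper c)) ]

-- German sz, written as a numeric escape to avoid source-encoding issues.
eszett :: Char
eszett = '\223'

-- True when toUpper leaves the character unchanged.
fixedByToUpper :: Char -> Bool
fixedByToUpper c = toUpper c == c
```

Since toUpper has type Char -> Char, it can never produce a
two-character result such as "SS", so a character like eszett can only
be mapped to itself or to some other single character.  Try
`fixedByToUpper eszett` and `take 5 lowersWithoutUpper` to see what
your installation does.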

Another problem is that, in the Haskell 98 Report, isAlpha is defined as
isLower or isUpper.  This is different from the current implementation:
what isAlpha currently selects is all of the Unicode "Letter" categories.

So, wouldn't it be better to keep isAlpha following the definition in
the Haskell 98 Report, and just define a new predicate called isLetter
if needed?  That at least sounds more proper, and the programmer can
easily guess that it corresponds to the Letter categories in Unicode.
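
Here is a minimal sketch of the two predicates being contrasted.  The
names isAlpha98, isLetterCat and disagreements are hypothetical, chosen
so that they don't clash with anything in Data.Char; generalCategory and
the GeneralCategory constructors are the ones Data.Char exports in GHC's
base.  This is only meant to illustrate the difference, not to propose
an implementation.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isLower, isUpper)

-- Haskell 98 Report style: alphabetic means lowercase or uppercase.
isAlpha98 :: Char -> Bool
isAlpha98 c = isLower c || isUpper c

-- Letter-category style: any of the Unicode "L*" general categories.
isLetterCat :: Char -> Bool
isLetterCat c = case generalCategory c of
  UppercaseLetter -> True
  LowercaseLetter -> True
  TitlecaseLetter -> True
  ModifierLetter  -> True
  OtherLetter     -> True
  _               -> False

-- Characters on which the two definitions disagree.
disagreements :: [Char]
disagreements =
  [ c | c <- [minBound .. maxBound], isAlpha98 c /= isLetterCat c ]
```

A caseless letter such as the Hangul syllable '\44032' is in a Letter
category but is neither lowercase nor uppercase, so it should show up
in disagreements.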