Rewrite of Data.Char library?
Ian Lynagh
igloo at earth.li
Sun Oct 25 15:04:03 EDT 2009
On Thu, Oct 22, 2009 at 07:56:56PM -0700, Ahn, Ki Yung wrote:
> Ahn, Ki Yung wrote:
> > In the #haskell IRC channel, we just had a discussion on Data.Char
> > predicates such as isAlpha, isUpper, isLower. The implementation of
> > Data.Char is not Haskell 98 since Char specification in Haskell 98 only
> > covers latin1.
Char in Haskell98 covers Unicode too;
http://haskell.org/onlinereport/char.html says:
    Function toUpper converts a letter to the corresponding upper-case
    letter, leaving any other character unchanged. Any Unicode letter
    which has an upper-case equivalent is transformed. Similarly,
    toLower converts a letter to the corresponding lower-case letter,
    leaving any other character unchanged.
> > However, the current predicates are confusing, and intuitive
> > properties do not hold. One example is this:
> >
> > [17:53:32] <newsham> > let cs = [minBound..maxBound]; us = filter
> > isUpper cs; ls = filter isLower cs in take 5 $ (map toUpper ls) \\ us
> > [17:53:33] <lambdabot> "\170\186\223I\312"
> >
> > isLower '\170' == True but you can't turn that into an uppercase
> > letter. toUpper '\170' == '\170'.
What behaviour would you expect?
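For anyone who wants to reproduce this outside lambdabot, here is a minimal sketch. It is not the exact expression quoted above: it uses a plain filter instead of (\\), so it lists the lower-case characters themselves rather than their toUpper images. The name stuckLower is made up for illustration, and the output depends on the Unicode tables shipped with your version of base.

```haskell
import Data.Char (isLower, isUpper, toUpper)

-- Characters that isLower accepts but whose toUpper image is not
-- accepted by isUpper. (Hypothetical helper name, for illustration.)
stuckLower :: [Char]
stuckLower =
  [ c
  | c <- [minBound .. maxBound]
  , isLower c
  , not (isUpper (toUpper c))
  ]

main :: IO ()
main = print (take 5 stuckLower)
```

'\223' (sharp s) should show up in this list, since it is lower case and has no single-character upper-case mapping.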
> Another problem is that, in the Haskell 98 Report, isAlpha is defined as
> isLower or isUpper. This is different from the current implementation.
> What isAlpha actually matches is all of the "Letter" categories.
Right, we have:
    isLower = "Letter, Lowercase"
    isUpper = "Letter, Uppercase" or "Letter, Titlecase"
    isAlpha = "Letter, Lowercase" or
              "Letter, Uppercase" or "Letter, Titlecase" or
              "Letter, Modifier" or "Letter, Other"
The report says:
    any alphabetic character which is not lower case is treated as upper
    case (Unicode actually has three cases: upper, lower, and title)
and defines:
    isAlpha c = isUpper c || isLower c
so the implementation is not consistent with the language definition. I
wouldn't like to say which is "wrong", though (but I would guess "both" :-)
I think it would be great if someone were to design a new interface
that provided something closer to the Unicode spec, perhaps in
Data.Char.Unicode; we could then make the current interface a layer on top.
> So, wouldn't it be better to keep isAlpha to follow the definition of
> the Haskell 98 report, and just define a new predicate called isLetter
> if needed?
If your idea is to improve the handling of '\170' then this won't help.
'\170' is "Letter, Lowercase".
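The category of any character can be checked with generalCategory from Data.Char. The sketch below also restates the three predicates in terms of categories, following the list given earlier in this message. The names isLowerCat, isUpperCat and isAlphaCat are made up for illustration and are not part of the library, and the category reported for a given character depends on the Unicode version bundled with base.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Category-based restatements of the predicates (hypothetical names).
isLowerCat, isUpperCat, isAlphaCat :: Char -> Bool
isLowerCat c = generalCategory c == LowercaseLetter
isUpperCat c = generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]
isAlphaCat c =
  generalCategory c `elem`
    [ LowercaseLetter, UppercaseLetter, TitlecaseLetter
    , ModifierLetter, OtherLetter ]

main :: IO ()
main = do
  -- Category of the character under discussion; may vary with the
  -- Unicode tables in use.
  print (generalCategory '\170')
  print (map generalCategory "aA\453\1488")
```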
Thanks
Ian