[GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC ghc-devs at haskell.org
Tue Apr 10 19:25:12 UTC 2018


#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by Azel):

 Looking a bit farther afield, every language I have checked that has an
 `isAlphaNum` equivalent defines it as returning `True` if either its
 `isAlpha` or its `isNumber` equivalent does (e.g.
 [https://docs.oracle.com/javase/9/docs/api/java/lang/Character.html#isLetterOrDigit-int- Java's],
 [http://msdn.microsoft.com/en-gb/library/cay4xx2f(v=vs.110).aspx the .NET Framework's],
 [http://www.lispworks.com/documentation/HyperSpec/Body/13_ade.htm Common Lisp's],
 [https://docs.python.org/3/library/stdtypes.html#str.isalnum Python's]
 or [http://www.ada-auth.org/standards/12rm/html/RM-A-3-5.html Ada's]).
 One particularity of Python's documentation is that the description of
 `isalnum` lists three functions that match numbers, although the first two
 are subsumed by the third.

 So I'm willing to have a go at solving this ticket, and I would be in
 favour of fixing `u_iswalnum` and keeping the doc mostly as it is: it
 states that `isAlphaNum` selects alphabetic or numeric digit Unicode
 characters, whereas currently, even if we remove the mark characters, the
 function does not match only those, because it also matches `GENCAT_NO`
 and `GENCAT_NL`.
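
 To make the proposed behaviour concrete, here is a minimal sketch of the
 definition the other languages use, together with a way to list the
 characters on which it disagrees with whatever `isAlphaNum` the installed
 `base` provides. The names `isAlphaNumProposed` and `disagreements` are
 hypothetical and used only for illustration; this is not the actual
 `u_iswalnum` change. `GENCAT_NO` and `GENCAT_NL` correspond to `OtherNumber`
 and `LetterNumber` in `Data.Char`.

```haskell
import Data.Char
  ( GeneralCategory (..)
  , generalCategory
  , isAlpha
  , isAlphaNum
  , isNumber
  )

-- Hypothetical name: the definition argued for in this comment,
-- i.e. "alphabetic or numeric" and nothing else.
isAlphaNumProposed :: Char -> Bool
isAlphaNumProposed c = isAlpha c || isNumber c

-- Every character on which base's isAlphaNum and the proposed
-- definition disagree, paired with its Unicode general category.
-- The result depends on the version of base in use.
disagreements :: [(Char, GeneralCategory)]
disagreements =
  [ (c, generalCategory c)
  | c <- [minBound .. maxBound]
  , isAlphaNum c /= isAlphaNumProposed c
  ]

main :: IO ()
main = do
  -- Show a few disagreeing characters, if there are any.
  mapM_ print (take 10 disagreements)
  putStrLn ("total disagreements: " ++ show (length disagreements))
```

 Note that `isNumber` already covers `OtherNumber` and `LetterNumber`, so
 with this definition the doc would still need to say "numeric" rather than
 "numeric digit".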

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

