[Haskell-cafe] bug in Prelude.words?
malcolm.wallace
malcolm.wallace at me.com
Mon Mar 28 18:53:27 CEST 2011
I think it's predictable: isSpace (which words is based on) is based on generalCategory, which returns the proper Unicode category:
λ> generalCategory '\xa0'
Space
I agree, and I also agree that it would make sense the other way (not breaking on non-breaking spaces). Perhaps it would be a good idea to add a remark to the documentation which specifies the treatment of non-breaking spaces.
I note that Java has two distinct properties concerning whitespace:
Character.isSpaceChar('\xA0') == True
Character.isWhitespace('\xA0') == False
Contrast with:
-- \x20 is ASCII space
Character.isSpaceChar('\x20') == True
Character.isWhitespace('\x20') == True
-- \x2060 is the word-joiner (zero-width non-breaking space)
Character.isSpaceChar('\x2060') == False
Character.isWhitespace('\x2060') == False
-- \x202F is the narrow non-breaking space
Character.isSpaceChar('\x202F') == True
Character.isWhitespace('\x202F') == False
-- \x2009 is the thin space
Character.isSpaceChar('\x2009') == True
Character.isWhitespace('\x2009') == True
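
The comparisons above are written in a Haskell-flavoured shorthand; real Java char literals use \uXXXX escapes rather than \x, and booleans print as true/false. A minimal runnable sketch that should reproduce the table follows. The class name WhitespaceProps and the helper describe are made up for illustration; only Character.isSpaceChar and Character.isWhitespace come from the JDK.

```java
// Sketch: report both Java whitespace properties for the code points listed above.
// WhitespaceProps and describe are illustrative names, not part of any library.
final class WhitespaceProps {
    // Formats one code point as "U+XXXX isSpaceChar=... isWhitespace=...".
    static String describe(int codePoint) {
        return String.format("U+%04X isSpaceChar=%b isWhitespace=%b",
                codePoint,
                Character.isSpaceChar(codePoint),
                Character.isWhitespace(codePoint));
    }

    public static void main(String[] args) {
        // ASCII space, no-break space, thin space, narrow no-break space, word joiner.
        int[] codePoints = {0x0020, 0x00A0, 0x2009, 0x202F, 0x2060};
        for (int cp : codePoints) {
            System.out.println(describe(cp));
        }
    }
}
```

The pattern to look for is that isWhitespace excludes the non-breaking spaces that isSpaceChar accepts, which is roughly the distinction one would want if words were not to break on them.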