[Haskell-cafe] bug in Prelude.words?

malcolm.wallace malcolm.wallace at me.com
Mon Mar 28 18:53:27 CEST 2011

I think it's predictable: isSpace (on which words is based) is based on generalCategory, which returns the proper Unicode category:

λ> generalCategory '\xa0'
Space
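
A minimal sketch of that chain in GHC. It assumes a base whose isSpace follows the Data.Char documentation ("True for any Unicode space character"). U+00A0 is in category Zs (Space), isSpace accepts it, and so words splits on it. The name nbsp is just for illustration. The string is built with (++) because a literal such as "foo\xa0bar" would read \xa0ba as a single hex escape.

```haskell
import Data.Char (generalCategory, isSpace)

-- U+00A0 NO-BREAK SPACE; Data.Char reports its category (Zs) as Space.
nbsp :: Char
nbsp = '\xa0'

main :: IO ()
main = do
  print (generalCategory nbsp)              -- Space
  print (isSpace nbsp)                      -- True
  -- Built with (++): "foo\xa0bar" would parse \xa0ba as one escape.
  print (words ("foo" ++ [nbsp] ++ "bar"))  -- ["foo","bar"]
```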

I agree, and I also agree that the opposite behaviour (not breaking on non-breaking spaces) would make sense.  Perhaps the documentation should include a remark specifying how non-breaking spaces are treated.
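
For the opposite behaviour, here is a rough sketch. Both names are hypothetical and not in base. wordsNB reuses the Haskell Report's definition of words, but with isSpace swapped for isBreakingSpace, which rejects the non-breaking spaces U+00A0, U+2007 and U+202F. Choosing exactly those three is an assumption, borrowed from the set that Java's Character.isWhitespace excludes.

```haskell
import Data.Char (isSpace)

-- Hypothetical predicate (not in base): like isSpace, but the
-- non-breaking spaces U+00A0, U+2007 and U+202F do not count.
isBreakingSpace :: Char -> Bool
isBreakingSpace c = isSpace c && c `notElem` ['\x00a0', '\x2007', '\x202f']

-- Hypothetical wordsNB: the Report's definition of words with
-- isSpace replaced by isBreakingSpace.
wordsNB :: String -> [String]
wordsNB s = case dropWhile isBreakingSpace s of
  "" -> []
  s' -> w : wordsNB s''
    where
      (w, s'') = break isBreakingSpace s'

main :: IO ()
main = do
  let input = "10\xa0km away"
  print (words input)    -- ["10","km","away"]
  print (wordsNB input)  -- ["10\160km","away"]
```

With this version, a quantity such as "10 km" written with a non-breaking space stays one word. Ordinary spaces, tabs and newlines still separate words.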

I note that Java has two distinct properties concerning whitespace:

Character.isSpaceChar('\u00A0')  == true
Character.isWhitespace('\u00A0') == false

Contrast with

// U+0020 is the ASCII space
Character.isSpaceChar('\u0020')  == true
Character.isWhitespace('\u0020') == true

// U+2060 is the word joiner (a zero-width non-breaking space)
Character.isSpaceChar('\u2060')  == false
Character.isWhitespace('\u2060') == false

// U+202F is the narrow non-breaking space
Character.isSpaceChar('\u202F')  == true
Character.isWhitespace('\u202F') == false

// U+2009 is the thin space
Character.isSpaceChar('\u2009')  == true
Character.isWhitespace('\u2009') == true
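
To put the two Java predicates next to Data.Char, here is a Haskell model of each. javaIsSpaceChar and javaIsWhitespace are hypothetical names. They are written from the Javadoc for java.lang.Character, not derived from Java itself. The two runtimes may also ship different Unicode versions, so results could differ for recently assigned characters. The program prints isSpace alongside both models for the five characters listed in the message.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isSpace)

-- Model of Java's Character.isSpaceChar (per its Javadoc):
-- true for Unicode categories Zs, Zl and Zp.
javaIsSpaceChar :: Char -> Bool
javaIsSpaceChar c =
  generalCategory c `elem` [Space, LineSeparator, ParagraphSeparator]

-- Model of Java's Character.isWhitespace (per its Javadoc): a Zs/Zl/Zp
-- character other than the non-breaking spaces U+00A0, U+2007, U+202F,
-- or one of nine ASCII control characters.
javaIsWhitespace :: Char -> Bool
javaIsWhitespace c =
  (javaIsSpaceChar c && c `notElem` ['\x00a0', '\x2007', '\x202f'])
    || c `elem` ['\t', '\n', '\v', '\f', '\r', '\x1c', '\x1d', '\x1e', '\x1f']

-- Print Haskell's isSpace next to both models for each character.
main :: IO ()
main = mapM_ report ['\x20', '\xa0', '\x2060', '\x202f', '\x2009']
  where
    report c =
      putStrLn $
        show c
          ++ "  isSpace=" ++ show (isSpace c)
          ++ "  isSpaceChar=" ++ show (javaIsSpaceChar c)
          ++ "  isWhitespace=" ++ show (javaIsWhitespace c)
```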


More information about the Haskell-Cafe mailing list