[Haskell-cafe] bug in Prelude.words?

Christopher Done chrisdone at googlemail.com
Mon Mar 28 18:05:47 CEST 2011


On 28 March 2011 17:55, malcolm.wallace <malcolm.wallace at me.com> wrote:

> Does anyone else think it odd that Prelude.words will break a string at a
> non-breaking space?
>
> Prelude> words "abc def\xA0ghi"
> ["abc","def","ghi"]
>

I think it's predictable. words is based on isSpace, which agrees with
generalCategory here, and generalCategory returns the proper Unicode category:

λ> generalCategory '\xa0'
Space

So:

-- | Selects white-space characters in the Latin-1 range.
-- (In Unicode terms, this includes spaces and some control characters.)
isSpace                 :: Char -> Bool
-- isSpace includes non-breaking space
-- Done with explicit equalities both for efficiency, and to avoid a tiresome
-- recursion with GHC.List elem
isSpace c               =  c == ' '     ||
                           c == '\t'    ||
                           c == '\n'    ||
                           c == '\r'    ||
                           c == '\f'    ||
                           c == '\v'    ||
                           c == '\xa0'  ||
                           iswspace (fromIntegral (ord c)) /= 0
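
If you do want words to leave the non-breaking space alone, a variant is easy
to write yourself. The following is only a rough sketch: isBreakingSpace and
wordsNB are names made up for this example, not anything in base, and it
assumes that U+00A0 is the only character you want to exempt.

```haskell
import Data.Char (isSpace)

-- | Hypothetical helper (not part of base): white space that is allowed
-- to end a word, i.e. everything isSpace accepts except U+00A0.
isBreakingSpace :: Char -> Bool
isBreakingSpace c = isSpace c && c /= '\xA0'

-- | Hypothetical variant of Prelude.words that splits only on
-- breaking white space, so "def\xA0ghi" stays a single word.
wordsNB :: String -> [String]
wordsNB s = case dropWhile isBreakingSpace s of
  "" -> []
  s' -> w : wordsNB rest
    where
      (w, rest) = break isBreakingSpace s'
```

λ> wordsNB "abc def\xA0ghi"
["abc","def\160ghi"]

It has the same shape as the Prelude definition of words, with only the
predicate swapped, so it behaves identically on input that contains no
non-breaking spaces.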