isSpace is too slow

Mon May 21 04:43:16 EDT 2007

On Sun, 2007-05-20 at 16:59 +0100, Duncan Coutts wrote:

> isSpace c               =  c == ' '     ||
>                            c == '\t'    ||
>                            c == '\n'    ||
>                            c == '\r'    ||
>                            c == '\f'    ||
>                            c == '\v'    ||
>                            c == '\xa0'  ||
>                            iswspace (fromIntegral (ord c)) /= 0
> 
> iswspace does a generic lookup in the unicode property database I think.

So there's little hope of beating iswspace unless your input contains a
lot of spaces, I guess: for every non-space character we call iswspace,
which presumably repeats the tests for ASCII space.

Wouldn't something along these lines be more efficient?

isSpace :: Char -> Bool
isSpace = isSp . ord
isSp c | c <= 13    = c >= 9     -- \t..\r
       | c <= 127   = c == 32    -- ' '
       | c <= 255   = c == 0xa0  -- nbsp
       | otherwise  = iswspace(..)

A quick test shows about a factor-of-two improvement
on /usr/share/dict/words, but that will of course only trigger the first
match.
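
For anyone who wants to try the idea, here is a minimal self-contained
sketch of the range-check approach. It is not the library's code: the
names fastIsSpace, slowIsSpace and disagreements are made up for
illustration, and Data.Char.isSpace stands in for the iswspace FFI call,
which is an assumption. The lower bound of the first guard is 9 ('\t'),
so that backspace (8) is rejected, matching the quoted definition.

```haskell
import Data.Char (chr, ord)
import qualified Data.Char as C

-- Stand-in for the generic iswspace lookup (an assumption: the real
-- code would call the C function through the FFI).
slowIsSpace :: Char -> Bool
slowIsSpace = C.isSpace

-- Range checks first, so ASCII and Latin-1 input never reaches the
-- generic lookup.
fastIsSpace :: Char -> Bool
fastIsSpace ch = isSp (ord ch)
  where
    isSp c
      | c <= 13   = c >= 9        -- '\t' .. '\r'
      | c <= 127  = c == 32       -- ' '
      | c <= 255  = c == 0xa0     -- non-breaking space
      | otherwise = slowIsSpace ch

-- Code points below 256 on which the two predicates differ.
disagreements :: [Char]
disagreements =
  [ch | ch <- map chr [0 .. 255], fastIsSpace ch /= slowIsSpace ch]
```

Loading this into GHCi and evaluating disagreements lists any Latin-1
code points where the fast path and the fallback differ; an empty list
means the range checks are a faithful shortcut on that range.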

-k