Improving Data.Char.isSpace performance

John MacFarlane jgm at berkeley.edu
Wed Oct 31 03:33:44 CET 2012


I've done some further investigating.  The large differences
I was seeing on OS X went away when I moved to a Linux machine,
and also when I compiled the benchmark with -O2.  (When I do
either of those things, the puzzling difference between
isSpace_DataChar and Data.Char.isSpace also goes away.) I
don't know enough about Core to figure out what is going on
here, but I'll assume that the benchmarks I'm getting on the Linux
box are the reliable ones.  They look like this (each number is
the ratio of new code time to old code time for isSpace, so a
value below 1 means the new code is faster):

Benchmark compiled without optimization:
ascii text                0.71
ascii text (short lines)  0.74
ascii text (long lines)   0.72
Greek text                1.08
Haskell code              0.77
chars 0..255              0.70
all spaces                1.02

Benchmark compiled with -O2:
ascii text                0.69
ascii text (short lines)  0.72
ascii text (long lines)   0.69
Greek text                1.11
Haskell code              0.77
chars 0..255              0.69
all spaces                0.94

This suggests that we can get a modest improvement (roughly
25-30% less time) for the most common cases if we adopt the new
definition of isSpace.  However, performance might actually
decrease slightly for non-Latin text (the Greek sample is 8-11%
slower).
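
For readers without the attachment at hand, here is a minimal sketch
of the general shape such a definition could take.  It is not the
code from the attached patch, and the name isSpaceFast is made up for
illustration.  The idea is to settle Latin-1 characters with plain
comparisons and to fall back to the library predicate only for code
points above 0xFF, so characters outside that range still pay for the
library call on top of the initial range check.

```haskell
import Data.Char (isSpace, ord)

-- Hypothetical name; an illustration only, not the attached patch.
-- Latin-1 characters are classified with integer comparisons alone.
-- Anything above 0xFF is handed to Data.Char.isSpace unchanged.
isSpaceFast :: Char -> Bool
isSpaceFast c
  | n <= 0xFF = n == 0x20                  -- space
             || (n >= 0x09 && n <= 0x0D)   -- \t \n \v \f \r
             || n == 0xA0                  -- non-breaking space
  | otherwise = isSpace c
  where
    n = ord c
```

The Latin-1 branch lists the same characters that Data.Char.isSpace
accepts in that range, so the two functions should agree on every
Char.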

The changes I tried for other functions in GHC.Unicode did
not result in significant improvements.

So, the question is whether it's worth submitting the patch for
isSpace, given that the gains are more modest than I'd reported
before.  (Note that 'words' will also be affected by this, as
it uses isSpace.)  I have attached the proposed patch to this
email.
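
To make the dependency on isSpace concrete, this is the definition of
'words' from the Haskell 2010 Report, renamed wordsReport here so it
does not clash with the Prelude.  The source in base may differ in
details, but the point holds either way: isSpace is called on every
character of the input, through dropWhile and break.

```haskell
import Data.Char (isSpace)

-- Haskell 2010 Report definition of words, under a hypothetical name.
-- dropWhile skips leading white space; break splits off the next word.
wordsReport :: String -> [String]
wordsReport s = case dropWhile isSpace s of
  "" -> []
  s' -> w : wordsReport s''
    where (w, s'') = break isSpace s'
```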

John

+++ John MacFarlane [Oct 29 12 16:15 ]:
> +++ Simon Peyton-Jones [Oct 29 12 22:29 ]:
> > Sounds good to me.  Thanks for doing this.
> > 
> > When you think you are ready, just submit a patch.  (As others have noted, maybe isSpace isn't the only function that could benefit from this kind of attention.)
> 
> Yes, if the general idea is agreeable, I'll do some of the other
> functions in GHC.Unicode as well, and provide benchmarks for them
> as well.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: isSpace.patch
Type: text/x-diff
Size: 1446 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/libraries/attachments/20121030/f3071665/attachment.patch>

