Performance of `count`ing a char in a ByteString

Georg Rudoy 0xd34df00d at gmail.com
Sat Feb 8 23:35:03 UTC 2020


Totally forgot about that, thanks for pointing out! Manual dispatch it
is then — a bit less pleasant, but not too bad.

сб, 8 февр. 2020 г., 18:09 Vanessa McHale <vamchale at gmail.com>:
>
> clang is the default C compiler on Mac. It would be sensible to at least support clang as well as gcc
>
> Cheers,
> Vanessa McHale
>
> > On Feb 8, 2020, at 4:34 PM, Georg Rudoy <0xd34df00d at gmail.com> wrote:
> >
> > Hi there,
> >
> > tl;dr: I've noticed ByteString's `count`'s performance can be improved
> > by a factor between 2.5 and 8 by explicitly using certain SIMD
> > extensions, so what's the policy on those?
> >
> > Results of a small C benchmark (since the corresponding BS routine is
> > implemented in C anyway) are available here (
> > https://github.com/0xd34df00d/counting-chars ), as well as some sample
> > code to give the idea of what's happening (which might be
> > incorrect/suboptimal, my SSE/AVX is rusty). I'll also put a sample
> > input file if I manage to get git-lfs going.
> >
> > This change only really makes sense for large strings, but that's a
> > somewhat common use case at least in my applications, so I wanted to
> > gather some initial feedback, as well as ask a couple of questions and
> > emphasize a couple of points.
> >
> > 1. How are SIMD extensions generally controlled?
> >
> > 2. Is depending on gcc OK? If so, we could leverage gcc's support of
> > automatically generating a dispatching function that'll check cpuid
> > for the supported extensions and call the right implementation.
> >
> > 3. Does BS' allocator give any guarantees on the alignment? It'd be
> > nice to get rid of that "prefix" part that makes sure the SIMD code
> > churns through 64-byte-aligned data.
> >
> > 4. Note that compiling with `-O3 -march=native` makes code effectively
> > non-portable (at least, to CPUs lacking extensions the compiler
> > decides to use), while the code with `-O2`, explicit SIMD versions and
> > a dispatching function will get the best performance using the SIMD
> > extensions available, degrading gracefully on older hardware. So,
> > personally, when I look at that table with the results I compare the
> > SSE4.2/AVX2 implementations against the `-O2` column.
> >
> > 5. Thanks Gregory Collins for mentioning PCMPESTRM somewhere — that
> > remark sparked the desire to do some intrinsics stuff.
> >
> > --
> >  Georg Rudoy
> > _______________________________________________
> > Libraries mailing list
> > Libraries at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>


More information about the Libraries mailing list