Spam filtering

Ben Gamari ben at well-typed.com
Sat Apr 16 09:18:01 UTC 2016


Niklas Hambüchen <mail at nh2.me> writes:

> Hi Ben,
>
> Could we not have a captcha instead of a reject, to avoid false
> positives? That would require no training.
>
> Since I assume most Trac spammers are extremely unsophisticated, a
> simple hardcoded question like "What programming language is GC all
> about?" may be sufficient.
>
The CAPTCHAs being broken is the reason why this incident occurred.
I have added some more CAPTCHAs to try to dilute the pool of answers
that they already know, but they still seem to solve them easily
enough regardless. I can only imagine they have some sentient beings
sitting at computers solving CAPTCHAs.

I don't really feel like we can make the CAPTCHAs themselves any more
difficult without excluding real new users, which I really want to avoid.

Regardless, my goal here is to err on the side of less filtering, not
more, even if this does mean more manual maintenance. To this end, I've
configured the filters such that the probability of legitimate activity
being suppressed should be negligible:

 * I've been careful to only train the Bayes filter on obvious spam;
   I have tested it against various snippets from the wiki and mailing
   list and have yet to see it score anything legitimate with a spam
   likelihood > 5%.

 * Even if the Bayes filter does deem your content to be spammy enough
   to warrant further attention, you will merely be asked to solve a
   CAPTCHA. Posts will not be outright rejected unless it is quite clear
   that they are spam.
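
As a rough illustration of the two-stage policy described in the points
above, here is a minimal Python sketch. It is not the actual Trac
configuration: the names (Action, decide) and both threshold values are
hypothetical and chosen only to show the shape of the decision. The one
figure taken from the text is that legitimate content has so far scored
no higher than about 5%.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"    # post goes through untouched
    CAPTCHA = "captcha"  # user is asked to solve a CAPTCHA first
    REJECT = "reject"    # post is refused outright


# Hypothetical thresholds, for illustration only; the real values used
# on the Trac instance are not stated in this message.
CAPTCHA_THRESHOLD = 0.50
REJECT_THRESHOLD = 0.99


def decide(spam_probability: float) -> Action:
    """Map a Bayes spam probability in [0, 1] to a filtering action."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    # Outright rejection is reserved for content that is clearly spam.
    if spam_probability >= REJECT_THRESHOLD:
        return Action.REJECT
    # Merely suspicious content only triggers a CAPTCHA challenge.
    if spam_probability >= CAPTCHA_THRESHOLD:
        return Action.CAPTCHA
    return Action.ACCEPT


if __name__ == "__main__":
    for score in (0.05, 0.70, 0.995):
        print(score, decide(score).value)
```

With thresholds like these, content scoring at or below the 5% seen in
testing would pass straight through, and only a score very close to 1
would lead to a rejection.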

I am optimistic that the filtering will have negligible effect on
legitimate traffic. As a smoke test I managed to create a new account,
open a new ticket, and start a new Wiki page without even needing to
solve a CAPTCHA.

Cheers,

- Ben

