[web-devel] Keep too busy crawlers away from your server

Henning Thielemann lemming at henning-thielemann.de
Tue Mar 1 19:54:33 UTC 2016


I know there are more sophisticated tools for this task, but I wrote a 
pretty simple program that has worked reliably for me for several months 
now. See the attached program. It scans the last 100 KB of a log file you 
specify as a command-line argument and checks whether certain clients 
exceed allowedAccessesPerSecond. If so, their IPs are written to a 
blocked-hosts file and are removed from there only 5 days later. In my 
example the blocked-hosts file is used by arno-iptables-firewall to 
actually block the IP addresses. I added this program to my crontab:

# m h  dom mon dow   command
* * * * *      process-log /var/log/http-access.log >>/var/log/http-block.log 2>>/var/log/http-block.err
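Since the attachment was scrubbed from the archive, here is an illustrative 
sketch of the core rate check described above. Only allowedAccessesPerSecond 
is a name from the post; the functions, the example IPs, and the value of the 
threshold are hypothetical, and the real process-log.hs additionally parses 
the log tail and manages the 5-day expiry of the blocked-hosts file.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical threshold; the actual value lives in process-log.hs.
allowedAccessesPerSecond :: Double
allowedAccessesPerSecond = 2

-- Count accesses per client IP, assuming each log record has already
-- been reduced to its client IP string.
countAccesses :: [String] -> Map String Int
countAccesses = Map.fromListWith (+) . map (\ip -> (ip, 1))

-- IPs whose average rate over the observed time span (in seconds,
-- taken from the scanned log tail) exceeds the allowed limit.
offenders :: Double -> [String] -> [String]
offenders spanSeconds ips =
   Map.keys $
   Map.filter
      (\n -> fromIntegral n / spanSeconds > allowedAccessesPerSecond)
      (countAccesses ips)

main :: IO ()
main =
   let ips = replicate 50 "198.51.100.7" ++ ["203.0.113.9"]
   in  print (offenders 10 ips)
```

Averaging over the whole scanned span keeps the check cheap and stateless 
between cron runs; a single pass per minute is enough to catch sustained 
crawlers.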
-------------- next part --------------
A non-text attachment was scrubbed...
Name: process-log.hs
Type: text/x-haskell
Size: 5276 bytes
Desc: 
URL: <http://mail.haskell.org/pipermail/web-devel/attachments/20160301/7db372bb/attachment.hs>
