[web-devel] Keep overly busy crawlers away from your server
Henning Thielemann
lemming at henning-thielemann.de
Tue Mar 1 19:54:33 UTC 2016
I know there are more sophisticated tools for this task, but I wrote a
pretty simple program that has worked reliably for me for several months
now. See the attached program. It scans the last 100 KB of a log file that
you specify as a command-line argument and checks whether certain clients
exceed allowedAccessesPerSecond. If so, their IPs are written to a
blocked-hosts file and are removed from there only 5 days later. In my
example the blocked-hosts file is read by arno-iptables-firewall, which
actually blocks the IP addresses. I added this program to my crontab:
# m h dom mon dow command
* * * * * process-log /var/log/http-access.log >>/var/log/http-block.log 2>>/var/log/http-block.err
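Since the attachment holds the real implementation, here is only a minimal
sketch of the per-IP rate check described above, assuming common-log-format
input where the client IP is the first field. allowedAccessesPerSecond is
the only name taken from the post; the helper names (clientIP,
countAccesses, offenders), the threshold value, and the window parameter
are all hypothetical:

```haskell
import qualified Data.Map.Strict as Map

-- Threshold named in the post; the value here is an assumed example.
allowedAccessesPerSecond :: Double
allowedAccessesPerSecond = 10

-- The first whitespace-separated field of a common-log-format line
-- is the client IP.
clientIP :: String -> Maybe String
clientIP line = case words line of
  (ip:_) -> Just ip
  _      -> Nothing

-- Tally accesses per client IP.
countAccesses :: [String] -> Map.Map String Int
countAccesses = foldr step Map.empty
  where
    step line m = maybe m (\ip -> Map.insertWith (+) ip 1 m) (clientIP line)

-- IPs whose access rate over the observed window exceeds the threshold.
offenders :: Double -> [String] -> [String]
offenders windowSeconds logLines =
  [ ip
  | (ip, n) <- Map.toList (countAccesses logLines)
  , fromIntegral n / windowSeconds > allowedAccessesPerSecond ]
```

A real version would also parse the timestamps to derive the window length
from the log tail itself rather than taking it as a parameter.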
-------------- next part --------------
A non-text attachment was scrubbed...
Name: process-log.hs
Type: text/x-haskell
Size: 5276 bytes
Desc:
URL: <http://mail.haskell.org/pipermail/web-devel/attachments/20160301/7db372bb/attachment.hs>