[Haskell-cafe] haskell wiki indexing

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Tue May 22 10:05:48 EDT 2007


On Tue, 2007-05-22 at 14:40 +0100, Claus Reinke wrote:

> so the situation for mailing lists and online docs seems to have
> improved, but there is still the wiki indexing/rogue bot issue,
> and lots of fine tuning (together with watching the logs to spot
> any issues arising out of relaxing those restrictions). perhaps
> someone on this list would be willing to volunteer to look into
> those robots/indexing issues on haskell.org?-)

The main problem, and the reason for the original (temporary!) measure,
was bots indexing all possible diffs between old versions of wiki pages.
URLs like:

http://haskell.org/haskellwiki/?title=Quicksort&diff=9608&oldid=9607

For pages with long histories this O(n^2) number of requests gets
quite large, and the wiki engine does not seem well optimised for
producing arbitrary diffs. So we ended up with bots holding open many HTTP
server connections. They were not actually causing much server CPU load
or generating much traffic, but once the number of nearly hung
connections reached the HTTP child process limit we were
effectively in a DoS situation.
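
To give a rough feel for the O(n^2) growth, here is a small sketch. It
assumes each unordered pair of revisions of a page corresponds to one
distinct diff URL; if a bot also requests both orderings the count
doubles. The function name is just for illustration.

```python
def diff_url_count(revisions: int) -> int:
    """Number of unordered revision pairs, i.e. possible diff URLs for one page.

    Assumption: one diff URL per pair of distinct revisions.
    """
    return revisions * (revisions - 1) // 2


if __name__ == "__main__":
    # A page with 1000 revisions already has 499500 possible diffs.
    for n in (10, 100, 1000):
        print(n, diff_url_count(n))
```

So a single page with a few hundred edits can keep a crawler busy for
tens of thousands of slow requests.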

So if we can ban bots from the page histories, or turn the histories
off for the bot user agents, or something similar, then we might have a
cure. Perhaps we just need to upgrade our MediaWiki software, or find
out how other sites using this software deal with the same issue of bots
reading page histories.
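
As a sketch of what "ban bots from the page histories" could mean in
practice, the following classifies a request URL as history-related. It
assumes that such requests can be recognised by the diff and oldid query
parameters seen in the example URL above, or by MediaWiki's
action=history. The function name is hypothetical, and how to hook such a
check into the web server (and combine it with a user agent test) is left
open.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters assumed to mark diff/old-revision views (taken from the
# example URL in this message; not an exhaustive list).
HISTORY_PARAMS = ("diff", "oldid")


def is_history_request(url: str) -> bool:
    """Return True if the URL looks like a diff, old revision or history view."""
    query = parse_qs(urlsplit(url).query)
    if any(param in query for param in HISTORY_PARAMS):
        return True
    # action=history is the MediaWiki page history listing.
    return "history" in query.get("action", [])
```

A request for which this returns True and whose user agent is a known
bot could then be refused, while ordinary page views stay indexable.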

Duncan


