Googlebot was over-crawling us recently: hundreds of thousands of hits daily, despite a relatively small number of public archives on our site. From looking at the web server logs, it was clear that it was stuck in some kind of loop, indexing some posts between two and six times a day. I did get Google support to decrease the crawl rate a bit, but that did not fix the loop. (My suspicion is that the large number of URL parameters for plain text vs. HTML, fixed-width vs. variable fonts, etc. was confusing the crawler, but I never did nail that down.) So I added a bunch of entries to our robots.txt file to disallow crawling of any archives more than one month old.

--
Andy Smith-Petersen
System Administrator
IT Network Services
University of Southern Maine

On Sun, 2007-05-27 at 19:22 -0500, Andrew Bosch wrote:
> It's probably a Googlebot invoking the WA CGI. You will have better
> success blocking access at your web server or firewall.
>
> >>> <[log in to unmask]> 5/27/2007 7:12 PM >>>
> Since at least midnight today, every few seconds we're seeing
>
> 27 May 2007 20:03:16 To [ANONYMOUS]@LISTSERV.SYR.EDU: ***LOGIN***
> 27 May 2007 20:03:17 From [ANONYMOUS]@LISTSERV.SYR.EDU: X-LOGCK
> 14BF5837AF8379B229 AUTHINFO(66.249.67.57) ORGINFO(66.249.67.57)
>
> 66.249.67.57 is registered to Google.
>
> Any ideas what is going on? Any techniques I can use to shut this off? I
> tried adding a filter *@66.249.67.57, but that doesn't seem to stop it.
>
> Thanks,
> Nelson
>
> -- Syracuse University Listserv List Manager
> -- Listserv webpage: http://listserv.syr.edu
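
For readers wanting a concrete picture of the robots.txt approach described above, a sketch along these lines would keep Googlebot out of the parameterized archive views. The /cgi-bin/wa and /archives/mylist/ paths are hypothetical placeholders, not the poster's actual layout; adjust them to match your LISTSERV web interface URLs. Note that standard robots.txt Disallow rules are simple path-prefix matches, so one line per archived month is the portable way to block older months:

```
# Hypothetical paths -- adjust to the real LISTSERV web interface layout.
User-agent: Googlebot
# Option 1: keep the bot out of the WA CGI (and all its URL-parameter
# variants) entirely.
Disallow: /cgi-bin/wa
# Option 2: block only older monthly archive folders, one prefix per
# month, leaving the current month crawlable.
Disallow: /archives/mylist/2007-04
Disallow: /archives/mylist/2007-03
```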