Tue, 29 May 2007 13:49:48 -0400
I very much appreciate all of the help I've been receiving. We are trying
the robots.txt option, but it doesn't seem to be working. Here's what
we've tried; perhaps we're doing it wrong. We put the following in a
robots.txt file in inetsrv.wwwroot:
User-agent: googlebot
Disallow: /
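One way to check that the two directives above say what you intend is to feed them to Python's standard-library robots.txt parser. This is a sketch only: the archive URL is a hypothetical placeholder, and `urllib.robotparser` approximates the Robots Exclusion Protocol rather than reproducing Google's own parser. Crawlers only look for robots.txt at the root of the host (e.g. http://listserv.syr.edu/robots.txt), so the file also has to be reachable at that URL to have any effect.

```python
from urllib.robotparser import RobotFileParser

# The two directives from the message above, copied verbatim.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /
"""


def is_allowed(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical archive URL; substitute a real path from your server.
    url = "http://listserv.syr.edu/archives/example.html"
    print("Googlebot allowed:", is_allowed("Googlebot", url))
    print("SomeOtherBot allowed:", is_allowed("SomeOtherBot", url))
```

Running it reports Googlebot as blocked and an unrelated crawler as allowed, which suggests the rule syntax itself is fine; if Googlebot keeps arriving, the more likely causes are the file's location or Google not yet having re-read it.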
Questions:
Am I doing the right thing?
Will this stop the messages in the Listserv logs, or will they continue
to show?
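To see whether the change has any effect on the LISTSERV log, a small script can tally login records per IP address and be rerun over time. This is a hypothetical sketch: it assumes each login record carries an `AUTHINFO(<ip>)` field, as in the log excerpt quoted further down this thread, and that each record sits on a single line (the break in the excerpt may be mail wrapping). Note that robots.txt only governs future crawler requests, so entries already written to the log will stay; what should change is the rate of new ones.

```python
import re
from collections import Counter

# Matches the AUTHINFO(<ip>) field seen in the quoted LISTSERV log excerpt.
AUTHINFO_RE = re.compile(r"AUTHINFO\((\d{1,3}(?:\.\d{1,3}){3})\)")


def count_logins_by_ip(lines):
    """Count AUTHINFO(...) occurrences per IP address across log lines."""
    counts = Counter()
    for line in lines:
        # findall returns the captured IP string for each match on the line.
        for ip in AUTHINFO_RE.findall(line):
            counts[ip] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "27 May 2007 20:03:16 To [ANONYMOUS]@LISTSERV.SYR.EDU: ***LOGIN***",
        "27 May 2007 20:03:17 From [ANONYMOUS]@LISTSERV.SYR.EDU: X-LOGCK "
        "14BF5837AF8379B229 AUTHINFO(66.249.67.57) ORGINFO(66.249.67.57)",
    ]
    print(count_logins_by_ip(sample))
```

For a real log, pass an open file object instead of `sample`; the log's file name and location are site-specific and not given here.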
Thanks again for any and all help.
Nelson
On Tue, 29 May 2007, Andy Smith-Petersen wrote:
>Googlebot was over-crawling us recently: hundreds of thousands of hits
>daily, despite a relatively small number of public archives on our site.
>From looking at the web server logs, it was clear that they were stuck
>in some kind of loop, indexing some posts between two and six times a
>day. I did get Google support to decrease the crawl rate a bit, but they
>did not fix the loop. (My suspicion is that the large number of URL
>parameters for plaintext vs. HTML, fixed width vs. variable fonts, etc.
>was confusing the issue, but I never did nail that down.)
>
>So I added a bunch of entries to our robots.txt file to disallow
>crawling of any archives > 1 month old.
>
>--
>Andy Smith-Petersen
>System Administrator
>IT Network Services
>University of Southern Maine
>
>
>On Sun, 2007-05-27 at 19:22 -0500, Andrew Bosch wrote:
>> It's probably a Googlebot invoking the WA CGI. You will have better
>> success blocking access at your web server or firewall.
>>
>>
>> >>> <[log in to unmask]> 5/27/2007 7:12 PM >>>
>> Since at least midnight today, every few seconds we're seeing
>>
>> 27 May 2007 20:03:16 To [ANONYMOUS]@LISTSERV.SYR.EDU: ***LOGIN***
>> 27 May 2007 20:03:17 From [ANONYMOUS]@LISTSERV.SYR.EDU: X-LOGCK
>> 14BF5837AF8379B229 AUTHINFO(66.249.67.57) ORGINFO(66.249.67.57)
>>
>> 66.249.67.57 is registered to Google.
>>
>> Any ideas what is going on? Any techniques I can use to shut this off? I
>> tried adding a filter *@66.249.67.57, but that doesn't seem to stop it.
>>
>> Thanks,
>> Nelson
>>
>> -- Syracuse University Listserv List Manager
>> -- Listserv webpage: http://listserv.syr.edu
>>
>