LSTSRV-L Archives

LISTSERV Site Administrators' Forum

Tue, 29 May 2007 13:49:48 -0400
I very much appreciate all of the help I've been receiving. We are trying
the robots.txt option, but it doesn't seem to be working. Here's what
we've tried; perhaps we're doing it wrong. We put the following in a
robots.txt file in inetsrv.wwwroot:

User-agent: googlebot
Disallow: /
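To see what those two lines actually block, Python's standard `urllib.robotparser` can evaluate them offline. This is a minimal sketch: the WA URL path `/scripts/wa.exe` and the agent name `ExampleBot` are hypothetical, and it only checks the rules themselves. A crawler looks for the file only at `/robots.txt` on the host it is crawling, so the file also has to be reachable at that URL, and crawlers re-read it on their own schedule rather than immediately. `can_fetch` compares the agent name against the text before the first `/`, so pass the product token `Googlebot`, not the full User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the message above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /
"""

# Hypothetical WA URL; the real script path depends on the installation.
WA_URL = "http://listserv.syr.edu/scripts/wa.exe?A0=LSTSRV-L"


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("Googlebot:", allowed("Googlebot", WA_URL))   # False: "/" matches every path
    print("ExampleBot:", allowed("ExampleBot", WA_URL))  # True: no group applies to it
```

Because the group names only `googlebot`, other crawlers are unaffected; a `User-agent: *` group would be needed to cover them too.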

Questions:
 Am I doing the right thing?
 Will this stop the messages in the Listserv logs, or will they continue
 to show?

Thanks again for any and all help.

Nelson

On Tue, 29 May 2007, Andy Smith-Petersen wrote:

>Googlebot was over-crawling us recently: hundreds of thousands of hits
>daily, despite a relatively small number of public archives on our site.
>From looking at the web server logs, it was clear that they were stuck
>in some kind of loop, indexing some posts between two and six times a
>day. I did get Google support to decrease the crawl rate a bit, but they
>did not fix the loop. (My suspicion is that the large number of URL
>parameters for plain text vs. HTML, fixed-width vs. variable fonts, etc.
>was confusing the issue, but I never did nail that down.)
>
>So I added a bunch of entries to our robots.txt file to disallow
>crawling of any archives more than one month old.
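A sketch of how such entries could be generated, assuming a monthly archive index whose URLs start with a fixed prefix followed by a `yymm` token. `ARCHIVE_PREFIX` and both function names are hypothetical, not taken from Andy's file. Since robots.txt rules are plain prefix matches, the prefix must reproduce the exact parameter order your WA URLs actually use.

```python
from datetime import date
from typing import Iterator, List

# Hypothetical prefix of a monthly archive index URL; verify against real WA URLs.
ARCHIVE_PREFIX = "/scripts/wa.exe?A1=ind"


def old_months(start: date, today: date, keep_months: int = 1) -> Iterator[str]:
    """Yield 'yymm' tokens from `start` up to, but not including, the current
    month and the `keep_months` months before it."""
    # First month that stays crawlable.
    cut_year, cut_month = today.year, today.month - keep_months
    while cut_month < 1:
        cut_month += 12
        cut_year -= 1
    year, month = start.year, start.month
    while (year, month) < (cut_year, cut_month):
        yield f"{year % 100:02d}{month:02d}"
        month += 1
        if month > 12:
            month, year = 1, year + 1


def robots_lines(start: date, today: date, keep_months: int = 1) -> List[str]:
    """Build robots.txt lines disallowing archive indexes older than the cutoff."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {ARCHIVE_PREFIX}{tok}"
              for tok in old_months(start, today, keep_months)]
    return lines


if __name__ == "__main__":
    print("\n".join(robots_lines(date(2007, 1, 1), date(2007, 5, 29))))
```

Run on 29 May 2007 with archives starting in January, this disallows the January, February and March indexes and leaves April and May crawlable; the list has to be regenerated each month.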
>
>--
>Andy Smith-Petersen
>System Administrator
>IT Network Services
>University of Southern Maine
>
>
>On Sun, 2007-05-27 at 19:22 -0500, Andrew Bosch wrote:
>> It's probably a Googlebot invoking the WA CGI. You will have better
>> success blocking access at your web server or firewall.
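Andrew's suggestion would normally be carried out with the web server's own IP restrictions or a firewall rule. The decision such a rule makes can be sketched with Python's standard `ipaddress` module. The network `66.249.64.0/19` is an assumption, chosen because it contains the address in the log below, and `WA_PATH` is a hypothetical script path; check both against your own server before blocking anything.

```python
import ipaddress

# Assumed network containing the crawler address from the log (66.249.67.57).
BLOCKED_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]

# Hypothetical path of the WA CGI on this server.
WA_PATH = "/scripts/wa.exe"


def should_block(remote_addr: str, path: str) -> bool:
    """Return True if a request for the WA CGI from `remote_addr` should be refused."""
    if not path.startswith(WA_PATH):
        return False  # only guard the archive interface, so /robots.txt stays readable
    addr = ipaddress.ip_address(remote_addr)
    # Membership tests between IPv4 and IPv6 simply return False.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Limiting the rule to the WA path keeps the rest of the site, including robots.txt, visible to the crawler.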
>>
>>
>> >>> <[log in to unmask]> 5/27/2007 7:12 PM >>>
>> Since at least midnight today, every few seconds we're seeing
>>
>> 27 May 2007 20:03:16 To   [ANONYMOUS]@LISTSERV.SYR.EDU: ***LOGIN***
>> 27 May 2007 20:03:17 From [ANONYMOUS]@LISTSERV.SYR.EDU: X-LOGCK 14BF5837AF8379B229 AUTHINFO(66.249.67.57) ORGINFO(66.249.67.57)
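To measure the rate, and later to check whether these entries stop after a change, the X-LOGCK lines can be counted per AUTHINFO address. A minimal sketch, assuming each entry occupies a single line in the log file and that the field has exactly the form shown in the excerpt; `hits_by_ip` and `SAMPLE` are illustrative names.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the AUTHINFO(...) field seen in the X-LOGCK log entries.
AUTHINFO_RE = re.compile(r"AUTHINFO\(([0-9.]+)\)")


def hits_by_ip(log_lines: Iterable[str]) -> Counter:
    """Count X-LOGCK entries per AUTHINFO IPv4 address."""
    counts: Counter = Counter()
    for line in log_lines:
        if "X-LOGCK" not in line:
            continue  # skip ***LOGIN*** and unrelated entries
        match = AUTHINFO_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "27 May 2007 20:03:16 To   [ANONYMOUS]@LISTSERV.SYR.EDU: ***LOGIN***",
    "27 May 2007 20:03:17 From [ANONYMOUS]@LISTSERV.SYR.EDU: X-LOGCK "
    "14BF5837AF8379B229 AUTHINFO(66.249.67.57) ORGINFO(66.249.67.57)",
]

if __name__ == "__main__":
    print(hits_by_ip(SAMPLE).most_common(5))  # [('66.249.67.57', 1)]
```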
>>
>> 66.249.67.57 is registered to Google.
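Registration shows the address belongs to Google, not which Google service is sending the requests. The check Google describes for its crawler is a reverse DNS lookup on the address, a test that the host name is in `googlebot.com` or `google.com`, and a forward lookup confirming the name resolves back to the same address. A sketch using Python's `socket` module; `is_googlebot` is a hypothetical name, and the resolver parameters exist only so the logic can be tested without network access. Called as `is_googlebot("66.249.67.57")`, it uses the system resolver.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def is_googlebot(ip: str,
                 reverse: Resolver = socket.gethostbyaddr,
                 forward: Resolver = socket.gethostbyname_ex) -> bool:
    """Reverse-resolve `ip`, check the domain, then confirm the name maps back to `ip`."""
    try:
        host = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:  # socket.herror / gaierror are OSError subclasses
        return False
    if not host.endswith(GOOGLE_DOMAINS):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    return ip in addresses
```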
>>
>> Any ideas what is going on? Any techniques I can use to shut this off?
>> I tried adding a filter *@66.249.67.57, but that doesn't seem to stop it.
>>
>> Thanks,
>> Nelson
>>
>> -- Syracuse University Listserv List Manager
>> -- Listserv webpage: http://listserv.syr.edu
>>
>
