LSTSRV-L Archives

LISTSERV Site Administrators' Forum

Nathan Brindle <[log in to unmask]>
Tue, 29 May 2007 14:06:32 -0400
See http://www.google.com/support/webmasters/bin/answer.py?answer=40364 

That looks like it ought to work, though. I'm guessing it will take a while for the information to disseminate through their server farm and for the hits to stop. In the meantime, if the bots are doing what they're supposed to be doing, they should just be hitting robots.txt and stopping without trying to index any further.
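As a rough local sanity check of the rules Nelson posted, Python's standard `urllib.robotparser` module can report whether a given user agent may fetch a given URL. This is only a sketch: the archive URL is a hypothetical placeholder, and `robotparser` does plain prefix matching, which approximates but is not identical to Google's own parser. It also can't tell you whether the file is actually being served. Crawlers only look for it at the root of the host, i.e. `http://<host>/robots.txt`.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted in Nelson's message.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly; no network fetch
    return parser.can_fetch(user_agent, url)

# Hypothetical WA archive URL; substitute a real one from your own server.
url = "http://listserv.syr.edu/cgi-bin/wa?A0=EXAMPLE-L"

print(is_allowed(ROBOTS_TXT, "Googlebot", url))     # False: matched by "googlebot"
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))  # True: no rule applies to it
```

If both checks come out as expected locally but the hits continue, the usual suspects are that the file isn't reachable at `/robots.txt` on the host serving the WA CGI, or that Google simply hasn't re-fetched it yet.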

You might also want to use their URL removal request tool -- see http://www.google.com/support/webmasters/bin/answer.py?answer=61062 .

Nathan

At 01:49 PM 5/29/2007 -0400, [log in to unmask] wrote:
>I very much appreciate all of the help I've been receiving. We are trying
>the robots.txt option, but it doesn't seem to be working. Here's what
>we've tried; perhaps we're doing it wrong. We put the following in
>inetsrv.wwwroot
>
>User-agent: googlebot
>Disallow: /
>
>Questions:
> Am I doing the right thing?
> Will this stop the messages in the Listserv logs, or will they continue
> to show?
>
>Thanks again for any and all help.
>
>Nelson
>
>On Tue, 29 May 2007, Andy Smith-Petersen wrote:
>
>>Googlebot was over-crawling us recently - hundreds of thousands of hits
>>daily, despite a relatively small number of public archives on our site.
>>From looking at the web server logs, it was clear that they were stuck
>>in some kind of loop, indexing some posts between two and six times a
>>day. I did get Google support to decrease the crawl rate a bit, but they
>>did not fix the loop. (My suspicion is that the large number of URL
>>parameters for plaintext vs HTML, fixed-width vs variable fonts, etc., was
>>confusing the issue, but I never did nail that down.)
>>
>>So I added a bunch of entries to our robots.txt file to disallow
>>crawling of any archives > 1 month old.
>>
>>--
>>Andy Smith-Petersen
>>System Administrator
>>IT Network Services
>>University of Southern Maine
>>
>>
>>On Sun, 2007-05-27 at 19:22 -0500, Andrew Bosch wrote:
>>> It's probably a Googlebot invoking the WA CGI. You will have better
>>>  success blocking access at your web server or firewall.
>>>
>>>
>>> >>> <[log in to unmask]> 5/27/2007 7:12 PM >>>
>>> Since at least midnight today , every few seconds we're seeing
>>>
>>> 27 May 2007 20:03:16 To   [ANONYMOUS]@LISTSERV.SYR.EDU: ***LOGIN***
>>> 27 May 2007 20:03:17 From [ANONYMOUS]@LISTSERV.SYR.EDU: X-LOGCK 14BF5837AF8379B229 AUTHINFO(66.249.67.57) ORGINFO(66.249.67.57)
>>>
>>> 66.249.67.57 is registered to Google.
>>>
>>> Any ideas what is going on? Any techniques I can use to shut this off? I
>>> tried adding a filter *@66.249.67.57, but that doesn't seem to stop it.
>>>
>>> Thanks,
>>> Nelson
>>>
>>> -- Syracuse University Listserv List Manager
>>> -- Listserv webpage: http://listserv.syr.edu
>>>
>>
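To see how much of the LISTSERV log traffic comes from a single address (and so whether blocking it at the web server or firewall, as Andrew suggests, is worth doing), a short script can tally the `AUTHINFO(...)` values. A minimal sketch, assuming each log entry sits on one line in the format of Nelson's excerpt; the regular expression and function name are illustrative, not part of LISTSERV, and reading the actual log file is left out because its location is site-specific.

```python
import re
from collections import Counter

# Matches the dotted-quad IP inside AUTHINFO(...), as in the quoted log excerpt.
AUTHINFO_RE = re.compile(r"AUTHINFO\((\d{1,3}(?:\.\d{1,3}){3})\)")

def count_authinfo_ips(log_lines):
    """Count how many log lines carry each AUTHINFO IP address."""
    counts = Counter()
    for line in log_lines:
        match = AUTHINFO_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample lines copied from the excerpt above (assumed to be one entry per line).
sample = [
    "27 May 2007 20:03:16 To   [ANONYMOUS]@LISTSERV.SYR.EDU: ***LOGIN***",
    "27 May 2007 20:03:17 From [ANONYMOUS]@LISTSERV.SYR.EDU: X-LOGCK "
    "14BF5837AF8379B229 AUTHINFO(66.249.67.57) ORGINFO(66.249.67.57)",
]

print(count_authinfo_ips(sample).most_common(5))  # [('66.249.67.57', 1)]
```

Run over a full day's log, the top entries from `most_common()` are the candidates to look up (66.249.67.57 turned out to be Google's) and, if appropriate, block upstream of the WA CGI.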
