LSTSRV-L Archives

LISTSERV Site Administrators' Forum

Valdis Kletnieks <[log in to unmask]>
Wed, 13 Nov 2002 01:37:37 -0500
On Tue, 12 Nov 2002 18:53:56 EST, Bobby Kuo <[log in to unmask]>  said:

> I first noticed a problem when I received an error report on 11/9/2002
> from Listserv stating that all of the addresses on one of the lists had
> transient non-fatal errors:

Connection refused by hotmail.com and yahoo.com.  *yawn* ;)

But seriously - right now it's midnight, low activity, and my Listserv
machine has 1,039 pieces of queued outbound mail - all destined for sites
that didn't answer on the first try.  Last week's Hotmail flakiness resulted
in about 120 *meg* of syslog messages here.

On the other hand, you seem to have stuff that's destined to lsoft.com
that's been sitting around for 5 days.

The most suspicious thing I see here is this:

| Nov  7 16:45:35 relay=davenport0 [192.168.0.200]
| Nov  7 16:45:35 relay=vm.se.lsoft.com. [24.147.1.10]

So your address is in the 192.168.0.0/16 private space from RFC 1918, and the
destination is in public address space.  This certainly sounds like your NAT gateway
has gotten good and seriously hosed up, and is refusing to NAT certain
addresses correctly.  The quick counter-check for this is to just run
'telnet 24.147.1.10 25' and see if you get an SMTP banner back.  If this works,
then your NAT and network are in OK shape.
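
For anyone who would rather script that check than type it by hand, here is a
rough Python sketch of the same two ideas: is an address in the RFC 1918
ranges, and does the far end hand back an SMTP greeting.  The function names
(in_rfc1918, read_smtp_banner, looks_like_smtp_greeting) are made up for this
illustration, and the sketch assumes only that a healthy SMTP server opens
with a 220 reply line.

```python
import ipaddress
import socket

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def in_rfc1918(addr):
    """Return True if addr falls inside one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETS)


def read_smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the first line the server sends.

    Raises OSError (including timeouts) if the connection cannot be made.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        # Read until the first line is complete or the peer closes.
        while b"\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
    first_line = data.split(b"\n", 1)[0]
    return first_line.decode("ascii", errors="replace").strip()


def looks_like_smtp_greeting(banner):
    """An SMTP server that is ready for mail greets with a 220 reply."""
    return banner.startswith("220")
```

Calling read_smtp_banner("24.147.1.10") from the Listserv machine and passing
the result to looks_like_smtp_greeting() is the scripted equivalent of the
telnet test; an OSError means the connection never got through at all.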

On the other hand, checking my last week's worth of logs for vm.se.lsoft.com
shows that the daily monitoring mail has not succeeded on the first attempt
since at least Nov 5.  (And now I'm glad that I dug further into this,
as I've just discovered a flag in sendmail.cf that's so sub-optimal that
it borders on pessimal.)  However, mine succeeds on the first or sometimes
second retry.  If telnet to their port 25 works, and your mail isn't moving,
you have other issues.

The fact that you're getting syslog messages indicates that the queue *IS*
being run on a regular interval, so I will skip any debugging of that class
of problems.

If you are using a host status directory (the HostStatusDirectory option in
sendmail.cf), you may wish to use the command 'hoststat' to see if the
"last status" seems reasonable (note the time as well), and/or run
'purgestat' to clear out old entries.  Also, look in your sendmail.cf for a
line that looks like:

O Timeout.hoststatus=30m

(The default is 30 minutes.  If you've got this one set to many hours or days,
you'll have some rather bad karma: this variable controls (basically) how
long to remember that a host was down the last time you tried, so that
sendmail doesn't even bother trying again for at least that long.)  Delivery
attempts that are bypassed because of this can be identified because the
message will read:

.... relay=whatever.com., dsn=4. ,etc etc

Actual delivery attempts will look like (note the IP is listed this time):

.... relay=whatever.com. [127.0.0.2], dsn=4. etc etc
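
If there are too many of these to eyeball, the distinction can be pulled out
mechanically.  The following is only a sketch: classify_relay and RELAY_RE are
names invented here, and it assumes you feed it just the deferred delivery
lines (sendmail also logs a relay= field on incoming mail, which this does
not try to tell apart).

```python
import re

# relay=host. [ip]  -> an IP was logged, so a connection was really attempted
# relay=host.       -> no IP logged; going by the description above, the
#                      attempt was bypassed because of the cached host status
RELAY_RE = re.compile(
    r"relay=(?P<host>[^\s,\[\]]+)(?:\s*\[(?P<ip>[^\]]+)\])?"
)


def classify_relay(line):
    """Return (host, ip, kind) for a log line, or None if it has no relay=.

    kind is "attempted" when an IP address follows the relay name and
    "skipped" when it does not.
    """
    match = RELAY_RE.search(line)
    if match is None:
        return None
    ip = match.group("ip")
    kind = "attempted" if ip else "skipped"
    return match.group("host"), ip, kind
```

Counting the "skipped" results per host over a day of logs gives a quick
picture of how much mail is sitting idle on the strength of a remembered
failure rather than a fresh one.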

Hopefully something in the above points you in the right direction.  If
nothing seems to match, give a yell and I'll ponder it some more.

--
                                Valdis Kletnieks
                                Computer Systems Senior Engineer
                                Virginia Tech
