On Mon, 24 Sep 2007 15:09:50 EDT, "Durbin, Daniel R" said:
> Right now we have 150,326 messages to 574 destinations. If I understand you
> correctly, this appears to the MTA more like 574 queues, rather than 150,000
> and the retry load would not be as intense as I imagined.
Right, because the MTA can optimize it and say "Well, I have 917 messages for
this site, let's try connecting... Nope, got a timeout, not worth trying the
*other* 916 for at least 10 or 20 minutes", and go on with the other 573
destinations. Conversely, if the connection succeeds, it should try to shovel
all 917 across the connection, one after the other.
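To make that concrete, here is a rough Python sketch of the per-destination logic. It does not describe any particular MTA: `try_connect`, `RETRY_DELAY`, and the shape of the queue are all assumptions made up for illustration.

```python
from collections import defaultdict

RETRY_DELAY = 15 * 60  # seconds; assumed value in the "10 or 20 minutes" range


def group_by_destination(messages):
    """Bucket (destination, message_id) pairs into one queue per destination."""
    queues = defaultdict(list)
    for destination, message_id in messages:
        queues[destination].append(message_id)
    return queues


def run_queue_pass(queues, try_connect, now, next_attempt):
    """Make one pass over all destinations.

    try_connect(destination) is a hypothetical callable that returns True when
    a connection succeeds. next_attempt maps destination -> earliest retry time.
    Returns the list of message ids handed off during this pass.
    """
    delivered = []
    for destination, pending in queues.items():
        if not pending or now < next_attempt.get(destination, 0):
            continue  # nothing queued, or still backing off for this site
        if try_connect(destination):
            delivered.extend(pending)  # shovel everything over the one connection
            pending.clear()
        else:
            # One failure defers every message for this site, not just the first.
            next_attempt[destination] = now + RETRY_DELAY
    return delivered
```

The point of the sketch is that the cost of a dead site is one connection attempt per retry interval, no matter how many messages are waiting for it.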
What really hurts is when you get connections to a site that are slow or
rate-limited, or throw lots of 4xx errors you need to retry. Getting the
infamous '421 too many recipients' after the 50th when you have 200 or 300 for
the site is another good headache inducer.
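One common way to live with a per-transaction recipient cap is to split the recipient list into batches before sending. A minimal sketch, assuming you already know (or have guessed) the cap; `batch_recipients` and the limit of 50 are hypothetical, not something any given site advertises:

```python
def batch_recipients(recipients, max_per_transaction):
    """Split a recipient list into chunks no larger than max_per_transaction."""
    if max_per_transaction < 1:
        raise ValueError("max_per_transaction must be at least 1")
    return [
        recipients[i:i + max_per_transaction]
        for i in range(0, len(recipients), max_per_transaction)
    ]


# 120 recipients with an assumed cap of 50 become three transactions.
batches = batch_recipients([f"user{i}@site.example" for i in range(120)], 50)
```

This only addresses the recipient count; slow or rate-limited connections still need their own handling.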
I'll bet that of those 574 destinations, 15 or 20 account for a large percentage of
the 150K, and the tail gets seriously long pretty quickly. Have you done any
statistical analysis of which sites you tend to be queued up for, and why
(greylisting, network burps, etc.)? That often tells you a lot about what
you need to be thinking about.
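A first cut at that analysis can be very small. The sketch below assumes you can dump the queue as a flat list of destination names, one entry per queued message; how you get that list depends on your MTA, and `destination_share` is a made-up helper name.

```python
from collections import Counter


def destination_share(queued_destinations, top_n):
    """Return (top_n busiest destinations, fraction of the queue they hold)."""
    counts = Counter(queued_destinations)
    top = counts.most_common(top_n)
    total = sum(counts.values())
    share = sum(n for _, n in top) / total if total else 0.0
    return top, share
```

Calling it with `top_n=20` on the real queue would show whether a handful of sites really do hold most of the 150K.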
Conversely, playing "What do the 400 small sites all have in common?" can
be useful too - and make sure to look at the actual MX destinations, not just
the domain name. The traffic patterns I was seeing at one point didn't make
any sense, until I realized that about 20% of the sites I was sending to were
all routed to Postini's outsourcing service (so it really *did* behave like
one site with 500+ names, not 500 sites that all went up and down at the same
time)....
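Grouping by MX rather than by domain name can be sketched like this. The DNS lookup is deliberately left as a parameter: `lookup_mx` is a hypothetical callable returning `(preference, hostname)` pairs, since the Python standard library has no MX resolver of its own.

```python
from collections import defaultdict


def group_by_mx(domains, lookup_mx):
    """Map each primary MX host to the domains that route to it.

    lookup_mx(domain) is a hypothetical callable returning a list of
    (preference, hostname) pairs; the lowest preference is the primary.
    """
    by_mx = defaultdict(list)
    for domain in domains:
        records = lookup_mx(domain)
        if not records:
            by_mx[domain].append(domain)  # no MX: fall back to the domain itself
            continue
        _, primary = min(records)
        by_mx[primary.lower().rstrip(".")].append(domain)
    return dict(by_mx)
```

Any bucket holding many domains is effectively one site under several names. An outsourced service may hand out different MX host names per customer, so grouping on a common suffix of the host name may be needed as well.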
(And if you've been paying attention, I've said about zero that's MTA-specific,
and that's because no matter *what* MTA you use, you end up with the same
questions and issues....)