LSTSRV-L Archives

LISTSERV Site Administrators' Forum

"William H. Magill" <[log in to unmask]>
Fri, 11 Feb 2000 09:46:03 -0500
>   (1)  Overall volume and the trends in that.

The "mailstats" command/program that ships with sendmail gives you this.
Of course it isn't documented for squat. (I'd love to know what is
REALLY being recorded as "msgsrej" and "msgsdis" information, but haven't
found it in the docs yet -- "Number of messages rejected" is not very
informative.)

>   (2)  Delivery time for the vast majority of messages, and the trends in
>   that.  Distinguishing between internal and external traffic sounds useful.
>
I "think" this is pretty easily counted, and about the only thing the perl
scripts do reliably against the syslog file.
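
A rough sketch of the internal/external split, in Python rather than the
Perl scripts mentioned. It assumes recipient addresses have already been
pulled out of the log, and that "internal" simply means the address sits in
one of your own domains; the function names and the example.edu domain are
hypothetical.

```python
from collections import Counter

def classify(address, local_domains):
    """Label an address 'internal' or 'external' by its domain part."""
    addr = address.strip("<>").lower()
    if "@" not in addr:
        return "internal"  # bare user name, assumed to be a local delivery
    domain = addr.rpartition("@")[2]
    for local in local_domains:
        # Treat the domain itself and any of its subdomains as internal.
        if domain == local or domain.endswith("." + local):
            return "internal"
    return "external"

def tally(addresses, local_domains):
    """Count how many addresses fall on each side of the split."""
    return Counter(classify(a, local_domains) for a in addresses)

if __name__ == "__main__":
    # Invented addresses, for illustration only.
    sample = ["<a@example.edu>", "b@lists.example.edu", "c@example.org", "root"]
    print(dict(tally(sample, {"example.edu"})))
```

The same labels can then be attached to whatever per-message figures the
log analysis produces, so the two kinds of traffic are trended separately.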

>   If we were to see delivery time trending up, that might be something we
>   should look into before it gets really bad and people start complaining.

I'm not convinced that a method exists to determine "delivery time" without
some sort of heavy "record matching." If your server is busy, you won't know
WHICH incoming message triggered WHICH outgoing message. The information is
probably in the log file, but nobody has bothered to tackle the problem of
integrating the LISTSERV log with the Sendmail log, let alone just matching
the records in the sendmail log. And nobody that I know has ever done
anything with the mailq -- there is no information logged as such, one has
to really hunt to find it. I'm not saying that it can't be done, just that
I don't think that anyone has ever spent the time and effort to track it.
(Unless you are AOL or Hotmail, and they probably won't give you the
software they developed.)
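
A minimal sketch of the "record matching" within the sendmail log alone,
in Python. It assumes sendmail 8 style syslog lines in which every entry
carries a queue ID, with one from= line when a message is accepted and a
to= line per delivery; the sample lines, host name and addresses are
invented, and since syslog omits the year one has to be supplied by hand.
It does not attempt to tie in the LISTSERV log or the mailq.

```python
import re
from datetime import datetime

# Assumed shape: "Mon DD HH:MM:SS host sendmail[pid]: QUEUEID: from=... / to=..."
# Simplification: only the first address of a multi-recipient to= line is kept.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s\S+\ssendmail\[\d+\]:\s"
    r"(?P<qid>\w+):\s(?P<kind>from|to)=(?P<addr>[^,]+),(?P<rest>.*)$"
)

def parse_ts(text, year=2000):
    # syslog time stamps carry no year, so the caller provides one
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")

def match_records(lines, year=2000):
    """Pair each successful to= record with the from= record sharing its queue ID.

    Returns a list of (queue_id, recipient, seconds_between_the_two_lines).
    """
    accepted = {}  # queue ID -> time the from= line was logged
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        when = parse_ts(m["ts"], year)
        if m["kind"] == "from":
            accepted[m["qid"]] = when
        elif m["qid"] in accepted and "stat=Sent" in m["rest"]:
            elapsed = (when - accepted[m["qid"]]).total_seconds()
            results.append((m["qid"], m["addr"].strip("<>"), elapsed))
    return results

# Invented sample lines, not taken from any real log.
SAMPLE = [
    "Feb 11 09:46:03 mailhost sendmail[1201]: JAA01201: from=<list@example.edu>, "
    "size=2048, class=0, nrcpts=2, proto=ESMTP, relay=localhost [127.0.0.1]",
    "Feb 11 09:46:05 mailhost sendmail[1203]: JAA01201: to=<a@example.edu>, "
    "delay=00:00:02, xdelay=00:00:01, mailer=local, stat=Sent",
    "Feb 11 09:46:40 mailhost sendmail[1203]: JAA01201: to=<b@example.org>, "
    "delay=00:00:37, xdelay=00:00:35, mailer=esmtp, stat=Sent (OK)",
]

if __name__ == "__main__":
    for qid, rcpt, secs in match_records(SAMPLE):
        print(qid, rcpt, secs)
```

This only measures the time the local sendmail held a message. Matching an
incoming list posting to the outgoing copies it triggered would need a
further key, which this sketch does not attempt.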

>   Things like overall volume can be useful to present to upper management
>   (to justify a bigger machine, for example).
>
In your case this is probably legitimate -- i.e. all of the traffic is
germane. However, the volume issue is what took Bell Labs Indian Hill out
of the uucp "well known relay" business. When they went to management with
the upgrade statistics, it turned out that 90% of the traffic they were
seeing was uucp traffic destined for some non-Bell site, but which was
simply being relayed through them. They didn't get the upgrade and were
forced to disconnect from uucp.

>   Could you further explain what you mean by problems introduced by the
>   granularity of the logging?

Probably a bad choice of words.

The instrumentation which we have to deal with is very primitive, and NOT
intended to generate the kind of reports that we operational types are
looking for.
Remember, SMTP is based on the assumption that it can take DAYS for a
message to be delivered. And in fact, the logging which takes place is
intended for debugging purposes, not accounting. That's why the sendmail.st
file exists (and talk about "less filling" tools).

You mentioned short intervals. While the actual time stamps on the
contents of the log files imply second by second activity, you would have
to determine if that is true under load. I don't know the answer -- for
example if a message is sent to 100 recipients, what does the time stamp
look like for all 100 recipients. Theoretically, one would assume that all
100 would have the same time stamp (i.e. the time the message is queued) and
then a time stamp or delay factor indicating when the message was actually
sent. However, I do not believe that is what shows up in the log.
I think that each shows up as a discrete message sent "instantly."
These are also questions related to the effects of peering and TOBUFSIZE
type tuning.
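
One way to test the 100-recipient question against a real log is to group
the to= records by queue ID and look at how many distinct time stamps they
carry. The sketch below (Python) assumes sendmail 8 style lines that include
a delay= field written as HH:MM:SS or days+HH:MM:SS; the sample lines are
invented, and whether your log looks like this under load is exactly what it
is meant to check.

```python
import re
from collections import defaultdict

# Assumed shape of a delivery record; \b keeps "xdelay=" from matching.
TO_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s\S+\ssendmail\[\d+\]:\s"
    r"(?P<qid>\w+):\sto=.*?\bdelay=(?P<delay>[\d:+]+)"
)

def delay_seconds(text):
    """Convert a [days+]HH:MM:SS delay string to seconds."""
    days, _, clock = text.rpartition("+")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds

def recipient_spread(lines):
    """Map queue ID -> (to= records, distinct time stamps, min delay, max delay)."""
    stamps = defaultdict(set)
    delays = defaultdict(list)
    for line in lines:
        m = TO_RE.match(line)
        if m:
            stamps[m["qid"]].add(m["ts"])
            delays[m["qid"]].append(delay_seconds(m["delay"]))
    return {
        qid: (len(delays[qid]), len(stamps[qid]), min(delays[qid]), max(delays[qid]))
        for qid in delays
    }

# Invented sample lines, not taken from any real log.
SPREAD_SAMPLE = [
    "Feb 11 09:46:05 mailhost sendmail[1203]: JAA01201: to=<a@example.edu>, "
    "delay=00:00:02, xdelay=00:00:01, mailer=local, stat=Sent",
    "Feb 11 09:46:05 mailhost sendmail[1203]: JAA01201: to=<b@example.edu>, "
    "delay=00:00:02, xdelay=00:00:00, mailer=local, stat=Sent",
    "Feb 11 09:46:40 mailhost sendmail[1203]: JAA01201: to=<c@example.org>, "
    "delay=00:00:37, xdelay=00:00:35, mailer=esmtp, stat=Sent (OK)",
]

if __name__ == "__main__":
    print(recipient_spread(SPREAD_SAMPLE))
```

A message whose records all share one time stamp and one delay value would
support the "sent instantly" reading; a wide gap between the minimum and
maximum delay would suggest the per-recipient timing is actually recorded.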

What's completely missing is any kind of overall "system" to cycle the logs
and generate periodic reports, etc.
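
As a sketch of the reporting half of such a system, the Python below counts
accepted messages and bytes per hour from the from= lines of a sendmail
syslog file. It assumes sendmail 8 style lines with a size= field; the
sample lines are invented, and cycling the logs themselves is left to
whatever rotation scheme the site already uses.

```python
import re
from collections import Counter

# Assumed shape of an acceptance record: "... QUEUEID: from=<addr>, size=N, ..."
FROM_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d+)\s(?P<hour>\d{2}):\d{2}:\d{2}\s\S+\ssendmail\[\d+\]:\s"
    r"\w+:\sfrom=.*?\bsize=(?P<size>\d+)"
)

def hourly_volume(lines):
    """Return (message count, byte count) Counters keyed by 'Mon DD HH:00'."""
    msgs, octets = Counter(), Counter()
    for line in lines:
        m = FROM_RE.match(line)
        if m:
            bucket = f"{m['day']} {m['hour']}:00"
            msgs[bucket] += 1
            octets[bucket] += int(m["size"])
    return msgs, octets

def format_report(msgs, octets):
    """Render the counters as a fixed-width text table, in log order."""
    header = f"{'hour':<14}{'msgs':>6}{'bytes':>10}"
    rows = [f"{b:<14}{msgs[b]:>6}{octets[b]:>10}" for b in msgs]
    return "\n".join([header] + rows)

# Invented sample lines, not taken from any real log.
VOLUME_SAMPLE = [
    "Feb 11 09:46:03 mailhost sendmail[1201]: JAA01201: from=<list@example.edu>, "
    "size=2048, class=0, nrcpts=2, proto=ESMTP, relay=localhost [127.0.0.1]",
    "Feb 11 09:59:10 mailhost sendmail[1250]: JAA01250: from=<x@example.org>, "
    "size=1000, class=0, nrcpts=1, proto=ESMTP, relay=mx.example.org",
    "Feb 11 10:02:00 mailhost sendmail[1300]: KAA01300: from=<y@example.org>, "
    "size=500, class=0, nrcpts=1, proto=ESMTP, relay=mx.example.org",
]

if __name__ == "__main__":
    print(format_report(*hourly_volume(VOLUME_SAMPLE)))
```

Run from cron against each freshly rotated log, something along these lines
would give the overall volume trend without depending on sendmail.st.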

I started playing with the old perl scripts I mentioned yesterday, and
discovered that not only did they make some naive assumptions, but they
also had to be re-worked for both the new post-5 Perl and the new
sendmail log formats (or else they never worked correctly before).
That's one of the fun things that happens when you work with a known test
data set and expect to get certain results.

--
                ===<Tru64 UNIX-SIG Chair>===
                     www.tru64unix.org
T.T.F.N.
William H. Magill                          Senior Systems Administrator
Information Services and Computing (ISC)   University of Pennsylvania
Internet: [log in to unmask]             [log in to unmask]
                                           http://pobox.upenn.edu/~magill/
