LSTSRV-L Archives

LISTSERV Site Administrators' Forum

Valdis Kletnieks <[log in to unmask]>
Thu, 5 Dec 2002 13:43:12 -0500
On Thu, 05 Dec 2002 10:56:46 EST, Tim Parker <[log in to unmask]>  said:
> Is there any easy way to get an output of the top domains that are on my
> listserv lists? I am working on tweaking our LSMTP delivery and for the
> destinations I am not sure of the right %'s for the different domain names.
> Is there anything simple that can give the top ones?
>
> I know I can download the list and sort and use excel, but....

Sorting the list would be the totally wrong thing to do.

Almost certainly, what you care about is total traffic to domains.  So for
instance, if you have 3 lists that each have 2,500 recipients for FOO.COM,
but those three lists only see traffic once a week, it's not as important
to tune for that case(*) as if you have 20 lists that have 100 recipients
for BAR.COM which get 100 postings a day.
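To put rough numbers on that comparison, here is a quick sketch of the weekly delivery counts. It assumes the "100 postings a day" figure is per list, which the example above doesn't actually spell out; weekly_deliveries is just an illustrative helper name.

```shell
# Weekly deliveries = lists * recipients per list * postings per list per week.
weekly_deliveries() {
    echo $(( $1 * $2 * $3 ))
}

# FOO.COM: 3 lists, 2,500 recipients each, 1 posting a week.
foo=$(weekly_deliveries 3 2500 1)

# BAR.COM: 20 lists, 100 recipients each, 100 postings a day
# (assumed to be per list), so 700 postings a week.
bar=$(weekly_deliveries 20 100 $(( 100 * 7 )))

echo "FOO.COM: $foo deliveries/week"
echo "BAR.COM: $bar deliveries/week"
```

Even if the 100 postings a day were spread across all 20 lists rather than per list, BAR.COM would still see far more deliveries per week than FOO.COM.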

What you *probably* want to do is take a week's worth of Listserv logs,
pull out all the 'Mail posted via SMTP to addr@host' lines, and do your
statistics on *that*, so you're looking at *traffic*, not at *subscribers*.

(*) Of course, if you have a very large list that is expected to deliver
very fast throughput, tune for that.  But you already know which lists those
are if you have any... ;)

I just chunked out the quick stats - first, let's run across all the lists and see what
non-VT subscribers we have.  So we get the admittedly ugly shell one-liner:

[/home/listserv/home]1 find . -name '*.list' | xargs listview -s | \
        egrep -i -v '^\*|vt.edu|^$|^File ' |sed 's/^.*@\([^ ]*\).*/\1/' | \
        rev | cut -f1-2 -d. | sort | rev | uniq -c

(The 'rev | cut | sort | rev' paradigm is quite useful sometimes).  So we find we
have some 15,631 different second-level domains represented, and the top ones are:

11684 AOL.COM
5415 HOTMAIL.COM
3490 YAHOO.COM
3283 VA.US
2296 EROLS.COM
1591 JUNO.COM
1128 MSN.COM
 962 ATT.NET
 925 MINDSPRING.COM
 845 EARTHLINK.NET
 829 RADFORD.EDU
 528 VIRGINIA.EDU
 519 COMPUSERVE.COM
 503 INFI.NET
 475 NAVY.MIL
 385 PSU.EDU
 349 RR.COM
 332 NASA.GOV
 328 NCSU.EDU
 320 SWVA.NET
 310 AC.UK
 306 PRODIGY.NET
 303 RUNET.EDU
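
For anyone who hasn't met the 'rev | cut | sort | rev' idiom used above, here is a minimal sketch on a few made-up hostnames. The function name second_level_counts and the sample hosts are illustrative only, and it assumes a system where rev(1) is installed (it ships with util-linux on most Linux boxes).

```shell
# Reverse each hostname so the TLD comes first, keep the first two
# dot-separated fields (TLD plus second-level label), sort so identical
# domains are adjacent, reverse back, and count the duplicates.
second_level_counts() {
    rev | cut -f1-2 -d. | sort | rev | uniq -c
}

# Made-up hosts: AOL.COM is counted twice, VA.US and VT.EDU once each.
printf '%s\n' MAIL.AOL.COM WWW.AOL.COM CS.VT.EDU K12.VA.US |
    second_level_counts
```

The point of reversing is that cut can only count fields from the left, while the interesting part of a hostname is on the right.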

Now let's go look at a week's traffic...

[/var/logs]1 egrep 'relay=.*stat=Sent' maillog* | grep -v 198.82.161.196 | \
        sed 's/^.*relay=\([^ ]*\).*/\1/' | rev |  cut -f2-3 -d'.' | sort | rev | uniq -c

Only get 4,113 second-levels in a week, and the top ones are:

9175 american.edu
7014 hotmail.com
5616 yahoo.com
5273 psmtp.com
4469 earthlink.net
3359 lsoft.com
2699 rr.com
2340 va.us
2063 sas.com
1804 aol.com
1785 msn.com
1623 prodigy.net
1556 army.mil
1279 washburn.edu
1278 serena.com
1234 nodak.edu
1165 edu.au
1078 msu.edu
1063 frb.org
1058 com.cn
1021 mindspring.com
1004 adelphia.net
1001 outblaze.com
 997 criticalpath.net

The american.edu and lsoft.com entries are due to the DIST2 network jobs...  Notice that
the two lists are *not* that similar....
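
The one-liners above stop at 'uniq -c', which prints counts in domain order rather than ranked. One way to get a "top N" view is a numeric reverse sort followed by head; this is a sketch rather than necessarily the step used to produce the lists above, and top_domains and the sample counts are made up for illustration.

```shell
# top_domains N: read `uniq -c` output on stdin and print the N lines
# with the largest counts (default 10).
top_domains() {
    sort -rn | head -n "${1:-10}"
}

# Made-up counts, for illustration only.
printf '%s\n' '   5 example.net' '  42 example.com' '  17 example.org' |
    top_domains 2
```

Appending '| top_domains 25' to either pipeline would give the busiest 25 second-level domains.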

(The wonders of shell one-liners.. ;)

--
                                Valdis Kletnieks
                                Computer Systems Senior Engineer
                                Virginia Tech
