Once upon a time, I had to do something similar. I moved the logs to our
research computing machine and wrote a Perl script that forks six worker
processes to do the scanning and analysis. Using 6 of the 8 processors, the
job took about 4 days to run, and that was with only about 10 months of
logs. I hope your site doesn't see too much volume; with 2 years of logs to
search, I hope you have some CPU cycles to spare.

Here's the script, ugly and inefficient though it is :-\ :
-------------------------
#!/usr/bin/perl
use POSIX;
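# Reap every child that has exited. Several SIGCHLDs can be merged into
# a single delivery, so keep calling waitpid until no finished child is
# left; a single wait here could leave $done short and hang the parent.
sub REAPER {
    my $pid;
    while (($pid = waitpid(-1, WNOHANG)) > 0) {
        print "Process finished: $pid\n";
        $done++;
    }
    $SIG{CHLD} = \&REAPER;
}
$SIG{CHLD} = \&REAPER;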
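# Read "listname:owner" pairs from ./list_data into %owners.
open(LISTDATA, "<", "./list_data") or die "Can't open ./list_data: $!";
@lists = <LISTDATA>;
close LISTDATA;
foreach $line (@lists) {
    chomp $line;
    ($list, $owner) = split /:/, $line;
    $owners{$list} = $owner;
}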
print "Going to look for logfiles...";
opendir(LSV, "./split") or die "Can't open ./split: $!";
while (defined($file = readdir LSV)) {
    if ($file =~ /^listserv\.log/) {
        push @files, $file;
    }
}
closedir(LSV);
print "Found ", scalar(@files), " logfiles.\n";
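# Deal the logfiles out round-robin to the six children's lists
# (@logs1 .. @logs6), stopping as soon as @files runs out so that no
# list picks up undef entries when the count isn't a multiple of six.
while (@files) {
    foreach $set (\@logs1, \@logs2, \@logs3, \@logs4, \@logs5, \@logs6) {
        last unless @files;
        push @$set, shift @files;
    }
}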
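$done = 0;
for ($n = 1; $n <= 6; $n++) {
    makechild($n);
}
# Poll instead of sleeping indefinitely: a SIGCHLD arriving between the
# test and the sleep would otherwise leave the parent waiting forever.
while ($done < 6) {
    sleep 1;
}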
sub makechild {
    my ($mynum) = @_;
    my $pid;
    my $sigset;
    $sigset = POSIX::SigSet->new(SIGINT);
    sigprocmask(SIG_UNBLOCK, $sigset);
    die "fork: $!" unless defined ($pid = fork);
    if ($pid) {
        # Parent code: report the new child and return
        sigprocmask(SIG_UNBLOCK, $sigset);
        print "Started child $mynum: pid: $pid\n";
        return;
    } else {
        # Child code: scan this child's share of the logfiles (@logsN)
        sigprocmask(SIG_UNBLOCK, $sigset);
        open(OUTFILE, ">", "./good$mynum") or die "Can't write ./good$mynum: $!";
        $mylogs = "logs" . $mynum;
        @lists = sort(keys(%owners));
        print OUTFILE "Searching for activity on ", scalar(@lists), " lists.\n";
        print OUTFILE "Searching ", scalar(@{$mylogs}), " logfiles.\n";
        foreach $fileb (@{$mylogs}) {
            print OUTFILE "$fileb\n";
        }
        foreach $filea (@{$mylogs}) {
            print OUTFILE "Opening File: $filea\n";
            unless (open(DATA, "<", "./split/$filea")) {
                print OUTFILE "Can't open $filea: $!\n";
                next;
            }
            @curlog = <DATA>;
            close DATA;
            print OUTFILE "Found ", scalar(@curlog), " lines of data\n";
            #@lists = keys(%owners);
            #print OUTFILE "Searching for activity on ", scalar(@lists), " lists.\n";
            # Print the list name once for every log line showing mail
            # being distributed to that list.
            foreach $list (keys %owners) {
                $listuc = uc($list);
                $search = qq<Distributing mail ("$listuc")>;
                $searcha = quotemeta $search;
                foreach $linea (@curlog) {
                    if ($linea =~ /$searcha/) {
                        print OUTFILE "$list\n";
                    }
                }
            }
            @curlog = ();
            print OUTFILE "Done with file $filea\n";
        }
        close OUTFILE;
        exit;
    }
}
----------------------------------
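For anyone adapting this: the slow part is the nested loop in the child, which tests every log line once per list, so the work grows with (number of lists) × (number of log lines). Below is a minimal sketch of a one-pass alternative, assuming that activity shows up as `Distributing mail ("LISTNAME")` lines exactly as the script's search string expects. It captures the name with one regex, looks it up in a hash, and prints the lists that never appeared, which is the report Anne asked for. The sub names (`load_lists`, `count_activity`, `main`) are hypothetical, and the code has not been tested against real LISTSERV logs.

```perl
#!/usr/bin/perl
# Hypothetical one-pass variant of the search loop: each log line is
# examined once, the list name is captured from it, and a hash lookup
# replaces the per-list regex scan.
use strict;
use warnings;

# Read "listname:owner" lines from $path. Returns a hashref mapping the
# uppercased list name to the name as written in the file.
sub load_lists {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my %by_uc;
    while (my $line = <$fh>) {
        chomp $line;
        my ($list) = split /:/, $line;
        next unless defined $list && length $list;
        $by_uc{ uc $list } = $list;
    }
    close $fh;
    return \%by_uc;
}

# Read log lines from $fh one at a time and bump $counts->{list} for every
# 'Distributing mail ("NAME")' line whose NAME is a known list.
sub count_activity {
    my ($fh, $by_uc, $counts) = @_;
    while (my $line = <$fh>) {
        next unless $line =~ /Distributing mail \("([^"]+)"\)/;
        my $list = $by_uc->{ uc $1 };
        $counts->{$list}++ if defined $list;
    }
    return $counts;
}

# Usage: perl idle.pl ./list_data ./split/listserv.log*
# Prints the lists that never appear in a "Distributing mail" line.
sub main {
    my ($list_data, @logfiles) = @_;
    my $by_uc = load_lists($list_data);
    my %counts;
    for my $file (@logfiles) {
        open(my $fh, '<', $file) or do { warn "Can't open $file: $!\n"; next };
        count_activity($fh, $by_uc, \%counts);
        close $fh;
    }
    for my $list (sort values %$by_uc) {
        print "$list\n" unless $counts{$list};
    }
}

main(@ARGV) if @ARGV;
```

Because each log is read line by line rather than slurped into an array, it also avoids holding a whole log in memory. If a single pass is still too slow, the six-way split across child processes from the script above could be layered on top.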
Anne Toal wrote:
> They all require the presence of a system_changelog file which at
> present does not exist on this server. That's changing tonight at
> midnight though :-)
>
> I think Paul's suggestion is a good one. Luckily for me, I have access to
> the services of a programmer who can write a script to scan tons of list
> changelogs for me.
>
> Merry Christmas, Listserv people.
> -aht
>
> Patrick B. O'Brien wrote:
>> What about,
>> Server Management;
>> Server Accounting;
>> Usage Statistics.
>>
>> -----Original Message-----
>> From: LISTSERV site administrators' forum
>> [mailto:[log in to unmask]] On Behalf Of Paul Russell
>> Sent: Sunday, December 24, 2006 5:20 PM
>> To: [log in to unmask]
>> Subject: Re: Server-wide usage stats
>>
>> On 12/24/2006 19:08, Anne Toal wrote:
>>
>>> I need to run a report that will show me which lists have not had
>>> any posts in the past two years. Can someone please tell me how to do
>>> that?
>>
>> There is no built-in function to provide this type of report. There are
>> probably several ways to attack this problem; here are two suggestions.
>>
>> 1. If most lists are configured to maintain changelogs, you may be able to
>>    obtain the information you need from the changelog files for the
>>    individual lists. You will need to retrieve a copy of the changelog
>>    file for each list, store all of the changelog files in one location,
>>    and write a script to extract the required information from the
>>    changelog files.
>>
>> 2. If most lists have archives, write a script to walk the list archives
>>    tree and return the modification date for the most recent log file for
>>    each list.
>>
>> Both suggestions rely on configuration settings which may not be in use
>> on some lists, so neither is guaranteed to produce a complete report.
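
Paul's second suggestion (walk the archive tree and take the newest file for each list) might look roughly like the sketch below. It assumes, hypothetically, that each list's archive files sit in their own subdirectory of a single root directory given on the command line; check the actual archive layout on your server before relying on it. `newest_mtimes` and `idle_lists` are illustrative names, and "two years" is approximated as 2 × 365 days.

```perl
#!/usr/bin/perl
# Hypothetical sketch: report lists whose archive directory holds no file
# modified in roughly the last two years. ASSUMES one subdirectory per
# list under a single archive root; verify that layout on your server.
use strict;
use warnings;

# Return a hashref { subdirectory name => newest mtime of a plain file in it }.
# A subdirectory containing no plain files maps to 0.
sub newest_mtimes {
    my ($root) = @_;
    opendir(my $dh, $root) or die "Can't open $root: $!";
    my %newest;
    for my $name (readdir $dh) {
        next if $name eq '.' || $name eq '..';
        my $dir = "$root/$name";
        next unless -d $dir;
        my $max = 0;
        opendir(my $ldh, $dir) or next;
        for my $f (readdir $ldh) {
            my $path = "$dir/$f";
            next unless -f $path;
            my $mtime = (stat $path)[9];
            $max = $mtime if $mtime > $max;
        }
        closedir $ldh;
        $newest{$name} = $max;
    }
    closedir $dh;
    return \%newest;
}

# Names whose newest mtime is older than $cutoff (seconds since the epoch),
# sorted alphabetically.
sub idle_lists {
    my ($newest, $cutoff) = @_;
    my @idle = grep { $newest->{$_} < $cutoff } keys %$newest;
    return sort @idle;
}

# Usage: perl idle_archives.pl /path/to/archive/root
if (@ARGV) {
    my $two_years = 2 * 365 * 24 * 60 * 60;    # ignores leap days
    my $newest = newest_mtimes($ARGV[0]);
    print "$_\n" for idle_lists($newest, time - $two_years);
}
```

Lists with archiving turned off have no directory to inspect, so, as Paul notes, the result still won't be a complete report.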
--
Christopher Wilson
Information Systems Coordinator
ISS Enterprise Systems
The George Washington University
[log in to unmask]