Eric Thomas <ERIC@FRECP11>
Sat, 21 Mar 1987 13:06 SET
LISTSERV@CEARN is looping and probably eating hundreds of minutes of CPU,
either because it has entered the LSVBITFD loop or because it has received an
invalid Netdata file from DMZRZU71, as mine did. I have contacted the
CONSOLE@CERNVM crew but they seem unable to either permanently stop the
server, hold all its reader files, or re-autolog it (they even asked me what
their LISTSERV's logon password was... I didn't know of course so they said
they couldn't restart it, but they ended up doing it anyway). I'm afraid
nothing can prevent LISTSERV@CEARN from eating a few DAYS of CPU this weekend.
:-(
Anyway, I had sent a set of fixes for release 1.5i (FIX15I SHIPMENT) which
seems not to have been distributed. I think it's still in LISTSERV@CEARN's
reader and has not been lost so I won't be resending it unless Olivier
confirms that the shipment has been lost.
I am seriously thinking about something: it seems that regardless of how
serious a computing center is, you will unavoidably find yourself confronted
with a novice operator on the very day that you need a competent one.
The only thing that you can ask an operator is to issue a CP FORCE command --
it's something he will understand. He won't be able to do an AUTOLOG because
he won't know the password. He won't be able to hold reader files because he
won't know the command. There are basically two cases when a server enters a
CPU loop:
1) It enters the loop at initialization. This is very unlikely, especially as
it would probably happen while you (the LISTSERV owner) are logged on too.
2) It enters a loop while processing some command/request from its reader.
In the first case nothing can be done by an operator. In the second case I
think there is a solution: as soon as a reader file has been read in, LISTSERV
changes it to HOLD status (this file is supposed to be PURGEd upon completion
of the command, or TRANSFERred to the postmaster if there was a problem). If
it subsequently enters a loop, you can ask the operator to CP FORCE it --
that's something they always manage to do without trouble. If you then have
some kind of server that checks all servers from time to time and AUTOLOGs
them if they're logged off, or if your operators can AUTOLOG without password,
the server will eventually restart and will not process the lethal reader file
again. :-) I think it would work so I'm going to put it in the next release.
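
To make the hold-before-processing idea concrete, here is a minimal sketch in
Python (not the server's own code, and not how the next release will
necessarily do it). The names ReaderFile, Reader, Forced and run_server are
all hypothetical; the reader is modelled as a plain list, and an exception
stands in for the operator's CP FORCE.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    READY = auto()  # waiting in the reader, eligible for processing
    HELD = auto()   # already read in once; skipped on later passes


@dataclass
class ReaderFile:
    name: str
    status: Status = Status.READY


@dataclass
class Reader:
    files: list = field(default_factory=list)       # the server's reader
    postmaster: list = field(default_factory=list)  # files TRANSFERred away


class Forced(Exception):
    """Stands in for the operator issuing CP FORCE on a looping server."""


def run_server(reader, handler):
    """One server session: process every file that is not already held."""
    for f in list(reader.files):
        if f.status is Status.HELD:
            continue  # left over from an interrupted session; do not retry
        f.status = Status.HELD  # hold BEFORE doing any work on the file
        try:
            handler(f)
        except Forced:
            raise  # the session dies; the file stays HELD in the reader
        except Exception:
            reader.files.remove(f)
            reader.postmaster.append(f)  # problem: TRANSFER to postmaster
        else:
            reader.files.remove(f)  # normal completion: PURGE


if __name__ == "__main__":
    def handler(f):
        if f.name == "lethal":
            raise Forced()  # simulate a CPU loop ended by CP FORCE

    rdr = Reader([ReaderFile("ok1"), ReaderFile("lethal"), ReaderFile("ok2")])
    try:
        run_server(rdr, handler)  # first session is forced off
    except Forced:
        pass
    run_server(rdr, handler)  # after AUTOLOG: the lethal file is skipped
    print([f.name for f in rdr.files])
```

The point of the ordering is that the status change happens before the
handler runs, so whatever kills the session, the offending file is already
marked and the restarted server passes over it. The held file stays in the
reader for the owner to inspect later.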
Apart from that, LSVXMAIL has been fixed (Mail-Via I mean). Valdis has a
copy of the fixed exec, but it is NOT included in FIX15I SHIPMENT (i.e. I solved
the problem after sending FIX15I).
Eric