LISTSERV - LSTOWN-L Archives - COMMUNITY.EMAILOGY.COM

A serious  problem exists when  a peered  list has a  mix of 1.8  and 1.7
peers,  and  the 1.8  peers  are  configured  to use  Internet  addresses
("List-Address=  FQDN", or  equivalent  default). SIGNOFF/DELETE  (GLOBAL
commands may cause  an infinite loop, which will furthermore  fill up the
A-disk  of all  the peers  involved (given  enough time).  Eventually the
servers  will crash  and  will  not be  able  to  restart without  manual
intervention.
 
**************************
* How to restore service *
**************************
 
If this happens to you, DO NOT DELETE PERMVARS FILE! Use XEDIT or PIPE to
eliminate all the  lines containing the string  X-REQ (ALL/X-REQ followed
by DEL * under XEDIT).  PERMVARS FILE contains important information, and
should be treated as a LIST or FILELIST file, not as LISTSERV NETLOG.
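
For sites that would rather script the cleanup, the filtering step above
can be sketched as follows. This is only an illustration in Python. It
assumes a copy of PERMVARS FILE is available as plain text lines. The
function name drop_xreq_lines and the sample records are hypothetical and
do not reflect the real file layout. The match is case-sensitive.

```python
def drop_xreq_lines(lines):
    """Return only the lines that do not contain the string X-REQ."""
    return [line for line in lines if "X-REQ" not in line]


# Made-up sample records (NOT the real PERMVARS FILE layout).
sample = [
    "SOME-VAR value",
    "X-REQ looping request entry",
    "OTHER-VAR value",
]

kept = drop_xreq_lines(sample)
for line in kept:
    print(line)
```

Every other line is kept untouched, which matters because the rest of
PERMVARS FILE contains information the server needs.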
 
Once you have returned PERMVARS FILE to its normal size, you will be able
to start  LISTSERV, but the  problem will  remain until you  identify and
remove the looping jobs. They will be in mail files called LISTSERV MAIL,
coming from the servers whose hostnames were in the X-REQ entries in
PERMVARS FILE. The job name will be the same as the name of the list in
question. This may or may not  be sufficient to identify the files. There
is only one job  to kill per 1.7 peer (jobs from 1.8  peers don't need to
be killed).  If you  can't identify  the jobs,  or if  you are  not sure,
remove all the peer subscriptions  from the list in question temporarily.
This will stop the loop.
 
*************************
* Permanent restriction *
*************************
 
There are three ways to run a peered list with both 1.7 and 1.8 peers:
 
1. All 1.8 peers use "List-address= NJE", and all peer subscriptions (the
   subscriptions with  the name "Peer  distribution list") are  in BITNET
   form. This  produces the  behaviour the 1.7  servers expect  (the only
   behaviour they supported) and there is no problem. But, of course, the
   1.8 peers cannot take advantage of the "List-address=" support added
   in 1.8a, and must continue to identify themselves under their BITNET
   address.
 
2. The 1.8 peers use "List-address= FQDN", and all peer subscriptions are
   in BITNET form. In that case there is no risk of loop, but subscribers
   will receive an error message about duplicate postings every time they
   post to the list.
 
3. The  1.8 peers  use "List-address= FQDN",  and the  corresponding peer
   subscriptions  are changed  to their  Internet form.  This solves  the
   problem mentioned above, but exposes the list to the loop.
 
Option 1 is fully supported. Option  2 is supported, but not recommended.
Option 3  is not supported. Note  that servers running the  base level of
1.8a may experience the symptoms described  in option 2. However, they do
not suffer  from the loop  problem. The message about  duplicate postings
does not necessarily mean the loop problem is present.
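
The three options for a mixed 1.7/1.8 peered list can be summarized as a
small lookup. This is a sketch in Python for illustration only. The
function name classify_mixed_list and the value "INTERNET" for the
subscription form are hypothetical labels, not LISTSERV keywords. The
combination of "List-address= NJE" with Internet-form subscriptions is
not described above, so the sketch reports it as unknown.

```python
# (List-address setting on the 1.8 peers, form of the peer subscriptions)
#   -> (option number, support status as stated in the text)
OPTIONS = {
    ("NJE", "BITNET"): (1, "fully supported"),
    ("FQDN", "BITNET"): (2, "supported, but not recommended"),
    ("FQDN", "INTERNET"): (3, "not supported"),
}


def classify_mixed_list(list_address, peer_subscription_form):
    """Map a configuration of a mixed 1.7/1.8 list to its option."""
    key = (list_address.upper(), peer_subscription_form.upper())
    # Combinations the text does not cover are reported as unknown.
    return OPTIONS.get(key, (None, "not described"))


print(classify_mixed_list("FQDN", "Internet"))
```

Only option 3 exposes the list to the loop. Option 2 trades the loop for
the duplicate-posting message.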
 
**********************
* Can't it be fixed? *
**********************
 
The problem cannot be  fixed by a change to version 1.8.  It is caused by
the  algorithm version  1.7 uses  to forward  certain requests.  When the
request comes back to  the 1.8a server, it cannot tell  whether it is due
to the 1.7 restriction,  or caused by a new command  from a user, because
the 1.7 server  assigns a new ID.  The only way to solve  this problem in
1.8a would be to disable command forwarding completely.
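
The following toy model, in Python, illustrates why the new ID defeats
any duplicate check on the 1.8a side. It is not the LISTSERV protocol:
the request layout, the ID counter, the command label and the function
names are all assumptions made for the sketch.

```python
import itertools

_next_id = itertools.count(1)  # hypothetical source of request IDs


def forward_via_17(request):
    """Toy 1.7 peer: sends the command back under a new ID."""
    return {"id": next(_next_id), "command": request["command"]}


def simulate(hops, reassign_id):
    """Count how often a toy 1.8a peer processes the same command.

    The peer remembers the IDs it has seen and stops as soon as an ID
    repeats. 'hops' bounds the run so the looping case terminates.
    """
    seen = set()
    request = {"id": next(_next_id), "command": "GLOBAL-SIGNOFF"}  # placeholder
    processed = 0
    for _ in range(hops):
        if request["id"] in seen:
            break  # recognized as a request that was already handled
        seen.add(request["id"])
        processed += 1
        # The request goes out to the other peer and comes back.
        request = forward_via_17(request) if reassign_id else dict(request)
    return processed


print(simulate(5, reassign_id=False))
print(simulate(5, reassign_id=True))
```

With the ID preserved, the returning request is recognized on the second
pass and processed once. With a new ID on every pass, it looks like a
new command each time and is processed on every hop.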
 
There are three general strategies to bypass this restriction:
 
1. Upgrade all participating servers to 1.8.
 
2. Remove all 1.7 peers.
 
3. Run the list with "List-address= NJE" until all peers migrate to 1.8.
 
L-Soft was not aware  of this problem until it hit  UBVM tonight. This is
the reason why it was not mentioned  in the release notes, and why we had
previously advised  people to select the  options that expose you  to the
loop.
 
As you know,  L-Soft offered to deliver 1.8a to  all the beneficiaries of
the CREN/L-Soft  contract who did not  manage to push the  paperwork past
their legal department, at L-Soft's own risk (without written agreement).
Because of the use of the word "executed" rather than "agreed" in section
4  of the  CREN/L-Soft agreement,  L-Soft cannot  do this  without CREN's
written  permission. Regretfully,  the last  formal response  we received
from CREN  on this topic  was on June  29. CREN did  not agree to  let us
deliver 1.8a  to everyone, although  no reason was stated.  Instead, CREN
proposed that the deadline be extended by another three months. We turned
down this offer because it would not solve the problem. Fewer than a
third of the beneficiaries returned usable contracts because the
contracts are too complicated and involve three parties (against our
recommendation - we wanted a simple, standalone maintenance agreement
with two parties). Purchasing lawyers are not usually familiar with
three-party maintenance agreements where 18 out of 30 pages are totally
out of their control and not negotiable. Extending the deadline will not
solve  this problem.  Furthermore, we  do not  have three  months at  our
disposal. The backbone  must be made LTCP-exploitive by the  end of July,
and L-Soft does not believe that 100 universities will return their
contracts over the next two weeks when only about 50 have done so since
March.
 
What is most unfortunate about this negotiation is that we have still not
been told why CREN opposes our proposal. Since we do not know what bothers
CREN about the proposal, and are faced with an unusable counter-proposal,
there is not much we can do in terms of negotiations. We now have no
option but to begin retrofitting  LTCP exploitation into 1.7f. Volunteers
for beta-testing are invited to contact L-Soft privately.
 
  Eric